kn_automl_h2o_classification_python_vtreat

H2O.ai AutoML (wrapped with Python) with vtreat data preparation in KNIME for classification problems (with Python vtreat) - a powerful auto-machine-learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)

For details please refer to these entries:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923

This is a modified version that also uses the Python package "vtreat" to prepare the data and store the preparation. It first splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.
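The nested split described above could look like this outside KNIME. This is a minimal sketch in plain Python, not the workflow's own implementation (the workflow uses KNIME nodes for the split). It assumes that the 20% slice of the training portion serves as a validation set, which the description does not state, and `split_rows` is a hypothetical helper introduced only for illustration.

```python
import random


def split_rows(rows, fraction, seed=42):
    """Shuffle a copy of `rows` and split it into two parts.

    The first part holds `fraction` of the rows, the second part the rest.
    (Hypothetical helper, not part of the workflow.)
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the row IDs of the input data
rows = list(range(1000))

# First split: 70% training portion, 30% test
train_full, test = split_rows(rows, 0.70)

# Second split: the 70% portion again, 80/20
# (assumption: the 20% slice is used for validation)
train, validation = split_rows(train_full, 0.80)

print(len(train), len(validation), len(test))  # 560 140 300
```

With 1000 rows this leaves 56% of the data for fitting, 14% for validation and 30% as an untouched test set.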






Workflow annotations

H2O.ai AutoML (wrapped with Python) in KNIME for classification problems (with Python vtreat) - a powerful auto-machine-learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)
v 2.00

For details please refer to these entries:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923

This is a modified version that also uses the Python package "vtreat" to prepare the data and store the preparation. It first splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.
https://github.com/WinVector/pyvtreat/blob/main/Examples/Classification/Classification.md

Create a data preparation with the vtreat package on the training data, store the procedure and apply it to the test data.

Inspect the models trained so far and review the results. This gives you a quick idea of where you stand and what you might be able to achieve, along with all parameters needed to load the respective model.

Python Conda environment propagation. Please read this article for more details:
KNIME and Python — Setting up and managing Conda environments
https://medium.com/p/2ac217792539

Under Apple Silicon, not all R packages currently work with this propagation. You might also have to install "RServe" manually.

AutoML runtime

edit: v_runtime_automl
Set the maximum runtime of H2O.ai AutoML in SECONDS. Run AutoML for 60 seconds, or:
300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour, 7200 = 2 hours, 14400 = 4 hours, 16200 = 4.5 hours, 18000 = 5 hours, 21600 = 6 hours, 25200 = 7 hours, 28800 = 8 hours, 36000 = 10 hours

Python Script: TRAIN vtreat ("Target" as the binary target)

    import knime.scripting.io as knio
    import vtreat

    # Read the training data from the node's first input port
    input_table_1 = knio.input_tables[0].to_pandas()

    # vtreat for KNIME!
    # https://win-vector.com/2020/06/28/vtreat-for-knime/
    vtreat_transform = vtreat.BinomialOutcomeTreatment(
        outcome_name='Target',    # outcome variable
        outcome_target="1",       # outcome of interest
        cols_to_copy=['Target'],  # columns to "carry along" but not treat as input variables
        params=vtreat.vtreat_parameters({
            'filter_to_recommended': True,
            'indicator_min_fraction': knio.flow_variables['v_vtreat_indicator_min_fraction']
        })
    )

    # Fit the treatment on the training data and prepare that data in one step
    d_prepared = vtreat_transform.fit_transform(input_table_1, input_table_1['Target'])

    # Save the transformation rules
    vtreat_transform_as_data = vtreat_transform.description_matrix()

    # These are the node's outputs that need to be populated:
    knio.output_tables[0] = knio.Table.from_pandas(d_prepared)
    knio.output_tables[1] = knio.Table.from_pandas(vtreat_transform_as_data)
    knio.output_objects[0] = vtreat_transform

Python Script: TEST vtreat ("Target" as the binary target)

    import knime.scripting.io as knio
    import pandas as pd
    import vtreat

    # The input object is the fitted vtreat treatment stored by the TRAIN step
    vtreat_model_load = knio.input_objects[0]
    input_table = knio.input_tables[0].to_pandas()

    # Apply the stored treatment to the test data (no refitting)
    output_table = vtreat_model_load.transform(input_table)

    knio.output_tables[0] = knio.Table.from_pandas(output_table)

Node labels

Census income classification
create initial Test and Training data
train.table
test.table
Classification with vtreat and AutoML
pure Classification with AutoML
h2o_list_of_models.csv
AUC DESC
keep best model
Read Variable importance
Read the MOJO model
vtreat_treatment_yyyyMMdd_hhmm.zip
vtreat_treatment_<..>.csv
Propagate R environment for KNIME on MacOS with Miniconda (configure how to handle the environment; default = just check the names)
Propagate R environment for KNIME on Windows with Miniforge / Miniconda (configure how to handle the environment; default = just check the names)
Propagate R environment for KNIME on MacOS (Apple Silicon) with Miniforge / Miniconda (configure how to handle the environment; default = just check the names)
Propagate Python environment for KNIME on MacOSX (Apple Silicon) OR Windows with Miniforge / Miniconda (configure how to handle the environment; default = just check the names)

Nodes and components in the workflow

knime_r_environment, knime_r_environment_windows, knime_r_environment_apple_silicon, conda_environment_kaggle_knime4, Integer Input (legacy), Table Reader, CSV Reader, Table Row to Variable, H2O.ai AutoML - Classification, String to Path (Variable), Sorter, Row Filter, Column Filter, H2O MOJO Reader, Test Training, Meta Information, Python Script, Model Reader, Model Writer, CSV Writer
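The "AUC DESC" and "keep best model" steps in the workflow sort the list of trained models by AUC in descending order and keep the top row. A rough equivalent in plain Python is sketched below. The column names `model_id` and `auc` and the three sample rows are made up for illustration; the actual layout of h2o_list_of_models.csv is not shown on this page and may differ.

```python
import csv
import io

# Made-up stand-in for h2o_list_of_models.csv.
# Column names and values are assumptions, not taken from the workflow.
SAMPLE_CSV = """model_id,auc
model_a,0.912
model_b,0.927
model_c,0.905
"""


def best_model_by_auc(csv_text):
    """Return the row with the highest AUC (sort descending, keep first row)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["auc"]), reverse=True)
    return rows[0]


best = best_model_by_auc(SAMPLE_CSV)
print(best["model_id"], best["auc"])
```

In the workflow, the selected row then supplies the path used to read the corresponding MOJO model.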
