kn_automl_h2o_regression_python_vtreat

H2O.ai AutoML (wrapped with Python) with vtreat data preparation in KNIME for regression problems

H2O.ai AutoML (wrapped with Python) in KNIME for regression problems, with data preparation by the R package vtreat. AutoML is a powerful auto-machine-learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/).

For details please refer to this forum entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-regression-problems/20924

This is a modified version that also offers the R package vtreat to prepare the data and to store the preparation. It splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.
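The nested split described above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the workflow's own partitioning nodes: the helper `split_rows`, the seed, the 1000 stand-in rows and the names `inner_train` / `inner_holdout` are all assumptions made for this example.

```python
import random


def split_rows(rows, first_fraction, seed=42):
    """Shuffle a copy of rows and cut it into two parts (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * first_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the rows of the input table (assumption: 1000 rows).
rows = list(range(1000))

# First split: 70% training, 30% test.
train_full, test = split_rows(rows, 0.70)

# Second split: the 70% training portion is split again, 80/20.
inner_train, inner_holdout = split_rows(train_full, 0.80)

print(len(inner_train), len(inner_holdout), len(test))
```

With 1000 rows this leaves 56% of the data in the larger inner part, 14% in the smaller inner part and 30% as the untouched test set.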

Workflow annotations (v 2.00)

Runtime of AutoML (flow variable v_runtime_automl, in seconds). Run AutoML for 60 seconds, or choose a longer runtime:

    300 = 5 min
    600 = 10 min
    900 = 15 min
    1800 = 30 min
    3600 = 1 hour
    7200 = 2 hours
    14400 = 4 hours
    16200 = 4.5 hours
    18000 = 5 hours
    21600 = 6 hours
    25200 = 7 hours
    28800 = 8 hours
    36000 = 10 hours

R Snippet "vtreat calculate":

    # knime.out <- knime.in
    library("vtreat")

    # collect the names of the numeric and the categorical columns
    v_numeric_vars <- head(knime.in[sapply(knime.in, is.numeric)])
    v_numeric_names <- colnames(v_numeric_vars)
    dropList <- c(v_numeric_names)
    v_categorical_vars <- head(knime.in[, !colnames(knime.in) %in% dropList])
    v_categorical_names <- colnames(v_categorical_vars)

    # design the treatment plan for the numeric outcome 'Target'
    treatmentsN <- designTreatmentsN(knime.in, colnames(knime.in), 'Target')
    treatmentsN_table <- as.data.frame(treatmentsN$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])

    # On significance you might have to tune the significance level
    # https://winvector.github.io/vtreat/articles/vtreatSignificance.html
    knime.out <- prepare(treatmentsN, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

    # define output path for vtreat 'model' and statistics
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    path_treatment_table <- knime.flow.in[["v_path_vtreat_table"]]

    # Save a single object to a file
    saveRDS(treatmentsN, c(path_rds))
    write.table(treatmentsN_table, file = path_treatment_table, sep = "\t", col.names = TRUE)

R Snippet "vtreat apply":

    library("vtreat")

    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]

    # Restore it under a different name
    treatmentsN_apply <- readRDS(path_rds)
    knime.out <- prepare(treatmentsN_apply, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

Inspect the models so far and see the results. This will also give you a quick idea of where you stand and what you would be able to achieve, along with all parameters needed to load the respective model.

Node annotations:

- current day & time, yyyyMMdd => choose the time format
- create initial Test and Training data (Kaggle House Prices: Advanced Regression Techniques)
- edit: v_runtime_automl, set the maximum runtime of H2O.ai AutoML in seconds
- train.table, train, vtreat calculate, vtreat_treatment_yyyyMMdd_hhmm.rds
- test.table, test, vtreat apply
- v_path_vtreat_table, v_path_vtreat_rds, v_vtreat_prune_sig => edit!
- Regression with vtreat and AutoML
- pure Regression with AutoML
- h2o_list_of_models.csv
- RMSE ASC, keep best model
- Read the MOJO model
- Read Variable importance
- Propagate Python environment for KNIME on MacOSX (Apple Silicon) OR Windows with Miniforge / Miniconda
- configure how to handle the environment, default = just check the names

Nodes used: Create Date&Time Range, Date&Time to String, Test Training, Integer Input (legacy), Table Reader, R Snippet, Merge Variables, CSV Reader, Table Row to Variable, H2O.ai AutoML - Regression, Java Edit Variable (simple), Row Filter, H2O MOJO Reader, String to Path (Variable), Sorter, Column Filter, collect meta data, conda_environment_kaggle_knime4
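The "RMSE ASC" and "keep best model" steps amount to sorting the model list by RMSE in ascending order and keeping the first row. A minimal Python sketch of that idea follows. The column names `model_id` and `rmse`, the helper `best_model_by_rmse` and every value in `sample_csv` are invented for illustration; the real layout of h2o_list_of_models.csv is not shown in this description and may differ.

```python
import csv
import io

# Hypothetical excerpt of a model list. Column names and numbers are
# made up for this example and are not results from the workflow.
sample_csv = """model_id,rmse
GBM_1,0.142
StackedEnsemble_1,0.131
GLM_1,0.188
"""


def best_model_by_rmse(csv_text):
    """Return the row with the lowest RMSE (sort ascending, keep the first)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["rmse"]))
    return rows[0]


best = best_model_by_rmse(sample_csv)
print(best["model_id"], best["rmse"])
```

The RMSE values are converted to float before sorting, because values read from a CSV file arrive as strings and would otherwise be compared alphabetically.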
