kn_automl_h2o_regression_r_vtreat

H2O.ai AutoML (wrapped with R) with vtreat data preparation in KNIME for regression problems

This workflow runs H2O.ai AutoML, a powerful automated machine-learning framework, from R inside KNIME for regression problems and adds data preparation with the R package vtreat (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)

For details, please refer to this forum entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-regression-problems/20924

This is a modified version that also uses the R package vtreat to prepare the data and to store that preparation so it can be reapplied later. It first splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.
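The two-stage split described above can be sketched in base R. This is only an illustration: the data frame `df`, the seed, and the names `build`, `train`, and `validation` are assumptions made for the example, and the workflow itself performs the split with KNIME nodes rather than with this code. Treating the 20% part as validation data is also an assumption.

```r
# Sketch only: `df` is a stand-in data set and the seed is an arbitrary choice.
set.seed(42)
df <- data.frame(id = 1:1000, Target = rnorm(1000))

# First split: 70% for model building, 30% held out as the test set.
n <- nrow(df)
build_idx <- sample(seq_len(n), size = round(0.7 * n))
build <- df[build_idx, ]
test  <- df[-build_idx, ]

# Second split: the 70% portion is divided again into 80% and 20%.
m <- nrow(build)
train_idx <- sample(seq_len(m), size = round(0.8 * m))
train      <- build[train_idx, ]
validation <- build[-train_idx, ]

# Shares of the full data: 0.7 * 0.8 = 56%, 0.7 * 0.2 = 14%, and 30%.
shares <- c(train = nrow(train),
            validation = nrow(validation),
            test = nrow(test)) / n
print(shares)
```

The 30% test set is never touched while the model is built, so it gives an independent check of the final result.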



Workflow annotations (v 1.25)

Maximum runtime of H2O.ai AutoML in seconds (edit the variable v_runtime_automl):

    # Run AutoML for 60 seconds or
    # 300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour
    # 7200 = 2 hours
    # 14400 = 4 hours
    # 16200 = 4.5 hours
    # 18000 = 5 hours
    # 21600 = 6 hours
    # 25200 = 7 hours
    # 28800 = 8 hours
    # 36000 = 10 hours

R Snippet on the training data (vtreat calculate):

    # knime.out <- knime.in
    library("vtreat")

    # Collect the names of the numeric and the categorical columns
    v_numeric_vars <- head(knime.in[sapply(knime.in, is.numeric)])
    v_numeric_names <- colnames(v_numeric_vars)
    dropList <- c(v_numeric_names)
    v_categorical_vars <- head(knime.in[, !colnames(knime.in) %in% dropList])
    v_categorical_names <- colnames(v_categorical_vars)

    # Design the treatment plan for the numeric outcome 'Target'
    treatmentsN <- designTreatmentsN(knime.in, colnames(knime.in), 'Target')
    treatmentsN_table <- as.data.frame(treatmentsN$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])

    # You might have to tune the significance level
    # https://winvector.github.io/vtreat/articles/vtreatSignificance.html
    knime.out <- prepare(treatmentsN, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

    # Define output paths for the vtreat 'model' and statistics
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    path_treatment_table <- knime.flow.in[["v_path_vtreat_table"]]

    # Save a single object to a file
    saveRDS(treatmentsN, path_rds)
    write.table(treatmentsN_table, file = path_treatment_table, sep = "\t", col.names = TRUE)

R Snippet on the test data (vtreat apply):

    library("vtreat")
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]

    # Restore it under a different name
    treatmentsN_apply <- readRDS(path_rds)
    knime.out <- prepare(treatmentsN_apply, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

Inspect the models built so far and review the results, along with all parameters needed to load the respective model. This also gives you a quick idea of where you stand and what you would be able to achieve.

Node labels in the workflow: current day&time, yyyyMMdd (choose the time format), train.table, test.table, edit: v_runtime_automl (set the maximum runtime of H2O.ai AutoML in seconds), train (vtreat calculate), vtreat_treatment_yyyyMMdd_hhmm.rds, test (vtreat apply), h2o_list_of_models.csv, Regression with vtreat and AutoML, v_path_vtreat_table, v_vtreat_prune_sig (edit!), v_path_vtreat_rds, pure Regression with AutoML, create initial Test and Training data, Kaggle House Prices, Read the MOJO model, RMSE ASC, Read Variable importance, keep best model.

Nodes used: Create Date&Time Range, Date&Time to String, Table Reader (2), Integer Input, R Snippet (2), collect meta data, Merge Variables, CSV Reader (2), Table Row to Variable (2), H2O.ai AutoML - Regression (2), Java Edit Variable (simple) (3), Test Training, H2O MOJO Reader, String to Path (Variable), Sorter, Column Filter, Row Filter.
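The labels "RMSE ASC" and "keep best model" suggest that the list of models is sorted by ascending RMSE and that only the top row is kept. A minimal base R sketch of that step follows. The column names `model_id` and `rmse` and all values are invented for illustration; the real layout of h2o_list_of_models.csv may differ, and the workflow does this with the Sorter and Row Filter nodes rather than with R code.

```r
# Hypothetical model list; column names and values are made up for this sketch.
models <- data.frame(
  model_id = c("model_a", "model_b", "model_c"),
  rmse     = c(31250.4, 28710.9, 29980.2),
  stringsAsFactors = FALSE
)

# Sort ascending by RMSE (lower is better) and keep the first row.
models_sorted <- models[order(models$rmse), ]
best_model <- models_sorted[1, ]

print(best_model$model_id)
```

In practice the list would be read from the CSV file first, for example with `read.csv()`, before sorting.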
