
kn_forum_39929_video_h2o_regression_r_vtreat

H2O.ai AutoML (wrapped with R) with vtreat data preparation in KNIME for regression problems

H2O.ai AutoML, a powerful automatic machine learning framework, wrapped with R and combined with R vtreat data preparation in KNIME for regression problems (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)

For details, please refer to this forum entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-regression-problems/20924

This modified version also uses the R package vtreat to prepare the data and to store that preparation for reuse. It splits the data into training and test sets (70/30) and then splits the 70% training part again (80/20) to get more stable results.
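The nested split can be sketched in base R as follows. This is a minimal sketch that assumes a simple random split with a fixed seed. The workflow itself may do this step with KNIME nodes and other settings, such as stratification. The function name `split_nested`, the seed, and the name `holdout` for the 20% part are hypothetical. The description does not say how the 20% part is used.

```r
# Nested random split: 70/30 into training and test, then the 70% training
# part again 80/20 (sketch; function name, seed and "holdout" are hypothetical)
split_nested <- function(df, seed = 42) {
  set.seed(seed)  # fixed seed so the split is reproducible
  n <- nrow(df)
  # First split: 70% training, 30% test
  train_idx <- sample.int(n, size = round(0.7 * n))
  train_all <- df[train_idx, , drop = FALSE]
  test <- df[-train_idx, , drop = FALSE]
  # Second split: the 70% training part again into 80% and 20%
  m <- nrow(train_all)
  inner_idx <- sample.int(m, size = round(0.8 * m))
  list(
    train = train_all[inner_idx, , drop = FALSE],
    holdout = train_all[-inner_idx, , drop = FALSE],
    test = test
  )
}

# Usage with a toy data frame of 1000 rows
toy <- data.frame(x = rnorm(1000), Target = rnorm(1000))
parts <- split_nested(toy)
print(sapply(parts, nrow))  # train 560, holdout 140, test 300
```

With 1000 rows, the three parts are 56%, 14% and 30% of the data.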





Workflow annotations

H2O.ai AutoML (wrapped with R) in KNIME for regression problems, v 1.25

Maximum runtime (flow variable v_runtime_automl, edit this): set the maximum runtime of H2O.ai AutoML in seconds. Run AutoML for 60 seconds or choose a longer runtime: 300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour, 7200 = 2 hours, 14400 = 4 hours, 16200 = 4.5 hours, 18000 = 5 hours, 21600 = 6 hours, 25200 = 7 hours, 28800 = 8 hours, 36000 = 10 hours.

Current day and time: choose the time format (e.g. yyyyMMdd). The vtreat preparation is stored as vtreat_treatment_yyyyMMdd_hhmm.rds.

Flow variables for vtreat:
- v_path_vtreat_rds: path of the saved vtreat treatment plan (.rds)
- v_path_vtreat_table: path of the table with the treatment statistics
- v_vtreat_prune_sig: significance level used to prune variables (edit this)

The workflow has two parts: "Regression with vtreat and AutoML" and "pure Regression with AutoML" (without vtreat). Both start by creating the initial test and training data (train.table, test.table).

vtreat calculate (training data): design the treatment plan, prepare the training data, and save the plan and its statistics.

    # knime.out <- knime.in
    library("vtreat")

    # Names of the numeric and the categorical (non-numeric) columns, for inspection
    v_numeric_vars <- head(knime.in[sapply(knime.in, is.numeric)])
    v_numeric_names <- colnames(v_numeric_vars)
    dropList <- c(v_numeric_names)
    # drop = FALSE keeps a data frame even if only one categorical column is left
    v_categorical_vars <- head(knime.in[, !colnames(knime.in) %in% dropList, drop = FALSE])
    v_categorical_names <- colnames(v_categorical_vars)

    # Design the treatment plan for the numeric outcome 'Target';
    # the outcome itself is excluded from the candidate variables
    treatmentsN <- designTreatmentsN(knime.in, setdiff(colnames(knime.in), 'Target'), 'Target')

    # Statistics for each derived variable
    treatmentsN_table <- as.data.frame(treatmentsN$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])

    # On significance you might have to tune the significance level
    # https://winvector.github.io/vtreat/articles/vtreatSignificance.html
    knime.out <- prepare(treatmentsN, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

    # Define the output paths for the vtreat 'model' and its statistics
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    path_treatment_table <- knime.flow.in[["v_path_vtreat_table"]]

    # Save the treatment plan (a single object) to a file
    saveRDS(treatmentsN, path_rds)
    # row.names = FALSE keeps the header aligned with the data columns
    write.table(treatmentsN_table, file = path_treatment_table, sep = "\t", col.names = TRUE, row.names = FALSE)

vtreat apply (test data): restore the saved treatment plan and apply it to the test data.

    library("vtreat")
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]

    # Restore the saved treatment plan under a different name
    treatmentsN_apply <- readRDS(path_rds)

    # Apply it with the same settings as for the training data
    knime.out <- prepare(treatmentsN_apply, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

Results (h2o_list_of_models.csv): inspect the models trained so far and review the results. This gives you a quick idea of where you stand and what you can achieve. The list also contains all the parameters needed to load each model. Sort the models by RMSE in ascending order, keep the best model, then read its MOJO model and its variable importance.

Nodes in the workflow: Create Date&Time Range, Date&Time to String, Integer Input, R Snippet (2x), collect meta data, Merge Variables, CSV Reader (2x), Table Row to Variable (2x), H2O.ai AutoML - Regression (2x), Java Edit Variable (simple) (3x), Test, Training, H2O MOJO Reader, String to Path (Variable), Sorter, Column Filter, Row Filter, Table Reader (2x).
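The "RMSE ASC" and "keep best model" steps can be sketched in base R: sort the model list by RMSE in ascending order and keep the top row. This is a minimal sketch. It assumes the model list is a data frame with the hypothetical columns `model_id` and `rmse`; the actual column names in h2o_list_of_models.csv may differ. In practice the list would be read from the file, for example with `read.csv()`, rather than built from the made-up toy values below.

```r
# Keep the model with the lowest RMSE
# (sketch; the column names model_id and rmse are hypothetical)
keep_best_model <- function(models) {
  # Sort by RMSE ascending: for regression, a lower RMSE is better
  sorted <- models[order(models$rmse), , drop = FALSE]
  # Keep only the first row, i.e. the best model
  sorted[1, , drop = FALSE]
}

# Toy model list standing in for h2o_list_of_models.csv (made-up values)
models <- data.frame(
  model_id = c("model_a", "model_b", "model_c"),
  rmse = c(3.2, 3.9, 2.8),
  stringsAsFactors = FALSE
)

best <- keep_best_model(models)
print(best$model_id)  # prints [1] "model_c"
```

Sorting before picking the first row, instead of using `which.min()` directly, mirrors the Sorter step and keeps the full ranked list available for inspection.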
