
kn_forum_38612_h2o_ecg_classification_r_vtreat

H2O.ai AutoML (wrapped with R community nodes) in KNIME for classification problems (with R vtreat)

A powerful auto-machine-learning framework: H2O.ai AutoML, wrapped with R community nodes, used in KNIME for classification problems, with data preparation by the R package vtreat (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)
v 1.55 - uses the R community snippet

For details, please refer to this forum entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923

This is a modified version that also offers the R package vtreat to prepare the data and to store the preparation. It splits the data into training and test sets (70/30) and then splits the training 70% again (80/20) to get more stable results.
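The two-stage split described above can be sketched in base R. This is only an illustration under assumptions: the data frame `df`, the seed, and the names `pool`, `train` and `valid` are hypothetical, and treating the 20% part as a validation set is an assumption. The workflow itself may well do the partitioning with KNIME nodes rather than in R.

```r
# Hypothetical example data; the real workflow reads its own table.
set.seed(42)
n  <- 1000
df <- data.frame(id = seq_len(n), Target = rbinom(n, 1, 0.3))

# First split: 70% training pool, 30% hold-out test set.
idx_pool <- sample(seq_len(n), size = round(0.7 * n))
pool <- df[idx_pool, ]
test <- df[-idx_pool, ]

# Second split: the 70% pool is divided again, 80/20.
# Calling the 20% part a validation set is an assumption.
idx_train <- sample(seq_len(nrow(pool)), size = round(0.8 * nrow(pool)))
train <- pool[idx_train, ]
valid <- pool[-idx_train, ]

# Row counts of the three resulting tables.
sizes <- c(train = nrow(train), valid = nrow(valid), test = nrow(test))
print(sizes)
```

With 1000 rows this gives three disjoint tables, so the test set is never seen while the models are built and compared.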

Workflow annotations

edit: v_runtime_automl
set the maximum runtime of H2O.ai AutoML in SECONDS

    # Run AutoML for 60 seconds or
    # 300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour
    # 7200 = 2 hours
    # 14400 = 4 hours
    # 16200 = 4.5 hours
    # 18000 = 5 hours
    # 21600 = 6 hours
    # 25200 = 7 hours
    # 28800 = 8 hours
    # 36000 = 10 hours

R Snippet "vtreat calculate" (training data):

    # knime.out <- knime.in
    library("vtreat")

    # collect the names of the numeric and the categorical columns
    v_numeric_vars <- head(knime.in[sapply(knime.in, is.numeric)])
    v_numeric_names <- colnames(v_numeric_vars)
    dropList <- c(v_numeric_names)
    v_categorical_vars <- head(knime.in[, !colnames(knime.in) %in% dropList])
    v_categorical_names <- colnames(v_categorical_vars)

    # design the treatment plan for the binary outcome 'Target' with target class '1'
    treatmentsC <- designTreatmentsC(knime.in, colnames(knime.in), 'Target', '1')
    treatmentsC_table <- as.data.frame(treatmentsC$scoreFrame[, c('origName', 'varName', 'code', 'rsq', 'sig', 'extraModelDegrees')])

    # On significance you might have to tune the significance level
    # https://winvector.github.io/vtreat/articles/vtreatSignificance.html
    knime.out <- prepare(treatmentsC, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

    # define output path for vtreat 'model' and statistics
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    path_treatment_table <- knime.flow.in[["v_path_vtreat_table"]]

    # Save a single object to a file
    saveRDS(treatmentsC, path_rds)
    write.table(treatmentsC_table, file = path_treatment_table, sep = "\t", col.names = TRUE)

R Snippet "vtreat apply" (test data):

    library("vtreat")
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]

    # Restore it under a different name
    treatmentsC_apply <- readRDS(path_rds)
    knime.out <- prepare(treatmentsC_apply, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

Inspect the models built so far and look at the results. This also gives you a quick idea of where you stand and what you can achieve, along with all parameters needed to load the respective model.

Node labels:
current day&time, yyyyMMdd_HHmm => choose the time format
train
vtreat calculate
vtreat_treatment_yyyyMMdd_hhmm.rds
test
vtreat apply
v_path_vtreat_table
h2o_list_of_models.csv
Classification with vtreat and AutoML using the R community nodes
Read Variable importance
v_vtreat_prune_sig => edit!
v_path_vtreat_rds
pure Classification with AutoML using the R community nodes
Cohen's kappa DESC
keep best model
train.table
test.table
Target
Column140
Read Model Leaderboard
predict on the test data
v_csv_*

Nodes used: Create Date&Time Range; Date&Time to String; Integer Input; collect meta data; R Snippet; Java Edit Variable (simple); Merge Variables; CSV Reader; Table Row to Variable; H2O.ai AutoML - Classification; Sorter; Row Filter; Column Filter; String to Path (Variable); Table Reader; Column Rename; String To Number (PMML); Number To String (PMML); Double To Int; File Reader (Complex Format); Scorer (JavaScript)
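The labels "Cohen's kappa DESC" and "keep best model" suggest that the models are ranked by Cohen's kappa and only the top one is kept. A minimal base R sketch of that idea follows. The confusion matrices, the model names and the helper `cohen_kappa` are hypothetical and only illustrate the formula kappa = (po - pe) / (1 - pe); in the workflow this step appears to be handled by the Scorer, Sorter and Row Filter nodes.

```r
# Cohen's kappa from a square confusion matrix (rows = actual, columns = predicted).
cohen_kappa <- function(cm) {
  n  <- sum(cm)
  po <- sum(diag(cm)) / n                      # observed agreement
  pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

# Hypothetical confusion matrices for two models (matrix() fills column by column).
cm_a <- matrix(c(40, 5, 10, 45), nrow = 2)  # rows: (40, 10) and (5, 45)
cm_b <- matrix(c(30, 15, 20, 35), nrow = 2) # rows: (30, 20) and (15, 35)

# Hypothetical leaderboard, deliberately listed with the weaker model first.
leaderboard <- data.frame(
  model = c("model_b", "model_a"),
  kappa = c(cohen_kappa(cm_b), cohen_kappa(cm_a)),
  stringsAsFactors = FALSE
)

# Sort by kappa in descending order and keep the best model.
leaderboard <- leaderboard[order(leaderboard$kappa, decreasing = TRUE), ]
best_model  <- leaderboard$model[1]
print(leaderboard)
```

Kappa corrects plain accuracy for the agreement that would occur by chance, which makes it a more cautious ranking criterion than accuracy when the classes are imbalanced.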
