kn_forum_automl_h2o_classification_r_vtreat_svm

(forum example) H2O.ai AutoML (wrapped with R) with vtreat data preparation in KNIME for classification problems (with R vtreat)

H2O.ai AutoML (wrapped with R) with vtreat data preparation in KNIME for classification problems (with R vtreat): a powerful automated machine learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)

For details please refer to this entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923

This is a modified version that also uses the R package vtreat to prepare the data and to store that preparation. It splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.
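The two-stage split described above can be sketched in base R. This is a minimal illustration that assumes simple random, non-stratified sampling. The helper name `split_70_30_80_20`, the seed and the demo data are hypothetical. The workflow itself performs the split with KNIME nodes, and the role of the 20% part (for example, validation during AutoML) is an assumption here.

```r
# Hypothetical helper: two-stage random split (70/30, then 80/20 of the 70%).
split_70_30_80_20 <- function(df, seed = 42) {
  set.seed(seed)
  n <- nrow(df)
  # Stage 1: 70% training pool, 30% held-out test set
  pool_idx <- sample.int(n, size = round(0.7 * n))
  pool <- df[pool_idx, , drop = FALSE]
  test <- df[-pool_idx, , drop = FALSE]
  # Stage 2: split the 70% pool again into 80% train / 20% validation
  m <- nrow(pool)
  train_idx <- sample.int(m, size = round(0.8 * m))
  list(
    train = pool[train_idx, , drop = FALSE],
    valid = pool[-train_idx, , drop = FALSE],
    test  = test
  )
}

# Illustrative data only
demo <- data.frame(x = 1:1000, Target = rep(c(0, 1), 500))
parts <- split_70_30_80_20(demo)
sapply(parts, nrow)
# returns c(train = 560, valid = 140, test = 300)
```

Overall, this means 56% of the rows are used for training, 14% for validation and 30% for the final test.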


Workflow annotations:

    # Run AutoML for 60 seconds, or choose a longer maximum runtime (in seconds):
    # 300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour,
    # 7200 = 2 hours, 14400 = 4 hours, 16200 = 4.5 hours, 18000 = 5 hours,
    # 21600 = 6 hours, 25200 = 7 hours, 28800 = 8 hours, 36000 = 10 hours

H2O.ai AutoML (wrapped with R) in KNIME for classification problems (with R vtreat): a powerful automated machine learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/), v 1.05

For details please refer to this entry:
https://forum.knime.com/t/h2o-ai-automl-in-knime-for-classification-problems/20923

This is a modified version that also uses the R package vtreat to prepare the data and to store that preparation. It splits the data into training and test sets (70/30) and then splits the 70% training portion again (80/20) to get more stable results.

Cf. this forum entry:
https://forum.knime.com/t/svm-with-poor-classification/25617/5?u=mlauber71

Inspect the models trained so far and look at the results. This also gives you a quick idea of where you stand and what you would be able to achieve.

The models are listed along with all parameters needed to load the respective model.
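Inspecting the models usually comes down to two steps. First, sort the list of trained models (h2o_list_of_models.csv) by AUC, highest first. Then keep the top row. The workflow's "AUC DESC" and "keep best model" annotations suggest this is what its Sorter and Row Filter nodes do.

The base-R sketch below illustrates those two steps. The columns `model_id` and `auc` are assumptions modelled on an H2O AutoML leaderboard, because the real file layout is not shown here. The rows are made up.

```r
# Hypothetical stand-in for h2o_list_of_models.csv (values are illustrative)
models <- data.frame(
  model_id = c("model_a", "model_b", "model_c"),
  auc      = c(0.91, 0.88, 0.93),
  stringsAsFactors = FALSE
)

# "AUC DESC": order the models by AUC, highest first
models_sorted <- models[order(models$auc, decreasing = TRUE), , drop = FALSE]

# "keep best model": keep only the first row after sorting
best_model <- models_sorted[1, , drop = FALSE]
best_model$model_id
# returns "model_c"
```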
R Snippet "vtreat calculate" (training data):

    # knime.out <- knime.in
    library("vtreat")

    # Names of numeric and categorical columns (informational; not used further below)
    v_numeric_vars <- head(knime.in[sapply(knime.in, is.numeric)])
    v_numeric_names <- colnames(v_numeric_vars)
    dropList <- c(v_numeric_names)
    v_categorical_vars <- head(knime.in[, !colnames(knime.in) %in% dropList, drop = FALSE])
    v_categorical_names <- colnames(v_categorical_vars)

    # Design the treatment plan on the training data: 'Target' is the outcome column,
    # '1' the positive class; the outcome itself is excluded from the input variables
    treatmentsC <- designTreatmentsC(knime.in, setdiff(colnames(knime.in), "Target"), "Target", "1")
    treatmentsC_table <- as.data.frame(treatmentsC$scoreFrame[, c("origName", "varName", "code", "rsq", "sig", "extraModelDegrees")])

    # On significance you might have to tune the significance level
    # https://winvector.github.io/vtreat/articles/vtreatSignificance.html
    knime.out <- prepare(treatmentsC, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

    # Define output paths for the vtreat 'model' and its statistics
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    path_treatment_table <- knime.flow.in[["v_path_vtreat_table"]]

    # Save the treatment plan (a single R object) to a file
    saveRDS(treatmentsC, path_rds)
    write.table(treatmentsC_table, file = path_treatment_table, sep = "\t", col.names = TRUE)

R Snippet "vtreat apply" (test data):

    library("vtreat")
    path_rds <- knime.flow.in[["v_path_vtreat_rds"]]
    # Restore the saved treatment plan under a different name
    treatmentsC_apply <- readRDS(path_rds)
    # Apply it with the same settings that were used on the training data
    knime.out <- prepare(treatmentsC_apply, knime.in, pruneSig = knime.flow.in[["v_vtreat_prune_sig"]], scale = TRUE)

Further annotations on the workflow canvas:
- current day&time
- yyyyMMdd_HHmm => choose the time format
- train.table
- test.table
- edit: v_runtime_automl (set the maximum runtime of H2O.ai AutoML)
- h2o_list_of_models.csv
- AUC DESC
- train
- vtreat calculate
- vtreat_treatment_yyyyMMdd_hhmm.rds
- test
- vtreat apply
- v_path_vtreat_table
- keep best model
- Read the MOJO model
- Read Variable importance
- Classification with vtreat and AutoML
- create initial Test and Training data
- Census income classification
- v_vtreat_prune_sig => edit!
- v_path_vtreat_rds
- pure Classification with AutoML

Nodes used: Create Date&Time Range, Date&Time to String, Table Row to Variable (2x), Table Reader (2x), Integer Input, Merge Variables (deprecated), CSV Reader (2x), Sorter, R Snippet (2x), Java Edit Variable (simple) (3x), Row Filter, H2O MOJO Reader, collect meta data, H2O.ai AutoML - Classification (2x), Test, Training
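The `pruneSig` argument in the R snippets above is set from the flow variable `v_vtreat_prune_sig`. vtreat documents it as suppressing variables whose significance is above that level.

The base-R sketch below only approximates that filtering rule on a toy score frame. It shows why tuning the threshold changes which treated variables reach AutoML. The variable names and `sig` values are made up, and `select_by_sig` is a hypothetical helper, not part of vtreat.

```r
# Toy score frame with the same column names as vtreat's scoreFrame (values invented)
score_frame <- data.frame(
  varName = c("age", "education_catB", "hours_per_week", "noise_var"),
  sig     = c(1e-40, 1e-12, 0.003, 0.45),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only variables whose significance is below the threshold
select_by_sig <- function(score_frame, prune_sig) {
  score_frame$varName[score_frame$sig < prune_sig]
}

select_by_sig(score_frame, prune_sig = 0.05)
# returns c("age", "education_catB", "hours_per_week")
select_by_sig(score_frame, prune_sig = 0.001)
# returns c("age", "education_catB")
```

A stricter (smaller) threshold keeps fewer variables. The vtreat significance article linked in the snippet discusses how to pick it.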
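The "vtreat calculate" and "vtreat apply" snippets above follow a common pattern. A preparation is learned once on the training data and saved as an .rds file with a timestamped name. The same stored object is then restored and applied to the test data.

The sketch below reproduces that pattern in base R so it runs without vtreat. A simple mean/sd scaling plan stands in for the vtreat treatment plan. The helpers `design_scaling` and `apply_scaling`, the temp-directory path and the demo data are all hypothetical.

```r
# Hypothetical stand-in for a treatment plan: statistics learned on training data only
design_scaling <- function(train, cols) {
  list(cols  = cols,
       means = vapply(train[cols], mean, numeric(1)),
       sds   = vapply(train[cols], sd, numeric(1)))
}

# Apply a stored plan to new data (e.g. the test table) without re-learning it
apply_scaling <- function(plan, df) {
  for (col in plan$cols) {
    df[[col]] <- (df[[col]] - plan$means[[col]]) / plan$sds[[col]]
  }
  df
}

train <- data.frame(age = c(20, 30, 40, 50), hours = c(10, 20, 30, 40))
test  <- data.frame(age = c(35, 45), hours = c(25, 15))

plan <- design_scaling(train, c("age", "hours"))

# Timestamped file name in the spirit of vtreat_treatment_yyyyMMdd_HHmm.rds
stamp    <- format(Sys.time(), "%Y%m%d_%H%M")
path_rds <- file.path(tempdir(), paste0("treatment_", stamp, ".rds"))

saveRDS(plan, path_rds)             # "calculate" step: store the plan
plan_restored <- readRDS(path_rds)  # "apply" step: restore it under a different name
test_prepared <- apply_scaling(plan_restored, test)
```

Because the test data are scaled with the training means and standard deviations, no information from the test set leaks into the preparation. The stored vtreat plan serves the same purpose in the workflow.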
