kn_automl_h2o_regression

H2O.ai AutoML (generic KNIME nodes) in KNIME for regression problems - a powerful auto-machine-learning framework (https://hub.knime.com/mlauber71/spaces/Public/latest/automl/)
v 1.30

It features various models such as Random Forest and Deep Learning. The results are written to a folder and the models are stored in MOJO format, to be used in KNIME (as well as on a Big Data cluster via Sparkling Water). One major parameter to set is the running time that AutoML has to test various models and do some hyperparameter optimization. The best model of each round is stored and some graphics are produced to inspect the results.
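The bookkeeping behind "the best model of each round is stored" can be sketched outside KNIME. The workflow's node labels mention appending to h2o_list_of_models.csv if the CSV already exists and then keeping the best model; the minimal Python sketch below mimics that idea. The column names model_name and rmse, and the helper names, are assumptions for illustration and are not taken from the workflow.

```python
import csv
import os
import tempfile

FIELDS = ["model_name", "rmse"]  # hypothetical column names


def append_run(csv_path, model_name, rmse):
    """Append one run's leading model; write the header only for a new file."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({"model_name": model_name, "rmse": rmse})


def best_model(csv_path):
    """Return the name of the model with the lowest RMSE."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    return min(rows, key=lambda row: float(row["rmse"]))["model_name"]


# Usage with made-up model names and RMSE values
demo_path = os.path.join(tempfile.mkdtemp(), "h2o_list_of_models.csv")
append_run(demo_path, "run_a", 31250.4)
append_run(demo_path, "run_b", 28990.1)
demo_best = best_model(demo_path)
```

Appending rather than overwriting keeps the history of all runs, so a longer AutoML run can be compared against earlier, shorter ones.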

To run the validations in this workflow you have to install R with several packages. Please refer to the green box on the right.

The results can also be used on Big Data clusters with the help of H2O.ai Sparkling Water (https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_h2o_sparkling_water)

Workflow annotations (v 1.75)

Maximum runtime of H2O.ai AutoML in SECONDS (edit: v_runtime_automl)

# Run AutoML for 60 seconds or
# 300 = 5 min, 600 = 10 min, 900 = 15 min, 1800 = 30 min, 3600 = 1 hour
# 7200 = 2 hours
# 14400 = 4 hours
# 16200 = 4.5 hours
# 18000 = 5 hours
# 21600 = 6 hours
# 25200 = 7 hours
# 28800 = 8 hours
# 36000 = 10 hours

Subfolders to check

/data/ contains the original data
/model/ contains the stored models in MOJO and H2O format
/model/validate/ contains the validations and graphics
/script/ a PDF with further information about the methods used: H2O.ai AutoML in KNIME for classification problems.pdf

R installation

# make sure you have R and the necessary R packages installed, also check out the PDF in /script/
https://hub.knime.com/mlauber71/spaces/Public/latest/_r_installation_on_knime_collection~tj5tS_6gYvqOSPlk
# Install R alongside KNIME on Windows and MacOS
# https://forum.knime.com/t/install-r-alongside-knime-on-windows-and-macos/13287
# R and Rtools
# https://forum.knime.com/t/how-to-import-tables-from-docx-documents-via-r-snippet/19284/10
# RServe 1.8.6+ on MacOSX
# https://forum.knime.com/t/installing-rserve-1-8-6-on-macos-10-15-catalina/20909/6?u=mlauber71
# if you wish to use the 'pure' R code and import the data with parquet
library(arrow)

additional R packages needed: ggplot2, lift, reshape2

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html

Inspect the models so far and see the results. This also gives you a quick idea of where you stand and what you might be able to achieve, along with all parameters needed to load the respective model.

Which output is there to be interpreted

Models are stored in the folder:

/model/<full model name>/<model name>.zip
-> MOJO model format (certain model types cannot be stored and reused, so they are excluded as of now)
/model/<full model name>/<model name>
-> genuine H2O model stored in a folder (can be reused from H2O itself; could also store the Stacked and Ensemble models as well as XGBoost)
/model/validate/h2o_list_of_models.csv
-> list of all leading models from the runs with their RMSE (among other things)

--- individual model results

/model/validate/model_table_H2O_AutoML_Regression_yyyymmdd_hhmmh.table
-> a KNIME table with a collection of parameters and information about the model
H2O_AutoML_Regression_yyyymmdd_hhmmh....
-> CSV files containing important information, among these:
- _leaderboard = the list of all tested models in the run
H2O_AutoML_Regression_yyyymmdd_hhmmh.xlsx
-> an Excel file containing important information, among these:
- model_eval = a check split up into several numeric bins to see whether the model performs well across all of them
- Bland_Altman = a Bland-Altman plot (experimental)
- all_stat = summary of statistics

---- 4 graphics for each model to give visual support when interpreting the results (needs R)
(for more details see /script/H2O.ai AutoML in KNIME for regression problems.pdf)

model_graph_H2O_AutoML_Regression_yyyymmdd_hhmmh.png
-> two lines set next to each other to represent the deviation in a linear format
model_graph_H2O_AutoML_Regression_yyyymmdd_hhmmh_hexbin.png
-> a hexbin plot giving you a compact idea of the position of prediction (submission) and truth (solution) with regard to big blocks (are the large blocks positioned where you would like them?)
model_graph_H2O_AutoML_Regression_yyyymmdd_hhmmh_parallel_plot.png
-> a parallel plot to see if there is a trend with regard to certain individual numbers
model_graph_H2O_AutoML_Regression_yyyymmdd_hhmmh_bias.png
-> a Bland-Altman plot

Node labels on the workflow canvas

- regression models with R. Results: /model/validation/
- Propagate R environment for KNIME on MacOS with Miniforge / Miniconda: configure how to handle the environment, default = just check the names
- Propagate R environment for KNIME on Windows with Miniconda: configure how to handle the environment, default = just check the names
- create initial Test and Training data: Kaggle House Prices: Advanced Regression Techniques
- train.table, test.table
- edit: v_runtime_automl: set the maximum runtime of H2O.ai AutoML in SECONDS
- ^(.*submission|solution).*$
- solution to double
- exclude paths
- h2o_list_of_models.csv: append if CSV already exists to collect all model runs
- h2o_list_of_models.csv: keep best model, First row (best model)
- Read the MOJO model, write the mojo model
- Leaderboard
- Score the test table: you might also use a third table to validate that has not been used developing the model
- R_2
- Model Quality Numeric - Graphics
- collect meta data
- flow variables: var_model_path, var_model_name_full, var_leaderboard_path, v_model_path

Nodes used in the workflow

knime_r_environment, knime_r_environment_windows, Java Edit Variable (simple), Numeric Scorer, Table Reader, Transpose, Constant Value Column, Column Rename, Column Filter, Math Formula, Integer Input, Merge Variables, RowID, CSV Writer, CSV Reader, Column Resorter, H2O MOJO Reader, H2O MOJO Writer, H2O Model to MOJO, H2O Local Context, Table to H2O, H2O AutoML Learner (Regression), H2O MOJO Predictor (Regression), String to Path (Variable), String Manipulation, Sorter, Row Filter, Table Row to Variable, Joiner
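The list of leading models is ranked by RMSE, and the test table is scored with a Numeric Scorer (see the R_2 label). As a rough guide to reading those numbers, here is a minimal Python sketch of the textbook definitions of RMSE and R². The function names and sample values are made up for illustration, and this is not claimed to be the exact computation the KNIME node performs.

```python
import math


def rmse(solution, submission):
    """Root mean squared error between truth and prediction."""
    squared = [(s - p) ** 2 for s, p in zip(solution, submission)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(solution, submission):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_solution = sum(solution) / len(solution)
    ss_res = sum((s - p) ** 2 for s, p in zip(solution, submission))
    ss_tot = sum((s - mean_solution) ** 2 for s in solution)
    return 1.0 - ss_res / ss_tot


# Made-up truth (solution) and prediction (submission) values
solution = [100.0, 200.0, 300.0, 400.0]
submission = [110.0, 190.0, 310.0, 390.0]
demo_rmse = rmse(solution, submission)
demo_r2 = r_squared(solution, submission)
```

RMSE is expressed in the units of the target, so it is only comparable between models trained on the same target; R² is unitless and easier to compare across data sets.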
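The _bias.png graphic and the Bland_Altman sheet are based on a Bland-Altman plot. The workflow produces these with R; the Python sketch below only illustrates the usual quantities behind such a plot (per-row mean, per-row difference, mean bias, and limits of agreement at bias ± 1.96 standard deviations). The function name, the sign convention (submission minus solution), and the sample values are assumptions for illustration.

```python
import statistics


def bland_altman(solution, submission):
    """Return per-row means and differences plus bias and limits of agreement."""
    means = [(s + p) / 2 for s, p in zip(solution, submission)]
    diffs = [p - s for s, p in zip(solution, submission)]
    bias = statistics.mean(diffs)
    # Sample standard deviation of the differences, scaled to ~95% limits
    spread = 1.96 * statistics.stdev(diffs)
    return {
        "means": means,   # x-axis of the plot
        "diffs": diffs,   # y-axis of the plot
        "bias": bias,
        "lower": bias - spread,
        "upper": bias + spread,
    }


# Made-up truth (solution) and prediction (submission) values
ba = bland_altman([100.0, 200.0, 300.0, 400.0], [110.0, 190.0, 310.0, 390.0])
```

A bias far from zero suggests the model systematically over- or under-predicts; differences that widen as the mean grows suggest the error depends on the size of the target.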
