s_620_spark_h2o_apply

s_620 - Apply H2O.ai model to Big Data with KNIME and Spark

s_620 - this is how your daily production workflow could look.
You have the stored lists and rules for preparing the data, you have the H2O.ai model in MOJO format, and you keep a customer_number (or whatever ID you need) along with your prediction score so you can deliver the data.

https://hub.knime.com/mlauber71/spaces/Public/latest/kn_example_bigdata_h2o_automl_spark_46

Workflow steps:

- => create a local big data context. If you encounter any problems, close KNIME and delete all data from the folder /big_data/ and start over.
- Read census_income_test.parquet and load it into Hive (DB Table Creator, DB Loader) - here goes the Spark data that you have used at the start of the s_601 workflow to encode the targets. Run REFRESH TABLE #table# => make sure the Spark environment 'knows' about the table, then bring it to Spark with Hive to Spark.
- A Column Rename to _customer_number simulates the existence of a customer number that would be needed to export the relevant data lines.
- Apply label encoding and manipulations (=> keep the customer_number). The stored rules come from the /model/ folder via Table Reader and Table Row to Variable nodes:
  - ../model/spark_label_encode_sql.table - SQL - encode the labels with SQL (variable spark_label_encoder)
  - ../model/nvl_numeric_sql.table - SQL - CAST the numeric variables (variable nvl_numeric)
  - ../model/d_regex_spark_include_500.table - RegEx of the remaining columns (variable regex_include_string), used to regex filter the columns
  A sketch of what such a generated SQL statement might look like follows after this list.
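The stored SQL steps are plain text and get injected as flow variables into Spark SQL Query nodes. A minimal sketch of what such a generated statement might look like - the column names (education, age) and the mapping are made up for illustration, the real statements were created in the s_601 workflow, and #table# is the KNIME placeholder for the incoming Spark table:

SELECT
  customer_number,
  -- label encoding: replace the string category with the numeric code stored during training (mapping is hypothetical)
  CASE education
    WHEN 'Bachelors' THEN 0
    WHEN 'Masters' THEN 1
    ELSE -1
  END AS education,
  -- NVL + CAST: fill missing values and force a numeric type (column name is hypothetical)
  CAST(NVL(age, 0) AS DOUBLE) AS age
FROM #table#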
- Select the model you want to deploy. You can automatically select the best model of your collection or choose one by hand: read h2o_list_of_models.csv, sort by AUC DESC and keep the best model in the variable var_model_name_full (first sketch below), or => manually enter the selected model in a Java Edit Variable (simple), e.g. return "StackedEnsemble_AllModels_AutoML_20220615_113230";
- Read the MOJO (model.zip under the path from model_base_location.table) with the H2O MOJO Reader and apply the model with the Spark H2O MOJO Predictor (Classification).
- Keep only the relevant columns with the regex ^(.*score|customer_number).*$ - remember to keep the customer number (or other ID) (second sketch below).
- Write the predictions back with Spark to Hive into default.scored_data. For the delivery of the results into your big data system, you could configure that: the data is on the cluster for further usage, or you download it and distribute it via CSV or something.
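The automatic selection is nothing more than a Sorter (AUC DESC), a Row Filter keeping the first row and a Table Row to Variable node. Expressed as SQL for illustration only - the leaderboard columns model_id and auc are assumptions about what h2o_list_of_models.csv contains:

-- pick the model with the highest AUC from the leaderboard
SELECT model_id
FROM h2o_list_of_models
ORDER BY auc DESC
LIMIT 1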
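The last Spark Column Filter with ^(.*score|customer_number).*$ boils down to a simple projection before delivery. A sketch, assuming a single score column simply named score - the predictor will actually name its outputs after your target:

-- keep only the ID and the prediction score for delivery
SELECT customer_number, score
FROM default.scored_data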

Nodes

Spark SQL Query, CSV Reader, Sorter, Row Filter, Column Filter, Table Row to Variable, Spark Column Rename, Spark Column Filter, Hive to Spark, Column Rename, DB SQL Executor, Java Snippet (simple), Spark H2O MOJO Predictor (Classification), Spark to Hive, DB Reader, Table Reader, Merge Variables, DB Table Creator, DB Loader, Parquet Reader, Create File/Folder Variables, H2O MOJO Reader, Java Edit Variable (simple)