
s_420_spark_h2o_apply

Apply H2O.ai model to Big Data with KNIME and Spark

s_420 - this is what your daily production workflow could look like.
You have the stored lists and rules for preparing the data, you have the H2O.ai model in MOJO format, and you keep a customer number (or whatever ID you need) alongside your prediction score so you can deliver the data.
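The workflow reads the list of candidate models (h2o_list_of_models.csv, produced by the companion training workflow), sorts it by AUC descending, and keeps only the winning model. A minimal Python sketch of that selection step; the file excerpt and column names here are made up for illustration:

```python
import csv
import io

# Hypothetical excerpt of h2o_list_of_models.csv: one row per trained
# model with its cross-validated AUC (the real file comes from training).
csv_text = """model_id,auc
GBM_1,0.912
DRF_1,0.887
GLM_1,0.901
"""

models = list(csv.DictReader(io.StringIO(csv_text)))

# "AUC DESC, keep best model": take the row with the highest AUC
best = max(models, key=lambda row: float(row["auc"]))
print(best["model_id"])  # GBM_1
```

In the KNIME workflow the same step is done with a Sorter (AUC descending) followed by a Row Filter that keeps the first row.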

Workflow annotations (from the diagram):

- remember to keep the customer number (or other ID)
- accesses the local big data folder ../data/local_big_data
- h2o_list_of_models.csv - keep the best model from the winning run (sorted by AUC DESC, "normalized" columns excluded)
- Read the MOJO model; Read Variable importance
- apply label encoding and manipulations; keep the customer_number
- spark_label_encode_sql.table - SQL string for label encoding and numeric manipulation; encode the labels with SQL
- nvl_numeric_sql.table - SQL to CAST the numeric variables
- ^(.*score|customer_number).*$ - column filter to keep the score and ID columns for apply_model
- census_income_test.parquet - customer_number simulates the existence of a customer ID that would be needed to export the relevant data lines
- predict -> default.scored_data; you could configure the delivery of the results into your big data system. The data stays on the cluster for further usage, or you download it and distribute it via CSV or similar.
- local Spark context connect

Nodes used: CSV Reader, Spark SQL Query, Row Filter, Table Row to Variable, H2O MOJO Reader, Table Reader, Spark Column Rename, Spark Column Filter, Hive to Spark, DB SQL Executor, DB Table Creator, DB Loader, Parquet Reader, Java Snippet (simple), Spark H2O MOJO Predictor (Classification), Spark to Hive, DB Reader, Merge Variables, Sorter, Rule-based Row Filter, Variable Importance
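The label encoding and numeric casting are applied on the cluster via generated SQL strings (spark_label_encode_sql.table and nvl_numeric_sql.table in the workflow). A minimal sketch of the idea, using sqlite3 in place of Spark SQL; the table, labels, and encoding values are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census (customer_number INTEGER, workclass TEXT, age TEXT)")
con.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [(1, "Private", "39"), (2, "State-gov", "50"), (3, "Private", "38")],
)

# Label-encode 'workclass' with a CASE expression and CAST 'age' to a
# number, keeping customer_number so scored rows can be matched back
# to customers when the results are delivered.
sql = """
SELECT customer_number,
       CASE workclass
            WHEN 'Private'   THEN 0
            WHEN 'State-gov' THEN 1
            ELSE -1
       END AS workclass,
       CAST(age AS REAL) AS age
FROM census
"""
rows = con.execute(sql).fetchall()
print(rows)  # [(1, 0, 39.0), (2, 1, 50.0), (3, 0, 38.0)]
```

The point of keeping the encodings as stored SQL strings is that the apply workflow uses exactly the same mappings as training, without refitting anything.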

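Before delivery, a Spark Column Filter keeps only the prediction score and the ID using the regex ^(.*score|customer_number).*$. A small Python check of what that pattern keeps; the column names below are hypothetical:

```python
import re

# Regex from the workflow's Spark Column Filter: keep score columns
# and the customer_number ID so results can be delivered downstream.
pattern = re.compile(r"^(.*score|customer_number).*$")

columns = ["customer_number", "prediction_score", "age", "workclass", "score_p1"]
kept = [c for c in columns if pattern.match(c)]
print(kept)  # ['customer_number', 'prediction_score', 'score_p1']
```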