
s_430_spark_h2o_apply_normalized

Apply H2O.ai model to Big Data with KNIME and Spark

s_430 - this is how your daily production workflow could look.
You have the stored lists and rules for preparing the data, you have the H2O.ai model in MOJO format, and you keep a customer number (or whatever ID you need) along with your prediction score so you can deliver the data.
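Outside of KNIME, the same pattern can be sketched with pysparkling (H2O Sparkling Water): pick the winning MOJO from the stored model list, score the data, and carry the customer number along with the score. The file names below come from this workflow; the CSV column names ("model_id", "AUC") and the MOJO path are assumptions for illustration.

```python
# Minimal sketch, assuming H2O Sparkling Water (pysparkling) and pandas are
# available. File names follow the workflow; the CSV columns "model_id" and
# "AUC" are assumptions.
import pandas as pd
from pyspark.sql import SparkSession
from pysparkling.ml import H2OMOJOModel

spark = SparkSession.builder.getOrCreate()

# keep the best model: sort the stored model list by AUC descending
models = pd.read_csv("h2o_list_of_models.csv")
best = models.sort_values("AUC", ascending=False).iloc[0]

# read the winning MOJO and score the incoming data
mojo = H2OMOJOModel.createFromMojo("../data/" + best["model_id"] + ".zip")
data = spark.read.parquet("census_income_test.parquet")
scored = mojo.transform(data)

# keep the customer_number next to the score so the result can be delivered
# (the workflow filters columns with ^(.*score|customer_number).*$); the
# prediction column name varies by Sparkling Water version
result = scored.select("customer_number", "prediction")
```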

Here you apply the additional normalization to your values.
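In the workflow this is done by applying the stored _spark_normalizer.pmml with the Spark Transformations Applier node. As a rough illustration of the idea (reuse normalization parameters that were fixed on the training data, rather than recomputing them on production data), a hand-rolled min-max version in PySpark might look like this; the column names and bounds are made up for the example:

```python
# Illustrative sketch only: the workflow itself applies _spark_normalizer.pmml
# via the Spark Transformations Applier node. The point is that the min/max
# bounds come from the training data, not from the production data.
# Column names and bounds below are assumptions.
from pyspark.sql import DataFrame, functions as F

TRAIN_BOUNDS = {"age": (17.0, 90.0), "hours_per_week": (1.0, 99.0)}

def apply_minmax(df: DataFrame, bounds: dict) -> DataFrame:
    # scale each column as (x - min) / (max - min) with the stored bounds,
    # so production rows are normalized exactly like the training rows
    for col, (lo, hi) in bounds.items():
        df = df.withColumn(col, (F.col(col) - F.lit(lo)) / F.lit(hi - lo))
    return df

# usage: normalized = apply_minmax(prepared_df, TRAIN_BOUNDS)
```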

Workflow annotations:

- remember to keep the customer number (or other ID)
- delivery of the results into your big data system: you could configure that. The data is on the cluster for further usage, or you download it and distribute it via CSV or something.
- => accesses local big data folder ../data/local_big_data
- h2o_list_of_models.csv - keep the best model (sorted by AUC DESC; select the normalized variant from the winning model)
- Read the MOJO model / Read Variable importance
- apply_model_normalized => keep the customer_number (column filter ^(.*score|customer_number).*$)
- nvl_numeric_sql.table - SQL: CAST the numeric variables
- spark_label_encode_sql.table - SQL string for label encoding and numeric manipulation; encode the labels with SQL
- _spark_normalizer.pmml - normalize
- census_income_test.parquet - customer_number simulates the existence of a customer number that would be needed to export the relevant data lines
- predict into default.scored_data_normalized
- local spark context connect
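The two stored rule tables (nvl_numeric_sql.table and spark_label_encode_sql.table) assemble SQL strings that cast the numeric variables and label-encode the categoricals. Executed through Spark SQL, the generated statement might look roughly like this; the concrete column names and label codes are assumptions based on the census income data:

```python
# Sketch of the generated data-prep SQL, assuming census-style column names.
# In the workflow these statements are built from the stored rule tables
# nvl_numeric_sql.table and spark_label_encode_sql.table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.read.parquet("census_income_test.parquet").createOrReplaceTempView("raw_data")

prepared = spark.sql("""
    SELECT
        customer_number,
        CAST(COALESCE(age, 0) AS DOUBLE) AS age,               -- NVL + CAST
        CAST(COALESCE(hours_per_week, 0) AS DOUBLE) AS hours_per_week,
        CASE workclass                                         -- label encoding
            WHEN 'Private'      THEN 0
            WHEN 'Self-emp-inc' THEN 1
            ELSE 2
        END AS workclass_encoded
    FROM raw_data
""")
```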

Nodes

- CSV Reader
- Spark SQL Query
- Row Filter
- Table Row to Variable
- H2O MOJO Reader
- Table Reader
- Spark Column Rename
- Spark Column Filter
- Hive to Spark
- Column Rename
- DB SQL Executor
- PMML Reader
- DB Table Creator
- DB Loader
- Parquet Reader
- Java Snippet (simple)
- Spark H2O MOJO Predictor (Classification)
- Spark to Hive
- DB Reader
- Merge Variables
- Sorter
- Rule-based Row Filter
- Variable Importance
- Spark Transformations Applier

Extensions

Links