
s_​405_​spark_​prepare_​data

Use the stored rules and lists to actually prepare the data.

Apply the label encoding and the other transformations stored as SQL code, plus the final column selection stored as a RegEx string. Get the results back and export them to .parquet files so you can use them in a powerful R or Python environment. Of course you could also do the model building in Spark itself, with a genuine Spark ML model or with H2O.ai Sparkling Water. All that matters is that the result is a MOJO file that KNIME would be able to read and apply.
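The final column selection via a stored RegEx include string can be sketched in plain Python as well. This is a minimal illustration of the idea behind KNIME's Column Filter with a RegEx pattern; the column names and the pattern itself are made up for the example (in the workflow the pattern is stored in d_regex_spark_include_500.table):

```python
import re

# Hypothetical include pattern, as it might be stored as a single
# string (regex_include_string) in d_regex_spark_include_500.table
regex_include_string = r"^(age|education_num|hours_per_week|Target)$"

all_columns = ["age", "education_num", "hours_per_week",
               "native_country", "Target"]

# Keep only the columns whose full name matches the include RegEx;
# note that the Target column is part of the pattern, so it survives
pattern = re.compile(regex_include_string)
kept = [c for c in all_columns if pattern.match(c)]
print(kept)  # ['age', 'education_num', 'hours_per_week', 'Target']
```

Storing the selection as one RegEx string (rather than a fixed column list) means the same rule can be re-applied later to the test partition, even if the column order differs.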

Additional notes recovered from the workflow canvas:

- Combine Big Data, Spark and H2O.ai Sparkling Water: https://hub.knime.com/mlauber71/space/kn_example_h2o_sparkling_water
- You could also develop a model directly with H2O and Sparkling Water on your Big Data cluster, but the free version of H2O.ai AutoML is not there yet.
- Input data (local big data folder ../data/local_big_data): census_income_train.parquet, census_income_test.parquet
- Stored transformation artifacts: spark_label_encode_sql.table (SQL to encode the labels), nvl_numeric_sql.table (SQL to CAST the numeric variables), spark_normalizer.pmml (Spark normalizer model), d_regex_spark_include_500.table (RegEx of the remaining columns, regex_include_string; the Target column is kept)
- Output files: data_70_file.parquet, data_30_file.parquet, data_normalized_70_file.parquet, data_normalized_30_file.parquet
- Housekeeping: DROP TABLE IF EXISTS default.train; DROP TABLE IF EXISTS default.test;
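Applying label encoding that is stored as SQL code can be illustrated without a Spark cluster. The sketch below uses SQLite as a stand-in engine (in the workflow the SQL from spark_label_encode_sql.table runs via Spark SQL Query / Hive); the table, columns, and encoding values are invented for the example:

```python
import sqlite3

# Hypothetical stored label-encoding SQL, as it might look inside
# spark_label_encode_sql.table (simplified to one column)
label_encode_sql = """
SELECT age,
       CASE workclass
            WHEN 'Private'  THEN 0
            WHEN 'Self-emp' THEN 1
            ELSE 2
       END AS workclass_encoded
FROM census
"""

# Build a tiny in-memory table standing in for the census data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE census (age INTEGER, workclass TEXT)")
con.executemany("INSERT INTO census VALUES (?, ?)",
                [(39, "Private"), (50, "Self-emp"), (38, "State-gov")])

# Execute the stored SQL to get the encoded result back
rows = con.execute(label_encode_sql).fetchall()
print(rows)  # [(39, 0), (50, 1), (38, 2)]
```

The point of keeping the encoding as SQL text is that the identical statement can be re-executed on the test partition later, so train and test always receive the same mapping.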
