
s_605_spark_prepare_data

s_605 - use the stored rules and lists to actually prepare the data

s_605 - apply the label encoding and the other transformations stored as SQL code, and select the final columns via the stored RegEx string
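To make the step concrete, here is a minimal PySpark sketch of the same idea outside KNIME: apply a stored SQL transformation string to the data, then keep only the columns matched by a stored RegEx. The plain-text file paths and the exact SQL content are assumptions for illustration; the workflow itself keeps these artefacts in KNIME .table files.

```python
# Minimal PySpark sketch (assumption: the stored SQL and RegEx strings have
# been exported to plain text files; the workflow keeps them in .table files).
import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s_605_prepare_data").getOrCreate()

# Register the raw training data so the stored SQL can reference it as 'train'
train = spark.read.parquet("../data/census_income_train.parquet")
train.createOrReplaceTempView("train")

# Apply the stored label-encoding / CAST SQL in one pass
with open("../model/spark_label_encode_sql.txt") as f:  # hypothetical export
    encode_sql = f.read()
encoded = spark.sql(encode_sql)

# Keep only the columns matched by the stored RegEx include string
with open("../model/regex_include_string.txt") as f:    # hypothetical export
    include = re.compile(f.read().strip())
prepared = encoded.select(*[c for c in encoded.columns if include.match(c)])
```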

Get the results back and export them to .parquet files so you can use them in a powerful R or Python environment (or leave them on the big data system). Of course you could also do the model building in Spark with a genuine Spark ML model or H2O.ai Sparkling Water. All that matters is that the result is a MOJO file that KNIME can read and apply via Sparkling Water.
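A sketch of the export step, assuming `prepared_train` and `prepared_test` are the prepared Spark DataFrames from the step above: write them to .parquet so pandas (with pyarrow) or R's arrow package can pick them up locally.

```python
# Write the prepared data to .parquet for downstream R/Python work
# (or skip this step and keep the tables on the big data system).
prepared_train.write.mode("overwrite").parquet("../data/data_70_file.parquet")
prepared_test.write.mode("overwrite").parquet("../data/data_30_file.parquet")

# Read a result back locally, e.g. with pandas (pyarrow installed)
import pandas as pd
local_train = pd.read_parquet("../data/data_70_file.parquet")
```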

Recovered from the workflow canvas (annotations, in workflow order):

- => create a local big data context. If you encounter any problems, close KNIME, delete all data from the folder /big_data/ and start over.
- Of course you could also develop a model directly with H2O and Sparkling Water on your Big Data cluster, but the free version of H2O.ai AutoML is not there yet. See "Combine Big Data, Spark and H2O.ai Sparkling Water": https://hub.knime.com/mlauber71/space/kn_example_h2o_sparkling_water
- keep Target: make sure the Target column is kept through the column filters.
- Column Rename: "-" to "_" (applied to both train and test).
- SQL - CAST the numeric variables (../model/nvl_numeric_sql.table, nvl_numeric).
- SQL - encode the labels with SQL (../model/spark_label_encode_sql.table, spark_label_encoder).
- RegEx of remaining columns (../model/d_regex_spark_include_500.table, regex_include_string): regex filter columns with the Spark Column Filter.
- normalize: apply the stored normalizer (../model/spark_normalizer.pmml) to train and test with the Spark Transformations Applier.
- customer_numbers: simulates the existence of a customer number that would be needed to export the relevant data lines.
- Housekeeping: once you are finished with Spark, drop the temporary Hive tables (DROP TABLE IF EXISTS default.train; DROP TABLE IF EXISTS default.test;) and destroy the Spark context.

Data files:

- ../data/census_income_train.parquet — the training data
- ../data/census_income_train.parquet — the test data (path as annotated)
- ../data/data_70_file.parquet, ../data/data_30_file.parquet, ../data/data_normalized_30_file.parquet — the exported results

Nodes used: local big data context create, Spark SQL Query, Spark to Table, Column Filter, Reference Column Filter, Destroy Spark Context, DB SQL Executor, Hive to Spark, Column Rename, Spark Transformations Applier, Merge Variables, Spark Column Filter, Table Reader, Table Row to Variable, Parquet Reader, DB Table Creator, DB Loader, DB Table Selector, PMML Reader, Java Snippet (simple), Parquet Writer.
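The housekeeping annotations translate to two DROP statements and a context shutdown. A sketch, assuming the same `spark` session as in the sketches above (in the workflow this is done by the DB SQL Executor and Destroy Spark Context nodes):

```python
# Housekeeping: remove the temporary Hive tables once Spark is finished,
# then tear down the context (Destroy Spark Context in the workflow).
spark.sql("DROP TABLE IF EXISTS default.train")
spark.sql("DROP TABLE IF EXISTS default.test")
spark.stop()
```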
