
04_Spark_Writing

04 Spark Writing Exercise Solution
Missing Values Strategy: 04_Spark_WritingToDB

This workflow implements a predictor with Spark for the COW class, based on the data rows with no missing COW values from the ss13pme data set. The workflow:

1. reads the ss13pme table from Hive into Spark,
2. filters out uninteresting columns,
3. separates rows where COW is not null from rows where COW is null,
4. where COW is not null: fixes missing values and trains a decision tree (removing columns socp10 and socp12 as features),
5. where COW is null: removes the COW column, fixes missing values, then applies the decision tree to predict COW, and
6. exports the data to a KNIME table, a Parquet file, and Hive.

Make sure you have executed the /2_Hadoop/2_Exercises/00_Setup_Hive_Table workflow during your current KNIME session before running this workflow.
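As a rough point of comparison, the same steps can be expressed directly against the Spark APIs. The sketch below is a minimal PySpark equivalent, not the exercise solution itself: the lowercase column names (cow, socp10, socp12, puma*, pwgtp*), the mean imputation strategy, the Parquet path, and the output table name are all illustrative assumptions. It also assumes a Spark session with Hive support and that the ss13pme table already exists (created by the setup workflow).

from pyspark.sql import SparkSession, functions as F
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import Imputer, VectorAssembler

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# 1. Read the ss13pme table from Hive into Spark.
df = spark.table("ss13pme")

# 2. Filter out the uninteresting columns (puma* and pwgtp*).
df = df.select([c for c in df.columns
                if not (c.lower().startswith("puma") or c.lower().startswith("pwgtp"))])

# 3. Separate rows with a COW value from rows where COW is missing.
train_df = df.filter(F.col("cow").isNotNull())
score_df = df.filter(F.col("cow").isNull()).drop("cow")

# 4. Train on the labelled rows: shift COW to start at 0 (as the "Modify cow class"
#    step does), impute missing values, and fit a decision tree without
#    socp10/socp12 as features. Feature columns are cast to double here,
#    assuming they hold numeric codes.
train_df = train_df.withColumn("cow", (F.col("cow") - 1).cast("double"))
feature_cols = [c for c in train_df.columns if c not in ("cow", "socp10", "socp12")]
train_df = train_df.select("cow", *[F.col(c).cast("double").alias(c) for c in feature_cols])
score_df = score_df.select(*[F.col(c).cast("double").alias(c) if c in feature_cols else F.col(c)
                             for c in score_df.columns])

imputed_cols = [c + "_imp" for c in feature_cols]
pipeline = Pipeline(stages=[
    Imputer(inputCols=feature_cols, outputCols=imputed_cols),  # mean imputation; KNIME node may differ
    VectorAssembler(inputCols=imputed_cols, outputCol="features"),
    DecisionTreeClassifier(labelCol="cow", featuresCol="features"),
])
model = pipeline.fit(train_df)

# 5. Apply the same imputation and the trained tree to the rows with missing COW.
predictions = model.transform(score_df)
result = predictions.select(*score_df.columns, "prediction")

# 6. Export: back to the driver (the "to KNIME" branch), to Parquet, and to Hive.
local_rows = result.collect()
result.write.mode("overwrite").parquet("/tmp/ss13pme_cow_predictions")
result.write.mode("overwrite").saveAsTable("ss13pme_cow_predictions")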

Nodes

Create Local Big Data Environment
DB Table Selector
Hive to Spark
Spark Column Filter (×2)
Spark Row Filter (×2)
Spark Missing Value
Spark Missing Value (Apply)
Modify cow class
Spark Decision Tree Learner
Spark Predictor (Classification)
Spark Concatenate
Spark to Table
Spark to Hive
Spark to Parquet
DB Reader

