04_Spark_Writing

04 Spark Writing Exercise
Missing Values Strategy: 04_Spark_WritingToDB

This workflow implements a predictor with Spark for the COW class, based on the data rows with no missing COW values from the ss13pme data set.

The workflow
1. reads the ss13pme table from Hive into Spark,
2. filters out uninteresting columns,
3. separates rows where COW is not null from rows where COW is null,
4. where COW is not null: fixes missing values and trains a decision tree, and
5. where COW is null: removes the COW column, fixes missing values, then applies the decision tree to predict COW.

Now to do: export the data to a KNIME table, a Parquet file, and Hive.

Make sure you have executed the /2_Hadoop/2_Exercises/00_Setup_Hive_Table workflow during your current KNIME session before running this workflow.

Node annotations: Connect to Local Big Data Environment; fix missing values; Start cow class from 0; COW is NOT NULL; COW is NULL; rm cow; rm puma* & pwgtp*; select * from ss13pme table; remove socp10 & socp12 as features

Workflow nodes: Spark Concatenate, Create Local Big Data Environment, Spark Missing Value, Modify cow class, Spark Row Filter, Spark Row Filter, Spark Column Filter, Spark Missing Value (Apply), Spark Column Filter, DB Table Selector, Hive to Spark, Spark Decision Tree Learner, Spark Predictor (Classification)
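For readers who think in code rather than nodes, the five steps and the export to-do can be sketched roughly in PySpark. This is only an illustration under assumptions, not what the KNIME nodes execute internally: it assumes the remaining feature columns are numeric, that "fix missing values" means filling with a constant 0 (the actual strategy configured in the Spark Missing Value node is not stated), that "Start cow class from 0" means subtracting 1 from COW, and that socp10 and socp12 are left out of the feature set. The function names, `parquet_path`, and `hive_table` are hypothetical.

```python
from fnmatch import fnmatchcase

# Wildcards taken from the node annotation "rm puma* & pwgtp*".
DROP_PATTERNS = ("puma*", "pwgtp*")


def columns_to_drop(columns, patterns=DROP_PATTERNS):
    """Return the columns whose lower-cased name matches any wildcard."""
    return [c for c in columns
            if any(fnmatchcase(c.lower(), p) for p in patterns)]


def predict_missing_cow(spark, table="ss13pme"):
    """Train on rows with COW, then predict COW for rows without it."""
    # Imported lazily so the helper above works without Spark installed.
    from pyspark.ml.classification import DecisionTreeClassifier
    from pyspark.ml.feature import VectorAssembler
    from pyspark.sql import functions as F

    # Steps 1 and 2: read the Hive table, drop uninteresting columns.
    df = spark.table(table)
    df = df.drop(*columns_to_drop(df.columns))

    # Step 3: split on whether COW is present.
    labeled = df.filter(F.col("cow").isNotNull())
    unlabeled = df.filter(F.col("cow").isNull()).drop("cow")

    # Assumption: socp10 and socp12 are excluded from the features.
    features = [c for c in unlabeled.columns
                if c.lower() not in ("socp10", "socp12")]
    assembler = VectorAssembler(inputCols=features, outputCol="features")

    # Step 4: fill missing values (assumed: constant 0), shift the class
    # label so it starts at 0, and train a decision tree.
    labeled = (labeled.na.fill(0)
               .withColumn("cow", F.col("cow").cast("double") - 1))
    tree = DecisionTreeClassifier(labelCol="cow", featuresCol="features")
    model = tree.fit(assembler.transform(labeled))

    # Step 5: apply the same fill to the unlabeled rows and predict COW.
    return model.transform(assembler.transform(unlabeled.na.fill(0)))


def export(predicted, parquet_path, hive_table):
    """Write predictions to a Parquet file and to a Hive table."""
    # Drop the ML vector columns so only plain columns are written.
    flat = predicted.drop("features", "rawPrediction", "probability")
    flat.write.mode("overwrite").parquet(parquet_path)
    flat.write.mode("overwrite").saveAsTable(hive_table)
```

Calling `predict_missing_cow(spark)` requires a `SparkSession` with Hive support and the ss13pme table already set up. In the KNIME workflow itself, the export is done with the corresponding Spark writer nodes rather than code.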
