03_Spark_Modelling

03 Spark Modelling Exercise Solution
Missing Values Strategy: 03_Spark_Modelling

This workflow uses Spark to implement a predictor for the COW class, based on the rows of the ss13pme data set that have no missing COW values.

The workflow
1. reads the ss13pme table from Hive into Spark,
2. filters out uninteresting columns (puma* and pwgtp*),
3. separates rows where COW is not null from rows where COW is null,
4. where COW is not null: fixes missing values and trains a decision tree with the socp10 and socp12 columns ignored, and
5. where COW is null: removes the COW column, fixes missing values, then applies the decision tree to predict COW.

Make sure you have executed the /2_Hadoop/2_Exercises/00_Setup_Hive_Table workflow during your current KNIME session before running this workflow.

Nodes used: Create Local Big Data Environment, DB Table Selector, Hive to Spark, Spark Column Filter, Spark Row Filter, Modify cow class (start COW class from 0), Spark Missing Value, Spark Missing Value (Apply), Spark Decision Tree Learner, Spark Predictor (Classification).
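The split-and-impute logic of steps 3 to 5 can be sketched in plain Python, without Spark, to show how the two branches relate. This is only an illustration under stated assumptions: the tiny table, the feature names agep and wkhp, and the helper functions are hypothetical, and mean imputation is an assumed strategy (the workflow description does not say which one is configured). The sketch assumes that the fill values learned on the rows with a known COW are reused on the rows without one, which is what the pairing of Spark Missing Value with Spark Missing Value (Apply) suggests. The decision tree itself is left out.

```python
from statistics import mean

# Hypothetical miniature of the ss13pme table; the real table has many more columns.
rows = [
    {"agep": 30.0, "wkhp": 40.0, "cow": 1},
    {"agep": None, "wkhp": 20.0, "cow": 2},
    {"agep": 50.0, "wkhp": None, "cow": 1},
    {"agep": None, "wkhp": 35.0, "cow": None},
    {"agep": 44.0, "wkhp": None, "cow": None},
]


def split_on_target(rows, target):
    """Step 3: separate rows with a known target from rows where it is missing."""
    labelled = [r for r in rows if r[target] is not None]
    unlabelled = [r for r in rows if r[target] is None]
    return labelled, unlabelled


def fit_imputer(rows, columns):
    """Learn one fill value per column (assumed strategy: mean of observed values)."""
    return {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}


def apply_imputer(rows, fill_values):
    """Replace missing values using previously learned fill values."""
    return [
        {k: (fill_values[k] if v is None and k in fill_values else v)
         for k, v in r.items()}
        for r in rows
    ]


features = ["agep", "wkhp"]  # hypothetical feature columns
labelled, unlabelled = split_on_target(rows, "cow")

# Step 4: fix missing values on the labelled branch (training data).
fill_values = fit_imputer(labelled, features)
train = apply_imputer(labelled, fill_values)

# Step 5: drop the target column, then reuse the same fill values for scoring.
score = [{k: v for k, v in r.items() if k != "cow"} for r in unlabelled]
score = apply_imputer(score, fill_values)
```

Learning the fill values only on the labelled branch keeps the training and scoring data consistent: both see the same replacement for a missing feature, so the tree is applied to inputs prepared the same way as those it was trained on.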
