
02_Spark_Preprocessing

02 Spark Preprocessing Exercise
Missing Values Strategy: 02_Spark_Preprocessing

This workflow implements some data manipulation operations in Spark.

The workflow:
1. connects to Hive to read the ss13pme and ss13hme data sets and transfers the data to Spark

Now to do:
1. Filter, join and aggregate data through Spark data manipulation nodes

Make sure you have executed the /2_Hadoop/2_Exercises/00_Setup_Hive_Table workflow during your current KNIME session before running this workflow.

Spark Data Manipulation
- Column Filter on ss13pme to remove PWGTP* & PUMA* columns
- Joiner to join ss13pme and ss13hme on serial no
- ss13pme:
  - Sorter on AGEP descending
  - SQL with "AS t LIMIT 10"
  - import into KNIME

Spark Data Manipulation: On ss13pme do the following:
- Column Filter to remove PWGTP* & PUMA* columns
- Row Filter COW is NOT NULL
- Row Filter COW is NULL & remove COW
- GroupBy to calculate average AGEP for SEX groups

Nodes already in the workflow:
- Create Local Big Data Environment: Connect to Local Big Data Environment
- DB Table Selector: select * from ss13hme table
- DB Table Selector: select * from ss13pme table
- Hive to Spark: convert to Spark DataFrame
- Hive to Spark: convert to Spark DataFrame
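The first exercise (Column Filter, Joiner, Sorter, "AS t LIMIT 10") can be sketched outside KNIME to make the intended logic concrete. The following is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of Spark or Hive, the rows are made up, the join key is assumed to be named SERIALNO, and NP is a hypothetical household column. The helper names drop_columns, load and run_exercise are invented for this sketch and are not KNIME node settings.

```python
import sqlite3
from fnmatch import fnmatchcase

# Toy stand-ins for the two Hive tables. All values are invented.
PERSON_COLUMNS = ["SERIALNO", "AGEP", "SEX", "PUMA00", "PUMA10", "PWGTP", "PWGTP1"]
PERSON_ROWS = [
    (1, 34, 1, 100, 200, 12, 10),
    (1, 31, 2, 100, 200, 15, 14),
    (2, 67, 2, 300, 400, 20, 22),
    (3, 8, 1, 300, 400, 9, 11),
]
HOUSEHOLD_COLUMNS = ["SERIALNO", "NP"]  # NP is a hypothetical household column
HOUSEHOLD_ROWS = [(1, 2), (2, 1), (3, 1)]


def drop_columns(columns, rows, patterns):
    """Column Filter step: drop every column whose name matches a wildcard."""
    keep = [i for i, name in enumerate(columns)
            if not any(fnmatchcase(name, p) for p in patterns)]
    return [columns[i] for i in keep], [tuple(row[i] for i in keep) for row in rows]


def load(conn, table, columns, rows):
    """Create an untyped table and insert the given rows."""
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def run_exercise(limit=10):
    # Column Filter on ss13pme: remove PWGTP* and PUMA* columns.
    columns, rows = drop_columns(PERSON_COLUMNS, PERSON_ROWS, ["PWGTP*", "PUMA*"])
    conn = sqlite3.connect(":memory:")
    load(conn, "ss13pme", columns, rows)
    load(conn, "ss13hme", HOUSEHOLD_COLUMNS, HOUSEHOLD_ROWS)

    # Joiner: inner join of persons and households on the serial number.
    joined = conn.execute(
        "SELECT p.*, h.NP FROM ss13pme AS p "
        "JOIN ss13hme AS h ON p.SERIALNO = h.SERIALNO "
        "ORDER BY p.SERIALNO, p.AGEP"
    ).fetchall()

    # Sorter on AGEP descending, then keep only the first `limit` rows.
    top = conn.execute(
        "SELECT * FROM ss13pme AS t ORDER BY AGEP DESC LIMIT ?", (limit,)
    ).fetchall()
    conn.close()
    return columns, joined, top


if __name__ == "__main__":
    columns, joined, top = run_exercise(limit=2)
    print(columns)
    print(joined)
    print(top)
```

In the workflow itself the same steps are carried out by Spark nodes on Spark DataFrames; the sketch only mirrors the order of operations so the expected shape of each result is easier to check.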
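The second exercise on ss13pme (Column Filter, the two Row Filters on COW, and the GroupBy on SEX) can be illustrated with a small plain-Python sketch. This is not the Spark node implementation: the records are invented, None stands in for NULL, and the function names column_filter, split_on_cow and average_age_by_sex are hypothetical. It also assumes the GroupBy runs on the column-filtered table rather than on one of the Row Filter outputs, which the exercise text does not state.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Toy person records with invented values. None plays the role of NULL.
PERSONS = [
    {"SERIALNO": 1, "AGEP": 40, "SEX": 1, "COW": 1, "PUMA00": 100, "PWGTP1": 12},
    {"SERIALNO": 1, "AGEP": 30, "SEX": 2, "COW": None, "PUMA00": 100, "PWGTP1": 15},
    {"SERIALNO": 2, "AGEP": 60, "SEX": 2, "COW": 3, "PUMA00": 300, "PWGTP1": 20},
    {"SERIALNO": 3, "AGEP": 10, "SEX": 1, "COW": None, "PUMA00": 300, "PWGTP1": 9},
    {"SERIALNO": 4, "AGEP": 25, "SEX": 1, "COW": 2, "PUMA00": 500, "PWGTP1": 11},
]


def column_filter(rows, patterns):
    """Drop every column whose name matches one of the wildcard patterns."""
    return [
        {name: value for name, value in row.items()
         if not any(fnmatchcase(name, p) for p in patterns)}
        for row in rows
    ]


def split_on_cow(rows):
    """Return (rows where COW is not NULL, rows where COW is NULL without COW)."""
    with_cow = [row for row in rows if row["COW"] is not None]
    without_cow = [
        {name: value for name, value in row.items() if name != "COW"}
        for row in rows if row["COW"] is None
    ]
    return with_cow, without_cow


def average_age_by_sex(rows):
    """GroupBy SEX and compute the mean of AGEP per group."""
    ages = defaultdict(list)
    for row in rows:
        ages[row["SEX"]].append(row["AGEP"])
    return {sex: sum(values) / len(values) for sex, values in ages.items()}


if __name__ == "__main__":
    filtered = column_filter(PERSONS, ["PWGTP*", "PUMA*"])
    with_cow, without_cow = split_on_cow(filtered)
    print(len(with_cow), len(without_cow))
    print(average_age_by_sex(filtered))
```

The COW column is dropped from the NULL branch because it carries no information there, which matches the "remove COW" instruction in the exercise.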
