
07_SparkSQL_meets_HiveQL

Spark SQL meets Hive SQL

This workflow builds a line plot of the age distribution for men and women in Maine (US) over the last 5 years.

In particular, women's data is processed via Hive SQL, and men's data via Spark SQL.

The whole data set is initially read from a Hadoop Hive installation. Will they blend?

... and yes, Spark SQL and Hive SQL do blend!

Workflow annotations:

This workflow builds a line plot of the age distribution for men and women in Maine (US) over the last 5 years using both Spark SQL and KNIME DB nodes.

Hive inDB Data Manipulation
- On Female Records
- Remove PWGTP* & PUMA* columns
- Count number of records by AGEP

Spark inDB Data Manipulation
- On Male Records
- Remove PWGTP* & PUMA* columns
- Count number of records by AGEP

Node annotations:
- SELECT * FROM #table# WHERE SEX = 1 (male)
- ... and into KNIME
- COUNT(*) FROM #table# BY AGEP
- filling age holes
- line plot
- rm PUMA* & PWGTP*
- convert a Hive query into a Spark RDD
- select * from ss13pme table
- only Female records
- rm PUMA* & PWGTP*
- count records BY AGEP
- ... and into KNIME
- blend data

Nodes in the workflow:
- Spark SQL Query
- Spark to Table
- Spark SQL Query
- Fix Missing Values
- Visualization
- Spark SQL Query
- Hive to Spark
- DB Table Selector
- Read Data Into Local Spark Env
- DB Row Filter
- DB Column Filter
- DB GroupBy
- DB Reader
- Joiner
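The logic of the two branches and the final blend can be sketched outside KNIME. The following is only an illustration, not the workflow itself: it uses Python's built-in sqlite3 as a stand-in for both Hive and Spark, a handful of made-up rows in place of the real ss13pme table, and it assumes that SEX = 2 marks female records and that age holes are filled with 0. The column PWGTP1 and the helper names count_by_age and blend are hypothetical.

```python
import sqlite3

# Hypothetical toy rows standing in for the ss13pme table:
# (SEX, AGEP, PUMA, PWGTP1). SEX = 1 is male, as in the workflow;
# SEX = 2 for female is an assumption of this sketch.
ROWS = [
    (1, 30, 100, 5), (1, 30, 200, 7), (1, 32, 100, 3),
    (2, 30, 100, 4), (2, 31, 200, 6), (2, 31, 100, 2),
]


def count_by_age(conn, sex):
    """Filter one sex, keep only AGEP (dropping PUMA*/PWGTP*), count per age."""
    cur = conn.execute(
        "SELECT AGEP, COUNT(*) FROM ss13pme "
        "WHERE SEX = ? GROUP BY AGEP ORDER BY AGEP",
        (sex,),
    )
    return dict(cur.fetchall())


def blend(male, female):
    """Join both count tables on age; ages missing on one side get 0 (assumed)."""
    all_ages = male.keys() | female.keys()
    return [
        (age, male.get(age, 0), female.get(age, 0))
        for age in range(min(all_ages), max(all_ages) + 1)
    ]


# sqlite3 in-memory database used as a stand-in for the Hive installation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ss13pme (SEX INTEGER, AGEP INTEGER, PUMA INTEGER, PWGTP1 INTEGER)"
)
conn.executemany("INSERT INTO ss13pme VALUES (?, ?, ?, ?)", ROWS)

male_counts = count_by_age(conn, 1)    # the "Spark SQL" branch in the workflow
female_counts = count_by_age(conn, 2)  # the "Hive SQL" branch in the workflow
blended = blend(male_counts, female_counts)

for age, men, women in blended:
    print(age, men, women)
```

Each tuple in blended holds an age and the male and female record counts for it, which is the shape a line plot with one line per sex would need.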
