
04_HDI_Hive_Spark

An end-to-end Big Data predictive approach: from Hive, through Spark ETL, to Spark model training.

This workflow reads CENSUS data from a Hive database in HDInsight, performs some ETL operations in Spark, and finally trains a Spark decision tree model to predict COW (class of worker) values from all other attributes. The data come from the CENSUS PUMS dataset, which is publicly available and can be downloaded from http://www.census.gov/programs-surveys/acs/data/pums.html. A full explanation of all attributes can be found at http://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict15.pdf.
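Conceptually, the workflow trains on the rows where COW is known and predicts it where it is missing, then recombines the two subsets. The plain-Python sketch below illustrates that pattern only; the majority-class "model" is a stand-in for the Spark Decision Tree Learner, and all rows and column names are made up for illustration.

```python
# Illustrative plain-Python sketch of the workflow's overall pattern:
# train on rows where COW is known, predict it where it is missing,
# then recombine. Not the actual KNIME/Spark implementation.
from collections import Counter

def split_by_cow(rows):
    """Mimic the two Database Row Filter nodes (COW IS NOT NULL / IS NULL)."""
    labeled = [r for r in rows if r["COW"] is not None]
    unlabeled = [r for r in rows if r["COW"] is None]
    return labeled, unlabeled

def train_majority(rows):
    """Stand-in learner: always predicts the most frequent COW value."""
    majority = Counter(r["COW"] for r in rows).most_common(1)[0][0]
    return lambda row: majority

rows = [
    {"AGEP": 40, "COW": 1},
    {"AGEP": 35, "COW": 1},
    {"AGEP": 22, "COW": 2},
    {"AGEP": 58, "COW": None},  # COW missing: to be predicted
]

labeled, unlabeled = split_by_cow(rows)
model = train_majority(labeled)
for row in unlabeled:
    row["COW"] = model(row)       # the "pred_cow -> cow" rename step
combined = labeled + unlabeled    # the Spark Concatenate step
```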

Spark Context

You can define a Spark Context in two ways:
- in the Preferences page under KNIME > KNIME Big Data Extensions > Spark (this becomes the default Spark Context);
- with the Create Spark Context node.

Credentials

Insert <hostname> and <Credentials> where required. Use of Credentials is recommended over a plain username and password. Credentials are defined at the workflow level: right-click the workflow in the KNIME Explorer panel and select "Workflow Credentials". The parameter field might need customization.

Workflow steps (from the annotations on the workflow canvas):
- Connect to Hive on HDInsight (HttpFS connection on port 14000).
- Select the ss13pme table (select * from ss13pme).
- Split the rows into the subsets COW IS NULL and COW IS NOT NULL.
- Remove the puma* and pwgtp* columns.
- Read the file ss13pme.csv with the File Reader.
- Move the data from Hive to Spark.
- Fix missing values and re-base the COW class to start from 0.
- Train a Spark decision tree on the rows with a known COW; the predictor's output column is pred_cow.
- Rename pred_cow to cow and concatenate the two subsets.
- Write the result to Parquet on Spark, write the table back into Hadoop, and bring the data back to KNIME.
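The annotation "fix missing values and start cow class from 0" can be sketched as follows. The concrete choices here are assumptions, not taken from the workflow itself: missing numeric attributes are replaced with 0, and the COW codes, which start at 1, are shifted down by one, since Spark MLlib expects zero-based class labels.

```python
# Sketch of the "fix missing values and start cow class from 0" step.
# The imputation value (0) and the per-row representation are assumptions
# made for this illustration, not the workflow's actual configuration.
def fix_row(row, label="COW"):
    """Replace missing values with 0 and re-base the class label."""
    fixed = {k: (0 if v is None else v) for k, v in row.items()}
    fixed[label] -= 1  # COW codes start at 1; shift them to start at 0
    return fixed

print(fix_row({"AGEP": None, "WAGP": 30000, "COW": 3}))
# -> {'AGEP': 0, 'WAGP': 30000, 'COW': 2}
```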

Nodes

- Database Row Filter
- Database Column Filter
- Database Table Selector
- Database Connection Table Reader
- File Reader
- Fix Missing Values
- HttpFS Connection
- Hive Connector
- Hive to Spark
- Spark Column Filter
- Spark Column Rename
- Spark Concatenate
- Spark Decision Tree Learner
- Spark Predictor
- Spark to Hive
- Spark to Parquet
- Spark to Table
