
03_HDI_Hive_KNIME

Mix and Match Predictive Approach. From Hive, through In-Database Processing, to KNIME Analytics Model Training.

This workflow reads CENSUS data from a Hive database in HDInsight, performs some in-database processing on Hive, and finally trains a KNIME decision tree model to predict COW (class of worker) values based on all other attributes. Data for this example come from the new CENSUS dataset, which is publicly available and can be downloaded from: http://www.census.gov/programs-surveys/acs/data/pums.html A full explanation of all attributes can be found in: http://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMSDataDict15.pdf

Goal: Predicting COW values.

The dataset used in this example is available here.

Workflow annotations

Hive Connector: Insert <hostname> and <Credentials> here. Use of Credentials is recommended over a simple username and password. Credentials are defined at the workflow level: right-click the workflow in the KNIME Explorer panel and select "Workflow Credentials". The Parameter field might need customization.

Workflow nodes and their labels

Database Row Filter: COW is NULL
Database Row Filter: COW is not NULL
Database Connection Table Reader: import all rows where COW is NOT NULL
Database Connection Table Reader: import all rows where COW is NULL
Decision Tree Predictor: append predicted COW column
Decision Tree Learner: predict COW
Database Table Selector: select * from ss13pme
Database Column Filter: remove PUMA* & PWGTP*
Number To String (Node 181)
Database Column Filter: remove COW
File Reader: file ss13pme.csv
Hive Connector: connect to Hive on HDInsight
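The in-database part of the workflow can be pictured as a few SQL statements built up by the database nodes. The sketch below is only an illustration under stated assumptions, not the workflow itself: Python's built-in sqlite3 stands in for Hive on HDInsight, the five-row table is made-up demo data with a small subset of PUMS-style columns, and the helper names (build_demo_db, kept_columns, split_by_cow) are hypothetical. It also assumes that the Number To String step turns COW into a nominal class label before training.

```python
import sqlite3

# Made-up demo rows; the real ss13pme table has many more columns and rows.
# Column order: AGEP, SCHL, PUMA00, PUMA10, PWGTP1, COW
ROWS = [
    (34, 21, 100, 200, 15, 1),
    (51, 16, 100, 200, 22, 3),
    (19, 12, 300, 400, 9, None),
    (45, 19, 300, 400, 30, 1),
    (8, 2, 100, 200, 12, None),
]


def build_demo_db():
    """Create an in-memory stand-in for the Hive table ss13pme (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ss13pme (AGEP INTEGER, SCHL INTEGER, PUMA00 INTEGER, "
        "PUMA10 INTEGER, PWGTP1 INTEGER, COW INTEGER)"
    )
    conn.executemany("INSERT INTO ss13pme VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def kept_columns(conn, table="ss13pme"):
    """Column filter: drop every column whose name starts with PUMA or PWGTP."""
    names = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return [n for n in names if not n.startswith(("PUMA", "PWGTP"))]


def split_by_cow(conn):
    """Row filters: labeled rows (COW IS NOT NULL) and rows to be predicted."""
    cols = kept_columns(conn)
    features = [c for c in cols if c != "COW"]

    labeled = conn.execute(
        f"SELECT {', '.join(cols)} FROM ss13pme WHERE COW IS NOT NULL"
    ).fetchall()
    # The second branch also removes COW, since it is NULL for these rows.
    unlabeled = conn.execute(
        f"SELECT {', '.join(features)} FROM ss13pme WHERE COW IS NULL"
    ).fetchall()

    # Number-to-string step (assumed): treat COW as a nominal class label.
    cow_idx = cols.index("COW")
    labeled = [
        tuple(str(v) if i == cow_idx else v for i, v in enumerate(row))
        for row in labeled
    ]
    return cols, labeled, features, unlabeled


if __name__ == "__main__":
    conn = build_demo_db()
    cols, labeled, features, unlabeled = split_by_cow(conn)
    print("training columns:", cols)
    print("rows with a known COW:", len(labeled))
    print("rows to predict:", len(unlabeled))
```

In the workflow, the labeled rows would then go to the Decision Tree Learner and the unlabeled rows to the Decision Tree Predictor. The point of doing the filtering in SQL is that only the reduced tables leave the database.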
