
17108_Big_Data_Irish_Meter_on_Spark_only

Local_Big_Data_Irish_Meter
This workflow uses a portion of the Irish Energy Meter dataset and presents a simple analysis based on the whitepaper "Big Data, Smart Energy, and Predictive Analytics". It is intended to highlight KNIME's Big Data and Spark functionality in the 3.6 release.

The workflow creates a Local Big Data Environment, loads the meter dataset into Hive, and then transfers it into Spark. It uses a series of Spark SQL nodes to create datetime fields, and then uses Spark nodes to aggregate energy usage over these datetime fields. In the wrapped metanode, it performs PCA and k-means using Spark nodes and produces some simple visualizations of the clustered data. Finally, it writes the clustered data out to both Hive and Parquet formats.

Workflow annotations: Read Meter Data; Persist aggregate results to HDFS in Parquet format; Compute daily, day segment percentages; create HDFS compatible path (twice); create structure; Load table; Node 202; Persist aggregate results to a Hive table.

Workflow nodes: File Reader; Aggregations and time series; Extract date-time attributes; Spark to Parquet; Spark SQL Query; PCA, K-means, Scatter Plot; Create Temp Dir (twice); String Manipulation (Variable) (twice); Create Local Big Data Environment; DB Table Creator; DB Loader; Hive to Spark; Spark to Hive.
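The "Compute daily, day segment percentages" step can be illustrated outside KNIME with a small plain-Python sketch. This is not the workflow's implementation, which runs as Spark nodes. The segment names and hour boundaries, the function names, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical day segments: (start hour inclusive, end hour exclusive).
# The workflow's actual segment boundaries are not stated in the description.
SEGMENTS = {
    "night": (0, 6),
    "morning": (6, 12),
    "afternoon": (12, 18),
    "evening": (18, 24),
}


def segment_of(ts):
    """Return the name of the day segment that contains the timestamp."""
    for name, (start, end) in SEGMENTS.items():
        if start <= ts.hour < end:
            return name
    raise ValueError(f"no segment covers hour {ts.hour}")


def segment_percentages(readings):
    """Compute each meter's share of total usage per day segment.

    readings: iterable of (meter_id, datetime, kwh) tuples.
    Returns {meter_id: {segment_name: percentage_of_total_usage}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for meter_id, ts, kwh in readings:
        totals[meter_id][segment_of(ts)] += kwh

    result = {}
    for meter_id, by_segment in totals.items():
        total = sum(by_segment.values())
        result[meter_id] = {
            name: (100.0 * by_segment.get(name, 0.0) / total if total else 0.0)
            for name in SEGMENTS
        }
    return result


# Made-up sample readings, not taken from the Irish Energy Meter dataset.
readings = [
    ("m1", datetime(2009, 7, 14, 2, 0), 1.0),
    ("m1", datetime(2009, 7, 14, 8, 30), 1.0),
    ("m1", datetime(2009, 7, 14, 19, 0), 2.0),
    ("m2", datetime(2009, 7, 14, 13, 0), 0.5),
]
shares = segment_percentages(readings)
print(shares["m1"])
```

Per-meter percentage vectors of this kind are a typical input for a dimensionality reduction and clustering stage such as the PCA and k-means steps described above, since they describe the shape of a meter's usage rather than its absolute volume.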
