01_Big_Data_Preprocessing_Example

Workflow

Big Data preprocessing

This workflow demonstrates the use of the DB nodes in conjunction with the Create Local Big Data Environment node, which is part of the KNIME Big Data Extension. Together with the DB nodes, this node allows complex data preprocessing without the need for manual SQL coding.

To run this workflow on a remote cluster, use an HDFS Connection node and Hive Connector node (available in the KNIME Big Data Connectors Extension) in place of the Create Local Big Data Environment node.

The table name is controlled by a workflow variable, which can be changed via the context menu of the workflow in the KNIME Explorer.

Requirements:
- KNIME File Handling Nodes
- KNIME Extension for Local Big Data Environments

Tags: Hive, Hadoop, Big Data, SQL, in-database
[Workflow diagram: Big Data Preprocessing]

Please run the branches below in order for best results: Create Table --> Process Data --> Drop Table. For more information see the workflow metadata. Find it here: View -> Description

Branches: Create Table, Process Data, Drop Table

Node annotations:
- Sample Dataset
- set execution engine to tez
- drop test table (execute last)
- select the created table
- keep only Iris-versicolor
- calculate statistics
- remove cluster membership
- join original data with aggregated data
- sort by universe_0_0 ascending
- create the demo table
- load the demo table

Nodes shown: Data Generator, Create Local Big Data Environment, DB SQL Executor, DB Table Remover, DB Table Selector, DB Row Filter, DB GroupBy, DB Column Filter, DB Joiner, DB Reader (3 instances), DB Sorter, DB Table Creator, DB Loader
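For readers who want to see roughly what the three branches do in SQL terms, the following is a minimal sketch, not the workflow's actual generated SQL. It uses Python's built-in sqlite3 module as a stand-in for the local Hive instance. The function name run_pipeline, the table name demo_table, the sample rows, the aggregated column avg_universe_0_1, the filter value Cluster_0, and the choice of join key are all assumptions made for illustration; only the column name universe_0_0 and the order of steps come from the workflow annotations.

```python
import sqlite3


def run_pipeline(table_name="demo_table", keep_cluster="Cluster_0"):
    """Create, process, and drop a small demo table; return the processed rows.

    Hypothetical stand-in for the workflow: table, columns, and data are invented.
    """
    # Identifiers cannot be bound as SQL parameters, so validate the name first.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")

    conn = sqlite3.connect(":memory:")  # in-memory database instead of Hive
    try:
        # Create Table branch: create the demo table and load sample data.
        conn.execute(
            f"CREATE TABLE {table_name} ("
            "universe_0_0 REAL, universe_0_1 REAL, cluster_membership TEXT)"
        )
        conn.executemany(
            f"INSERT INTO {table_name} VALUES (?, ?, ?)",
            [
                (4.0, 1.0, "Cluster_0"),
                (2.0, 3.0, "Cluster_0"),
                (1.0, 5.0, "Cluster_1"),
                (3.0, 7.0, "Cluster_1"),
            ],
        )

        # Process Data branch (assumed wiring): filter rows, aggregate per
        # cluster, join the aggregate back to the original rows, leave the
        # cluster column out of the result, and sort by universe_0_0.
        query = f"""
            SELECT o.universe_0_0, o.universe_0_1, a.avg_universe_0_1
            FROM {table_name} AS o
            JOIN (
                SELECT cluster_membership,
                       AVG(universe_0_1) AS avg_universe_0_1
                FROM {table_name}
                WHERE cluster_membership = ?
                GROUP BY cluster_membership
            ) AS a ON o.cluster_membership = a.cluster_membership
            ORDER BY o.universe_0_0 ASC
        """
        rows = conn.execute(query, (keep_cluster,)).fetchall()

        # Drop Table branch: remove the demo table (executed last).
        conn.execute(f"DROP TABLE {table_name}")
        return rows
    finally:
        conn.close()
```

Calling run_pipeline() returns one tuple per remaining row, each holding the two universe columns plus the per-cluster average. In the workflow itself, each of these SQL fragments is assembled by a DB node, so none of this has to be written by hand.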

Nodes

01_Big_Data_Preprocessing_Example consists of the following 28 node(s):

Plugins

01_Big_Data_Preprocessing_Example contains nodes provided by the following 4 plugin(s):