01_Big_Data_Preprocessing_Example

Big Data preprocessing

This workflow demonstrates how to use the DB nodes together with the Create Local Big Data Environment node, which is part of the KNIME Big Data Extension. Combined, these nodes allow complex data preprocessing without writing SQL by hand.

To run this workflow on a remote cluster, use an HDFS Connection node and Hive Connector node (available in the KNIME Big Data Connectors Extension) in place of the Create Local Big Data Environment node.

The table name is controlled by a workflow variable, which can be changed via the context menu of the workflow in the KNIME Explorer.

Requirements:
- KNIME File Handling Nodes
- KNIME Extension for Local Big Data Environments

Workflow annotation:
Please run the branches below in order for best results: Create Table --> Process Data --> Drop Table. For more information see the workflow metadata. Find it here: View -> Description.

Branches:
- Create Table
- Process Data
- Drop Table

Node annotations:
- Sample Dataset
- execution engine to tez
- drop test table (execute last)
- select the created table
- keep only Iris-versicolor
- calculate statistics
- remove cluster membership
- join original data with aggregated data
- sort by universe_0_0 ascending
- create the demo table
- load the demo table

Nodes in the workflow:
- Data Generator
- DB SQL Executor
- DB Table Remover
- DB Table Selector
- DB Row Filter
- DB GroupBy
- DB Column Filter
- DB Joiner
- DB Sorter
- DB Reader (used three times)
- Create Local Big Data Environment
- DB Table Creator
- DB Loader
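For readers who want to see roughly what the three branches do in SQL terms, the following is a minimal sketch, not the SQL that the DB nodes actually generate. It uses Python's built-in sqlite3 module as a stand-in for the local Hive environment. Apart from universe_0_0, the column names (universe_0_1, class, cluster_membership), the default table name demo_table, the sample rows, and the choice of AVG and COUNT as the statistics are all assumptions made for illustration, as is the exact order in which the steps are wired together.

```python
import sqlite3

# Hypothetical default; in the workflow the table name comes from a workflow variable.
TABLE_NAME = "demo_table"


def run_pipeline(rows, table_name=TABLE_NAME):
    """Create, process, and drop a demo table, mirroring the three branches."""
    conn = sqlite3.connect(":memory:")  # stand-in for the local big data environment
    cur = conn.cursor()

    # Create Table branch: create the demo table, then load the data.
    # Column names other than universe_0_0 are assumed for this sketch.
    cur.execute(
        f'CREATE TABLE "{table_name}" ('
        "universe_0_0 REAL, universe_0_1 REAL, class TEXT, cluster_membership TEXT)"
    )
    cur.executemany(f'INSERT INTO "{table_name}" VALUES (?, ?, ?, ?)', rows)

    # Process Data branch: keep only Iris-versicolor, calculate statistics,
    # leave out the cluster membership column, join the original rows with
    # the aggregated data, and sort by universe_0_0 ascending.
    query = f"""
        SELECT f.universe_0_0, f.universe_0_1, f.class, s.mean_u0, s.row_count
        FROM (
            SELECT universe_0_0, universe_0_1, class
            FROM "{table_name}"
            WHERE class = 'Iris-versicolor'
        ) AS f
        JOIN (
            SELECT class, AVG(universe_0_0) AS mean_u0, COUNT(*) AS row_count
            FROM "{table_name}"
            WHERE class = 'Iris-versicolor'
            GROUP BY class
        ) AS s ON f.class = s.class
        ORDER BY f.universe_0_0 ASC
    """
    result = cur.execute(query).fetchall()

    # Drop Table branch: remove the demo table (execute last).
    cur.execute(f'DROP TABLE "{table_name}"')
    remaining = cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    conn.close()
    return result, remaining


# Made-up sample rows, only to exercise the sketch.
sample_rows = [
    (0.9, 0.1, "Iris-versicolor", "Cluster_0"),
    (0.3, 0.2, "Iris-versicolor", "Cluster_1"),
    (0.5, 0.7, "Iris-setosa", "Cluster_0"),
]
result, remaining = run_pipeline(sample_rows)
```

The table name is interpolated into the SQL string only because this is a self-contained illustration with a fixed value. In the workflow itself the DB nodes assemble the query, so no such string handling is needed.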
