12_Incremental_processing_Parquet_file

Incremental Data Processing with Parquet
Based on the NYC taxi dataset, this workflow analyzes the available data per week and writes the results for each week as a separate Parquet file into a single folder. This folder could be registered as an external table, e.g. in Hive or Impala, which would automatically make the latest results available.

Workflow annotations:
Taxi data stored in the workflow
File prefix
Each iteration adds a new Parquet file to the existing folder, with the year and week number as prefix
Target folder name
Check the written Parquet files

Nodes in the workflow: Parquet Reader, GroupBy, String Manipulation, Group Loop Start, Parquet Writer, Variable Loop End, Create File/Folder Variables, List Files/Folders.

Example file system connectors that can be used with the marked nodes via the dynamic input port: Google Drive Connector, Microsoft Authentication, Amazon Authentication, HDFS Connector (KNOX), Google Authentication (API Key), Google Cloud Storage Connector, Azure Blob Storage Connector, KNIME Server Connector, SSH Connector, HTTP(S) Connector, FTP Connector, HDFS Connector, Google Authentication, Create Local Big Data Environment, Create Databricks Environment, Amazon S3 Connector, SharePoint Online Connector.
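Outside of KNIME, the same incremental pattern (group by week, then write one prefixed Parquet file per week into a shared folder) can be sketched in Python. This is only an illustration, not the workflow's actual implementation: the `year_week` prefix format, the `trips` and `total_fare` columns, the `_taxi_summary.parquet` file name, and the use of ISO week numbering are all assumptions, and `write_week` requires the `pyarrow` package.

```python
import datetime as dt
from collections import defaultdict
from pathlib import Path


def week_prefix(ts: dt.datetime) -> str:
    """Return a 'year_week' prefix based on the ISO calendar (assumed format)."""
    iso = ts.isocalendar()
    return f"{iso[0]}_{iso[1]:02d}"


def aggregate_by_week(trips):
    """Group (pickup_time, fare) pairs by week; count trips and sum fares.

    The aggregation columns are hypothetical stand-ins for whatever the
    GroupBy node computes in the workflow.
    """
    totals = defaultdict(lambda: {"trips": 0, "total_fare": 0.0})
    for pickup_time, fare in trips:
        bucket = totals[week_prefix(pickup_time)]
        bucket["trips"] += 1
        bucket["total_fare"] += fare
    return dict(totals)


def write_week(folder: Path, prefix: str, row: dict) -> Path:
    """Write one week's result as its own Parquet file in the target folder."""
    # Imported here so the grouping helpers work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    folder.mkdir(parents=True, exist_ok=True)
    # Hypothetical file name: year/week prefix plus a fixed suffix.
    target = folder / f"{prefix}_taxi_summary.parquet"
    table = pa.table(
        {
            "week": [prefix],
            "trips": [row["trips"]],
            "total_fare": [row["total_fare"]],
        }
    )
    pq.write_table(table, str(target))
    return target
```

Calling `write_week(Path("results"), prefix, row)` for each item of `aggregate_by_week(trips)` adds one file per week to the folder, so a later run that covers new weeks only adds new files next to the existing ones.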
