
02_HDFS_and_File_Handling_Example

HDFS file handling

This workflow demonstrates the HDFS file handling capabilities using the file handling nodes in conjunction with an HDFS connection.

To run this workflow on a remote cluster, use an HDFS Connection node (available in the KNIME Big Data Connectors Extension) in place of the Create Local Big Data Environment node.
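Outside of KNIME, the same swap can be pictured with pyarrow's filesystem abstraction, where a local filesystem and an HDFS connection expose the same file operations. This is only an analogy, not part of the workflow; the host, port, and user below are placeholder assumptions:

    from pyarrow import fs

    # Local stand-in, analogous to the Create Local Big Data Environment node.
    filesystem = fs.LocalFileSystem()

    # To target a real cluster, only the connection changes (analogous to
    # swapping in an HDFS Connection node). Requires libhdfs; the host, port,
    # and user are placeholders for your cluster.
    # filesystem = fs.HadoopFileSystem(host="namenode.example.com", port=8020,
    #                                  user="knime")

    # Code written against the shared filesystem interface works unchanged
    # with either connection.
    with filesystem.open_output_stream("/tmp/hello.txt") as out:
        out.write(b"hello\n")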

Requirements:
- KNIME File Handling Nodes
- KNIME Big Data Connectors Extension
- KNIME Extension for Local Big Data Environments

Setup:
The directory /tmp must exist in the HDFS file system, and the user needs read/write permissions on it.
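As a hedged illustration of this precondition (not something the workflow itself runs), the snippet below probes /tmp with pyarrow's HDFS client; the connection details are placeholder assumptions:

    from pyarrow import fs

    # Placeholder connection details; adjust for your cluster (requires libhdfs).
    hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020, user="knime")

    # Verify that /tmp exists on HDFS and is a directory.
    info = hdfs.get_file_info("/tmp")
    if info.type != fs.FileType.Directory:
        raise RuntimeError("/tmp does not exist on HDFS")

    # Verify write access by creating and removing a probe file.
    probe = "/tmp/_write_probe"
    with hdfs.open_output_stream(probe) as out:
        out.write(b"probe")
    hdfs.delete_file(probe)
    print("/tmp exists and is writable")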



Workflow overview

The workflow contains two branches, an old method built from legacy/deprecated nodes and a new one built on the current file handling framework. Both perform the same round trip: generate random data, write it to a CSV file, upload the file to the HDFS file system, list the files on the "remote server", download the file to a local directory, read it back, compare the uploaded and downloaded tables for consistency, and delete the remote file.

Old method (legacy/deprecated nodes): Create Local Big Data Environment (deprecated), Data Generator, CSV Writer (deprecated), Create Temp Dir (legacy), Upload (legacy), Java Edit Variable (simple), String to URI, List Remote Files (legacy), Row Filter, Download (legacy), CSV Reader (deprecated), Table Difference Checker, Delete Files (legacy), Variable to Table Row (deprecated), Table Row to Variable (deprecated)

New file handling: Create Local Big Data Environment, Data Generator, CSV Writer, Create Temp Path/File, Transfer Files (upload to the "remote server"), List Files/Folders (files only), Transfer Files (download to a local directory), CSV Reader, Table Difference Finder
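The same round trip can be sketched outside of KNIME. The minimal Python version below follows the steps above with pyarrow; the connection details and file names are placeholder assumptions:

    import filecmp
    from pyarrow import fs

    # Placeholder connection details; adjust for your cluster (requires libhdfs).
    hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020, user="knime")

    local_src = "upload.csv"     # stands in for the generated random data
    local_dst = "download.csv"
    remote = "/tmp/example.csv"  # hypothetical remote path under /tmp

    # Write a small test file locally (Data Generator + CSV Writer).
    with open(local_src, "w") as f:
        f.write("id,value\n1,foo\n2,bar\n")

    # Upload to HDFS (Transfer Files / Upload (legacy)).
    with open(local_src, "rb") as src, hdfs.open_output_stream(remote) as dst:
        dst.write(src.read())

    # List the files under /tmp (List Files/Folders / List Remote Files (legacy)).
    for info in hdfs.get_file_info(fs.FileSelector("/tmp")):
        print(info.path)

    # Download back to a local directory (Transfer Files / Download (legacy)).
    with hdfs.open_input_stream(remote) as src, open(local_dst, "wb") as dst:
        dst.write(src.read())

    # Compare the uploaded and downloaded files (Table Difference Finder).
    assert filecmp.cmp(local_src, local_dst, shallow=False)

    # Delete the remote file (Delete Files (legacy)).
    hdfs.delete_file(remote)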
