Parquet Writer

Streamable
KNIME BigData File Format Extension version 4.3.1.v202101261633 by KNIME AG, Zurich, Switzerland

This node supports the path flow variable. For further information about file handling in general see the File Handling Guide.

This node writes the KNIME data table into a Parquet file. Depending on the selected mode, the node writes either a single file or splits the data into several files, which are stored in the specified folder.

Options

Settings

General settings regarding the output file location and storage configuration.

Write to
Select a file system in which you want to store the file. There are four default file system options to choose from:
  • Local File System: Allows you to select a location in your local system.
  • Mountpoint: Allows you to write to a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog until you select a valid (i.e. known) mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, the current workflow or the current workflow's data area. When selected, a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows you to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and write timeout in milliseconds. If connecting to the host or writing the file takes longer than this timeout, the node fails to execute. Browsing is disabled for this option.
It is possible to use other file systems with this node. To do so, enable the file system connection input port of this node by clicking the ... in the bottom left corner of the node's icon and choosing Add File System Connection port.
Afterwards, you can simply connect the desired connector node to this node. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected, in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.
Mode
Depending on the selected mode the node writes the input data into a single file or splits it up into several files of the defined size which are then stored in the specified folder.
File/URL
Enter a URL when writing to Custom/KNIME URL, otherwise enter a path to a file. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/macOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow was opened. (Re)execute the connector node to enable browsing.
The location can be exposed as or automatically set via a path flow variable.
Create missing folders
Select whether the folders of the selected output location should be created if they do not already exist. If this option is unchecked, the node will fail if a folder does not exist.
If exists
Specify the behavior of the node in case the output file already exists.
  • Overwrite: Will replace any existing file.
  • Fail: Will issue an error during the node's execution (to prevent unintentional overwrite).
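The interplay of "Create missing folders" and "If exists" can be illustrated with a minimal Python sketch. This is not the node's implementation; the helper `write_output` and its parameter names are hypothetical and only mirror the behavior described above.

```python
from pathlib import Path


def write_output(path: Path, data: bytes, create_missing_folders: bool, if_exists: str) -> None:
    """Hypothetical helper mimicking the two dialog options described above."""
    if create_missing_folders:
        # Create every missing parent folder of the output location.
        path.parent.mkdir(parents=True, exist_ok=True)
    elif not path.parent.is_dir():
        # Option unchecked and folder missing: fail.
        raise FileNotFoundError(f"Folder does not exist: {path.parent}")

    if if_exists == "overwrite":
        mode = "wb"  # replaces any existing file
    elif if_exists == "fail":
        mode = "xb"  # exclusive creation: raises FileExistsError if the file exists
    else:
        raise ValueError(f"Unknown policy: {if_exists}")

    with open(path, mode) as handle:
        handle.write(data)
```

With the "fail" policy an existing file is left untouched and an error is raised, which matches the stated purpose of preventing unintentional overwrites.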
File Compression
The compression codec used to write the Parquet file.
Split data into files of size (MB)
Splits up the input data into files of the specified maximum size in megabytes. This option is only available if the folder mode is selected.
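As a rough illustration of size-capped splitting, the following Python sketch groups consecutive rows so that each group stays within a maximum size. How the node itself measures size is not stated here, so the per-row byte sizes and the function `split_by_size` are assumptions made purely for the example.

```python
from typing import Iterable, Iterator, List


def split_by_size(row_sizes: Iterable[int], max_bytes: int) -> Iterator[List[int]]:
    """Group consecutive row indices so each group's total size stays within max_bytes.

    A single row larger than max_bytes ends up in a group of its own.
    """
    group: List[int] = []
    total = 0
    for index, size in enumerate(row_sizes):
        # Start a new group when adding this row would exceed the limit.
        if group and total + size > max_bytes:
            yield group
            group, total = [], 0
        group.append(index)
        total += size
    if group:
        yield group
```

Each yielded group would correspond to one output file in the target folder.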
File name prefix
The prefix to use for the files within the selected folder. A running index, starting at 0, is appended, e.g. part_00000.parquet, part_00001.parquet. This option is only available if the folder mode is selected.
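The naming scheme can be sketched in Python as follows. The five-digit zero padding and the ".parquet" extension are inferred from the example names above, and the function `part_file_name` is hypothetical.

```python
def part_file_name(prefix: str, index: int, extension: str = ".parquet") -> str:
    """Build a file name from a prefix and a running index (padding width is an assumption)."""
    return f"{prefix}{index:05d}{extension}"


# The first two names for the prefix "part_".
names = [part_file_name("part_", i) for i in range(2)]
```

Here `names` holds "part_00000.parquet" and "part_00001.parquet", matching the example above.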
Within file row group size (MB)
Defines the maximum size of a row group within a file in megabytes. For more details see the Parquet documentation.

Type Mapping

Change the KNIME to Parquet type mapping configuration for subsequent nodes by assigning a Parquet type to a given KNIME type. The dialog allows you to add new type mapping rules or change existing ones.

Mapping by Name
Columns that match the given name (or regular expression) and KNIME type will be mapped to the specified Parquet type.
Mapping by Type
Columns that match the given KNIME type will be mapped to the specified Parquet type.
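The two kinds of rules can be pictured with a small Python sketch. It assumes that name rules are checked before type rules and that a regular expression must match the whole column name; neither detail is stated above. The function `resolve_parquet_type` and the type labels used in the usage note are illustrative, not taken from the node.

```python
import re
from typing import Dict, List, Optional, Tuple

# (column name or regular expression, KNIME type, Parquet type)
NameRule = Tuple[str, str, str]


def resolve_parquet_type(
    column: str,
    knime_type: str,
    name_rules: List[NameRule],
    type_rules: Dict[str, str],
) -> Optional[str]:
    """Return the Parquet type for a column, or None if no rule applies."""
    # Mapping by name: both the name pattern and the KNIME type must match.
    for pattern, rule_type, parquet_type in name_rules:
        if rule_type == knime_type and re.fullmatch(pattern, column):
            return parquet_type
    # Mapping by type: fall back to the rule for the KNIME type alone.
    return type_rules.get(knime_type)
```

For example, with a name rule `("id_.*", "Number (integer)", "INT64")` and a type rule mapping "Number (integer)" to "INT32", the column "id_customer" resolves to "INT64" while any other integer column resolves to "INT32".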

Input Ports

The data table that should be written

Installation

To use this node in KNIME, install KNIME Extension for Big Data File Formats from the following update site:

KNIME 4.3

A zipped version of the software site can be downloaded here.