Parquet Reader

Reader for Parquet files. It reads either a single file or all files in a given folder.

Options

Mode
Determines how one or multiple files are selected.
  • File: Select a single file.
  • Files in folders: Select a folder and apply filters to select files within it.
Source
The path to the file or folder to select.
Include Subfolders
Whether to include subfolders when selecting multiple files within a folder.
Filter by file extension
Enable filtering files by their extension (e.g. 'xlsx;xlsm').
File extensions
Semicolon-separated list of file extensions to include (e.g. 'xlsx;xlsm;xls'). Case-insensitive unless 'Case sensitive (extensions)' is enabled.
Case sensitive (extensions)
Treat the entered extensions as case sensitive when matching.
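A minimal Python sketch of the extension filter, assuming a file matches when its name ends with '.' followed by one of the listed extensions. The helpers 'parse_extensions' and 'matches_extension' are hypothetical and not part of KNIME; the node's own matching may differ in details such as how a leading dot in the list is handled.

```python
from pathlib import Path


def parse_extensions(spec: str) -> list[str]:
    """Split a semicolon-separated spec such as 'xlsx;xlsm;xls' (hypothetical helper)."""
    # Skip empty entries; stripping a leading '.' is an assumption of this sketch.
    return [e.strip().lstrip(".") for e in spec.split(";") if e.strip()]


def matches_extension(path: str, spec: str, case_sensitive: bool = False) -> bool:
    """Return True if the file name ends with one of the configured extensions."""
    name = Path(path).name
    extensions = parse_extensions(spec)
    if not case_sensitive:
        # Default behavior: compare without regard to letter case.
        name = name.lower()
        extensions = [e.lower() for e in extensions]
    return any(name.endswith("." + e) for e in extensions)


print(matches_extension("report.XLSX", "xlsx;xlsm;xls"))                       # True
print(matches_extension("report.XLSX", "xlsx;xlsm;xls", case_sensitive=True))  # False
```

Matching on the end of the name rather than on the last suffix only means that a multi-part extension such as 'tar.gz' would also work in this sketch.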
Filter by file name
Enable filtering by a file name pattern using wildcards or a regular expression.
File name filter pattern
Pattern for file name filtering. With type 'Wildcard', use '*' and '?'. With type 'Regex', enter a Java regular expression.
File name filter type
Choose how to interpret the file name pattern.
  • Wildcard: Interpret '*' (any sequence of characters) and '?' (any single character) as wildcards.
  • Regular Expression: Interpret the pattern as a Java regular expression.
Case sensitive (names)
Make file name filtering case sensitive.
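The two filter types and the case-sensitivity switch can be approximated in Python as follows. This is a sketch under assumptions: 'matches_name' is a hypothetical helper, the pattern is assumed to match the whole file name, and Python's 're' syntax is close to, but not identical to, Java regular expressions. Python's 'fnmatch.translate' also accepts '[...]' character sets, which go beyond the '*' and '?' wildcards described above.

```python
import fnmatch
import re


def matches_name(name: str, pattern: str, filter_type: str = "wildcard",
                 case_sensitive: bool = False) -> bool:
    """Match a file name against a wildcard or regex pattern (hypothetical helper)."""
    if filter_type == "wildcard":
        # Translate '*' and '?' into an equivalent regular expression.
        regex = fnmatch.translate(pattern)
    elif filter_type == "regex":
        regex = pattern
    else:
        raise ValueError(f"Unknown filter type: {filter_type}")
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: the pattern must match the entire name, not just a part of it.
    return re.fullmatch(regex, name, flags) is not None


print(matches_name("Sales_2023.parquet", "sales_*.parquet"))                             # True
print(matches_name("part-0001.parquet", r"part-\d{4}\.parquet", filter_type="regex"))  # True
```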
Include hidden files
Include hidden files in the selection.
Include special files
Include special file types (workflows, etc.).
Filter by folder name
Enable filtering of folders by name pattern before descending into them.
Folder name pattern
Pattern for folder name filtering. Note that the pattern is applied to the path relative to the specified root folder. Use '*' and '?' with filter type 'Wildcard'. With type 'Regex', enter a Java regular expression.
Folder name filter type
Choose how to interpret the folder name pattern.
  • Wildcard: Enable using '*' and '?' as wildcards.
  • Regular Expression: Enable using a Java regular expression.
Case sensitive (folders)
Make folder name filtering case sensitive.
Include hidden folders
Descend into folders that are hidden (if they otherwise pass filters).
Follow symlinks
Follow symbolic links while traversing folders (only relevant when selecting a folder).
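The folder-related options interact during traversal. The sketch below, built around a hypothetical 'select_files' helper, shows one plausible reading: folders are filtered before the walk descends into them, the folder pattern (treated as a wildcard here) is matched against the path relative to the root folder, and names starting with '.' count as hidden. That last rule is a POSIX convention assumed for illustration, not necessarily how KNIME detects hidden files. File name and extension filters are left out for brevity.

```python
import fnmatch
import os
import re


def select_files(root, include_subfolders=True, folder_pattern=None,
                 folder_case_sensitive=False, include_hidden_folders=False,
                 include_hidden_files=False, follow_symlinks=False):
    """Return file paths relative to 'root' (hypothetical helper)."""
    flags = 0 if folder_case_sensitive else re.IGNORECASE
    folder_regex = (re.compile(fnmatch.translate(folder_pattern), flags)
                    if folder_pattern else None)
    selected = []
    # os.walk is top-down by default, so pruning 'dirnames' in place
    # prevents it from descending into the removed folders.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        kept = []
        if include_subfolders:
            for d in dirnames:
                if d.startswith(".") and not include_hidden_folders:
                    continue  # hidden folder (assumed: name starts with '.')
                rel = os.path.relpath(os.path.join(dirpath, d), root)
                if folder_regex is not None and not folder_regex.match(rel):
                    continue  # pattern is applied to the path relative to root
                kept.append(d)
        dirnames[:] = kept
        for f in filenames:
            if f.startswith(".") and not include_hidden_files:
                continue  # hidden file (same assumption as above)
            selected.append(os.path.relpath(os.path.join(dirpath, f), root))
    return sorted(selected)
```

Because the pattern sees the relative path, a nested folder such as '2023/q1' is only visited if the pattern also matches that longer path (for example '2023*').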
If there are unsupported types
Files can contain columns with types that are not supported by this node, for example complex nested types. This option specifies how such columns are handled.
  • Fail: If set, the node fails on files with unsupported column types.
  • Ignore column: If set, the columns with unsupported column types are ignored.
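As a rough illustration of the two choices, the sketch below models a file schema as a list of (column name, type name) pairs and either rejects or drops columns whose type is not in a supported set. 'SUPPORTED_TYPES', 'UnsupportedTypeError' and 'resolve_columns' are hypothetical; the node's actual list of supported types is not given here.

```python
# Hypothetical set of supported type names, for illustration only.
SUPPORTED_TYPES = {"boolean", "int32", "int64", "float", "double", "string", "timestamp"}


class UnsupportedTypeError(Exception):
    """Raised in 'fail' mode when a column type is not supported."""


def resolve_columns(schema, on_unsupported="fail"):
    """Keep supported columns; fail or skip the others (hypothetical helper)."""
    kept = []
    for name, type_name in schema:
        if type_name in SUPPORTED_TYPES:
            kept.append((name, type_name))
        elif on_unsupported == "fail":
            raise UnsupportedTypeError(f"Column '{name}' has unsupported type '{type_name}'")
        elif on_unsupported != "ignore":
            raise ValueError(f"Unknown option: {on_unsupported}")
        # In 'ignore' mode the unsupported column is simply skipped.
    return kept
```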
If schema changes
Specifies the node behavior if the content of the configured file/folder changes between executions, i.e., columns are added to or removed from the file(s), or their types change. The following options are available:
  • Fail: If set, the node fails if the column names in the file have changed. Changes in column types will not be detected.
  • Use new schema: If set, the node computes a new table specification for the current schema of the file at the time the node is executed. Note that the node does not output a table specification before execution and does not apply transformations; therefore, the Transformations tab is disabled.
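The difference between the two options can be sketched as follows, with schemas modeled as lists of (column name, type name) pairs. The helper 'schema_for_execution' and the exception class are hypothetical. Two further assumptions: in 'fail' mode column order is ignored, and when the names are unchanged the spec from configuration time is kept. The sketch reflects the documented point that only names are compared, so a pure type change passes.

```python
class SchemaChangedError(Exception):
    """Raised when the column names differ from the configured ones."""


def schema_for_execution(configured, current, on_change="fail"):
    """Pick the table spec to use at execution time (hypothetical helper)."""
    if on_change == "use_new_schema":
        # Recompute the spec from the file as it is now.
        return list(current)
    if on_change == "fail":
        # Only column names are compared (order ignored, an assumption),
        # so a change in column types is not detected here.
        if {n for n, _ in configured} != {n for n, _ in current}:
            raise SchemaChangedError("Column names changed since configuration")
        return list(configured)
    raise ValueError(f"Unknown option: {on_change}")
```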
How to combine columns
Specifies how to handle multiple files whose column names are not all identical.
  • Fail if different: The node will fail if multiple files are read and not all files have the same column names.
  • Union: Any column that is part of any input file is considered. If a file is missing a column, that column is filled with missing values.
  • Intersection: Only columns that appear in all files are considered for the output table.
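The three combination modes amount to simple set operations on the column names of each file. In the sketch below, 'combine_columns' and 'align_rows' are hypothetical helpers, Python's None stands in for a missing value, and the output column order (first-seen order for the union, first-file order for the intersection) is an assumption rather than documented behavior.

```python
def combine_columns(column_lists, how="union"):
    """Compute output column names for several files (hypothetical helper)."""
    first = column_lists[0]
    if how == "fail":
        if any(set(cols) != set(first) for cols in column_lists[1:]):
            raise ValueError("Files do not all have the same column names")
        return list(first)
    if how == "union":
        combined = []
        for cols in column_lists:
            for c in cols:
                if c not in combined:
                    combined.append(c)  # keep first-seen order
        return combined
    if how == "intersection":
        common = set(first).intersection(*column_lists[1:])
        return [c for c in first if c in common]  # order of the first file
    raise ValueError(f"Unknown mode: {how}")


def align_rows(columns, rows, target_columns):
    """Reorder a file's rows to target_columns, filling absent columns with None."""
    index = {c: i for i, c in enumerate(columns)}
    return [[row[index[c]] if c in index else None for c in target_columns]
            for row in rows]
```

For example, files with columns ['id', 'a'] and ['id', 'b'] yield ['id', 'a', 'b'] under Union and ['id'] under Intersection; rows of the second file then get None in column 'a'.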
Append file path column
Select this box if you want to add a column containing the path of the file from which the row is read. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.
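A minimal sketch of the path column and its collision check, with a table modeled as a list of column names plus a list of row lists. 'append_path_column' and its default column name 'Path' are hypothetical and only illustrate the documented rule that a name collision makes the node fail.

```python
def append_path_column(columns, rows, file_path, column_name="Path"):
    """Add a column holding the source file path to every row (hypothetical helper)."""
    if column_name in columns:
        # Documented behavior: a name collision is an error.
        raise ValueError(f"Column '{column_name}' already exists")
    return columns + [column_name], [row + [file_path] for row in rows]
```

When several files are read, each file's rows would be tagged with that file's own path before the tables are combined.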
Enforce types
Controls how columns whose type has changed are handled. If selected, the node attempts the mapping to the KNIME type you configured and fails if that is not possible. If unselected, the KNIME type corresponding to the new type is used.
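The decision described above can be sketched as a small lookup. 'DEFAULT_KNIME_TYPE', 'CONVERTIBLE' and 'resolve_knime_type' are hypothetical tables and names chosen for illustration; the node's real type mappings and permitted conversions are not listed in this documentation.

```python
# Hypothetical default mapping from file types to KNIME types.
DEFAULT_KNIME_TYPE = {"int32": "Integer", "int64": "Long", "double": "Double", "string": "String"}

# Hypothetical set of allowed (file type, KNIME type) conversions.
CONVERTIBLE = {
    ("int32", "Integer"), ("int32", "Long"), ("int32", "Double"),
    ("int64", "Long"), ("int64", "Double"),
    ("double", "Double"),
    ("string", "String"),
}


def resolve_knime_type(new_file_type, configured_type, enforce_types):
    """Choose the KNIME type for a column whose file type changed (hypothetical)."""
    if not enforce_types:
        # Unselected: use the type that corresponds to the new file type.
        return DEFAULT_KNIME_TYPE[new_file_type]
    if (new_file_type, configured_type) in CONVERTIBLE:
        # Selected: keep the configured KNIME type if the conversion exists.
        return configured_type
    raise TypeError(f"Cannot convert {new_file_type} to {configured_type}")
```

For instance, a column configured as Integer whose file type changes to double yields Double when the option is unselected and raises an error when it is selected.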
Transformations
Use this option to modify the structure of the table. You can deselect each column to filter it out of the output table, use the arrows to reorder the columns, or change the column name or column type of each column. Note that the positions of columns are reset in the dialog if a new file or folder is selected. Whether and where to add unknown columns during execution is specified via the special row <any unknown new column>. It is also possible to select the type that new columns should be converted to. Note that the node will fail if this conversion is not possible, e.g., if the selected type is Integer but the new column is of type Double.
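One way to picture the special row <any unknown new column> is as a placeholder inside the ordered list of transformation entries. In the sketch below, each entry is a hypothetical (original name, output name, keep) triple, and 'apply_transformations' is a hypothetical helper that returns the output column names. Configured columns that are absent from the file are simply skipped here, which is an assumption; the node may behave differently.

```python
ANY_UNKNOWN = "<any unknown new column>"


def apply_transformations(config, file_columns):
    """Return output column names after filtering, renaming and reordering (hypothetical)."""
    known = {orig for orig, _, _ in config if orig != ANY_UNKNOWN}
    # Columns present in the file but not in the configuration are "unknown".
    unknown = [c for c in file_columns if c not in known]
    output = []
    for orig, new_name, keep in config:
        if orig == ANY_UNKNOWN:
            if keep:
                output.extend(unknown)  # insert unknown columns at this position
        elif keep and orig in file_columns:
            output.append(new_name)  # configured column, possibly renamed
    return output


config = [("id", "ID", True), (ANY_UNKNOWN, ANY_UNKNOWN, True),
          ("tmp", "tmp", False), ("value", "Value", True)]
print(apply_transformations(config, ["value", "id", "extra", "tmp"]))  # ['ID', 'extra', 'Value']
```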

Input Ports

The file system connection.

Output Ports

The data table containing the data of the Parquet file.

Views

This node has no views