File Reader

Reads the most common text files. To auto-guess the structure of the file click the Autodetect format button. If you encounter problems with incorrect guessed data types disable the Limit data rows scanned option in the Advanced Settings tab. If the input file structure changes between different invocations, enable the Support changing file schemas option in the Advanced Settings tab. For further details see the KNIME File Handling Guide File Handling Guide.

Note: If you find that this node can't read your file, try the File Reader (Complex Format) node. It offers more options for reading complex files.

This node can access a variety of different file systems. More information about file handling in KNIME can be found in the official File Handling Guide.

Parallel reading: Individual files can be read in parallel if

  • They are located on the machine that is running this node.
  • They don't contain any quotes that contain row delimiters.
  • They are not gzip compressed.
  • No lines or rows are limited or skipped.
  • The file index is not prepended to the RowID.
  • They are not encoded with UTF-16 (UTF-16LE and UTF-16BE are fine).

Options

Settings

Read from
Select a file system which stores the data you want to read. There are four default file system options to choose from:
  • Local File System: Allows you to select a file/folder from your local system.
  • Mountpoint: Allows you to read from a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog as long as you don't select a valid i.e. known mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow or the current workflow's data area. When selected a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and read timeout in milliseconds. In case it takes longer to connect to the host / read the file, the node fails to execute. Browsing is disabled for this option.
To read from other file systems, click on ... in the bottom left corner of the node icon followed by Add File System Connection port. Afterwards, connect the desired file system connector node to the newly added input port. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.
Mode
Select whether you want to read a single file or multiple files in a folder. When reading files in a folder, you can set filters to specify which files and subfolders to include (see below).
Filter options
Only displayed if the mode Files in folder is selected. Allows to specify which files should be included according to their file extension and/or name. It is also possible to include hidden files. The folder filter options enable you to specify which folders should be included based on their name and hidden status. Note that the folders themselves will not be included, only the files they contain.
Include subfolders
If this option is checked, the node will include all files from subfolders that satisfy the specified filter options. If left unchecked, only the files in the selected folder will be included and all files from subfolders are ignored.
File, Folder or URL
Enter a URL when reading from Custom/KNIME URL, otherwise enter a path to a file or folder. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/MacOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file/folder from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow has been opened. (Re)execute the connector node to enable browsing.
The location can be exposed as or automatically set via a path flow variable.
Autodetect format
By pressing this button, the "Column delimiter", "Row delimiter", "Quote char" and "Quote escape char" get automatically detected, though it is not guaranteed that the correct values are being detected.
Only a single file is considered for auto-detection, i.e., if "Files in folder" is selected only the first file is being used. The auto-detection by default is based on the first 1024 * 1024 characters of the selected file, but can be adjusted by clicking the settings button next to this option. The format can only be detected if the read number of characters comprises one full data row and the auto-detection will take at most 20 data rows into account. It is assumed that data rows are separated by line breaks. Note that the "Skip first lines" option as well as the specified "Comment char" will be used when guessing the file's format.
Column delimiter
The character string delimiting columns. Use '\t' for tab character. Can get detected automatically.
Row delimiter
The character string delimiting rows. Can get detected automatically.
  • Line break: Uses the line break character as row delimiter. This option is platform-agnostic.
  • Custom: Uses the provided string as row delimiter.
Quote char
The quote character. Can get detected automatically.
Quote escape char
The character is used for escaping quotes inside an already quoted value. Can get detected automatically.
Comment char
A character indicating line comments.
Has column header
Select this box if the first row contains column name headers.
Has RowID
Select this box if the first column contains RowIDs (no duplicates allowed).
Prepend file index to RowID
Select this box if you want to prepend a prefix that depends on the index of the source file to the RowIDs. The prefix for the first file is "File_0_", for the second "File_1_" and so on. This option is useful if the RowIDs within a single file are unique but the same RowIDs appear in multiple files. Prepending the file index prevents parallel reading of individual files.
Support short data rows
Select this box if some rows may be shorter than others (filled with missing values).

Transformation

Transformations
This tab displays every column as a row in a table that allows modifying the structure of the output table. It supports reordering, filtering and renaming columns. It is also possible to change the type of the columns. Reordering is done via drag-and-drop. Just drag a column to the position it should have in the output table. Whether and where to add unknown columns during execution is specified via the special row <any unknown new column>. Note that the positions of columns are reset in the dialog if a new file or folder is selected.
Reset order
Resets the order of columns to the order in the input file/folder.
Reset filter
Clicking this button will reset the filters i.e. all columns will be included.
Reset names
Resets the names to the names that are read from file or created if the file/folder doesn't contain column names.
Reset types
Resets the output types to the default types guessed from the input file/folder.
Reset all
Resets all transformations.
Enforce types
Controls how columns whose type changes are dealt with. If selected, we attempt to map to the KNIME type you configured and fail if that's not possible. If unselected, the KNIME type corresponding to the new type is used.
Take columns from
Only enabled in "Files in folder" mode. Specifies which set of columns are considered for the output table.
  • Union: Any column that is part of any input file is considered. If a file is missing a column, it's filled up with missing values.
  • Intersection: Only columns that appear in all files are considered for the output table.
NOTE: This setting has special implications if you are controlling the input location with a flow variable. If Intersection is selected any column that moves into the intersection during execution will be considered to be new, even if it was previously part of the union of columns. It is also important to note that the transformation matching during execution is based on name. That means if there was a column [A, Integer] during configuration in the dialog and this column becomes [A, String] during execution, then the stored transformation is applied to it. For filtering, ordering and renaming, this is straight forward. For type mapping the following is done: If there is an alternative converter to the specified KNIME type, then this converter is used, otherwise we default to the default KNIME type for the new type. In our example we might have specified that [A, Integer] should be mapped to Long. For the changed column [A, String] there is no converter to Long, so we default back to String and A becomes a String column in the output table.

Advanced Settings

Limit memory per column
If selected the memory per column is restricted to 1MB in order to prevent memory exhaustion. Uncheck this option to disable these memory restrictions.
Maximum number of columns
Sets the number of allowed columns (default 8192 columns) to prevent memory exhaustion. The node will fail if the number of columns exceeds the set limit.
Quote options
  • Remove quotes and trim whitespaces: Quotes will be removed from the value followed by trimming any leading/trailing whitespaces.
  • Keep quotes: The quotes of a value will be kept. Note: No trimming will be done inside the quotes.
Replace empty quoted strings with missing values
Select this box if you want quoted empty strings to be replaced by missing value cells.
Quotes contain no row delimiters
Check this box if there are no quotes that contain row delimiters inside the files. The absence of row delimiters inside of quotes is necessary for parallel reading of individual files.
Limit data rows scanned
If enabled, only the specified number of input rows are used to analyze the file (i.e to determine the column types). This option is recommended for long files where the first n rows are representative for the whole file. The "Skip first data rows" option has no effect on the scanning. Note also, that this option and the "Limit data rows" option are independent from each other, i.e., if the value in "Limit data rows" is smaller than the value specified here, we will still read as many rows as specified here.
When schema in file has changed
Specifies the node behavior if the content of the configured file/folder changes between executions, i.e., columns are added/removed to/from the file(s) or their types change. The following options are available:
  • Fail: If set, the node fails if the column names in the file have changed. Changes in column types will not be detected.
  • Use new schema: If set, the node will compute a new table specification for the current schema of the file at the time when the node is executed. Note that the node will not output a table specification before execution and that it will not apply transformations, therefore the transformation tab is disabled.
  • Ignore (deprecated): If set, the node tries to ignore the changes and outputs a table with the old table specification. This option is deprecated and should never be selected for new workflows, as it may lead to invalid data in the resulting table. Use one of the other options instead.
Number format
Allows to specify the thousands and decimal separator character for parsing numbers. The thousands separator is used for integer, long and double parsing, while the decimal separator is only used for the parsing of double values. Note that the two must differ. While it is possible to leave the thousands separator unspecified, you must always provide a decimal separator.
Fail if specs differ
If checked, the node will fail if multiple files are read via the Files in folder option and not all files have the same table structure i.e. the same columns.
Path column
If checked, the node will append a path column with the provided name to the output table. This column contains for each row which file it was read from. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.

Limit Rows

Skip first lines
If enabled, the specified number of lines are skipped in the input file before the parsing starts. Use this option to skip lines that do not fit in the table structure (e.g. multi-line comments). Skipping lines prevents parallel reading of individual files.
Skip first data rows
If enabled, the specified number of valid data rows are skipped. This has no effect on which row will be chosen as a column header. Skipping rows prevents parallel reading of individual files.
Limit data rows
If enabled, only the specified number of data rows are read. The column header row (if selected) is not taken into account. Limiting rows prevents parallel reading of individual files.

Encoding

Encoding
To read a file that contains characters in a different encoding, you can select the character set in this tab (UTF-8, UTF-16, etc.), or specify any other encoding supported by your Java VM. The default value uses the default encoding of the Java VM, which may depend on the locale or the Java property "file.encoding"

Input Ports

Icon
The file system connection.

Output Ports

Icon
File being read with number and types of columns guessed automatically.

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.