Excel Reader

This node reads Excel files (xlsx, xlsm, xlsb, and xls format). It can read a single or multiple files at the same time, however reading only one sheet per file. The supported Excel types that can be read in are string, number, boolean, date, and time but not pictures, diagrams, etc.

Options

Mode
Determine the mode how to select one or multiple files.
  • File: Select a single file.
  • Files in folders: Select a folder and apply filters to select files within it.
Source
The path to the file or folder to select.
Include Subfolders
Whether to include subfolders when selecting multiple files within a folder.
Filter by file extension
Enable filtering files by their extension (e.g. 'xlsx;xlsm').
File extensions
Semicolon-separated list of file extensions to include (e.g. 'xlsx;xlsm;xls'). Case-insensitive unless 'Case sensitive (extensions)' is enabled.
Case sensitive (extensions)
Treat the entered extensions as case sensitive when matching.
Filter by file name
Enable filtering by file name pattern with wildcards or regular expression.
File name filter pattern
Pattern for file name filtering. With type 'Wildcard', use '*' and '?'. With type 'Regex', enter a Java regular expression.
File name filter type
Choose how to interpret the file name pattern.
  • Wildcard: Enable using '*' and '?' as wildcards.
  • Regular Expression: Enable using a Java regular expression.
Case sensitive (names)
Make file name filtering case sensitive.
Include hidden files
Include hidden files in the selection.
Include special files
Include special file types (workflows etc).
Filter by folder name
Enable filtering of folders by name pattern before descending into them.
Folder name pattern
Pattern for folder name filtering. Note that the pattern is applied to the path relative to the specified root folder. Use '*' and '?' with filter type 'Wildcard'. With type 'Regex', enter a Java regular expression.
Folder name filter type
Choose how to interpret the folder name pattern.
  • Wildcard: Enable using '*' and '?' as wildcards.
  • Regular Expression: Enable using a Java regular expression.
Case sensitive (folders)
Make folder name filtering case sensitive.
Include hidden folders
Descend into folders that are hidden (if they otherwise pass filters).
Follow symlinks
Follow symbolic links while traversing folders (only relevant when selecting a folder).
File encryption
Allows you to specify a password to decrypt files that have been protected with a password via Excel. Files without password protection can also be read if you set a password.
  • No encryption: Files are not password protected.
  • Password: Files are protected with a password.
Password
The password to decrypt the Excel file.
Select sheet
Choose which sheet to read from the Excel file. The order of the sheets is the same as displayed in Excel (i.e. not necessarily a lexicographic order).
  • First sheet with data: The first sheet of the selected file(s) that contains data will be read in. Containing data means not being empty. If all sheets of a file are empty, an empty table is read in.
  • By name: The sheet with the selected name will be read in. If reading multiple files, the sheet names of the first file are shown and the node will fail if any of the other files does not contain a sheet with the selected name.
  • By position: The sheet at the selected position will be read in. If reading multiple files, the node will fail if any of the files does not contain a sheet at the selected position. The position starts at 0, i.e. the first sheet is at position 0.
Sheet name
The name of the sheet to read.
Sheet position
The position (starting at 0) of the sheet to read. The maximum position that can be selected depends on the number of sheets available in the first read file.
Define read area
Choose whether to read the entire sheet or a custom area.
  • Whole sheet: All the data contained in the sheet is read in. This includes areas where diagrams, borders, coloring, etc. are placed and could create empty rows or columns.
  • Range by column and row: Only the data in the specified area is read in. Both start and end columns/rows are inclusive. By leaving a field empty, the start or end of the area is not restricted.
Read from column
The first column to include specified as label ("A", "B", etc.) or number (starting at 1). Leave empty to read starting at the first column.
Read to column
The last column to include specified as label ("A", "B", etc.) or number (starting at 1). Leave empty to read until the last column.
Read from row
The first row number to include (starting at 1). Leave empty to read starting from the first row.
Read to row
The last row number to include (starting at 1). Leave empty to read until the last row.
Define Column Names
Choose how to define column names for the table.
  • Use custom row: Use a specific row as column names.
  • Excel headers (A, B, C): Use Excel column headers (A, B, C, ...) as column names.
  • Enumerate columns (Col1, Col2...): Use enumerated column names (Col1, Col2, ...) as column names.
Header as in row
The row number (1-based) containing column names.
Prefix for empty headers
Prefix to use for empty column headers.
Suffix for empty headers
Choose the suffix format for empty column headers.
  • A, B, ...: Use Excel column headers (A, B, C, ...) as column name suffix for empty columns.
  • 1, 2, ...: Use enumerated indices (1, 2, ...) as column name suffix for empty columns.
Define RowID
Choose how to define row IDs for the table.
  • Enumerate rows (Row1, Row2, ...): Generate row IDs by enumerating rows.
  • Use custom column: Use a specific column as row IDs.
RowID as in column
The column (e.g., A, B, C or 1, 2, 3) to use as row IDs. The RowIDs in the selected column must be unique, otherwise the execution fails.
Skip empty columns
If checked, empty columns of the sheet will be skipped and not displayed in the output. Whether a column is considered empty depends on the Column and Data Type Detection settings: If the cells of a column for all scanned rows were empty, the column is considered empty, even if the sheet contains a non-empty cell after the scanned rows. Removing the limit of scanned rows ensures empty and non-empty columns being detected correctly but also increases the time required for scanning.
Skip hidden columns
If checked, hidden columns of the sheet will be skipped and not displayed in the output.
Skip empty rows
If checked, empty rows of the sheet will be skipped and not displayed in the output.
Skip hidden rows
If checked, hidden rows of the sheet will be skipped and not displayed in the output.
Use Excel 15 digits precision
If checked, numbers are read in with 15 digits precision which is the same precision Excel is using to display numbers. This will prevent potential floating point issues. For most numbers, no difference can be observed if this option is unchecked.
Replace empty strings with missing values
If checked, empty strings (i.e. strings with only whitespaces) are replaced with missing values. This option is also applied to formulas that evaluate to strings.
Reevaluate formulas in all sheets
If checked, formulas are reevaluated and put into the created table instead of using the cached values. This can cause errors when there are functions that are not implemented by the underlying Apache POI library. Note: Files with xlsb format do not support this option. For xlsx and xlsm files, reevaluation requires significantly more memory as the whole file needs to be kept in memory (xls files are anyway loaded completely into memory).
On error insert
Choose how to handle formula evaluation errors.
  • String value: Insert the given String value. Inserting a string value causes the entire column to become a string column in case an error occurs.
  • Missing value: Insert a Missing value. Inserting a missing value is type innocent, but also unobtrusive.
String
The string value to insert into a cell with a formula evaluation error.
If schema changes, ignore schema change (deprecated)
If set, the node tries to ignore the changes and outputs a table with the old table specification. This option is deprecated. Once unchecked, it will disappear and cannot be selected again, as it may lead to invalid data in the resulting table.
If schema changes
Specifies the node behavior if the content of the configured file/folder changes between executions, i.e., columns are added/removed to/from the file(s) or their types change. The following options are available:
  • Fail: If set, the node fails if the column names in the file have changed. Changes in column types will not be detected.
  • Use new schema: If set, the node will compute a new table specification for the current schema of the file at the time when the node is executed. Note that the node will not output a table specification before execution and that it will not apply transformations, therefore the transformation tab is disabled.
Limit scanned rows
If you limit the number of scanned rows, only the specified number of input rows are used to analyze the file (i.e. to determine the column types). This option is recommended for long files where the first n rows are representative for the whole file.
Maximum number of scanned rows
The maximum number of rows to scan for determining column types.
Fail if schemas differ between multiple files
If checked, the node will fail if multiple files are read via the "Files in folder" option and not all files have the same table structure, i.e. the same columns.
How to combine columns
Specifies how to deal with reading multiple files in which not all column names are identical.
  • Fail if different: The node will fail if multiple files are read and not all files have the same column names.
  • Union: Any column that is part of any input file is considered. If a file is missing a column, it is filled up with missing values.
  • Intersection: Only columns that appear in all files are considered for the output table.
Append file path column
Select this box if you want to add a column containing the path of the file from which the row is read. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.
Enforce types
Controls how columns whose type changes are dealt with. If selected, the mapping to the KNIME type you configured is attempted. The node will fail if that is not possible. If unselected, the KNIME type corresponding to the new type is used.
Transformations
Use this option to modify the structure of the table. You can deselect each column to filter it out of the output table, use the arrows to reorder the columns, or change the column name or column type of each column. Note that the positions of columns are reset in the dialog if a new file or folder is selected. Whether and where to add unknown columns during execution is specified via the special row <any unknown new column>. It is also possible to select the type new columns should be converted to. Note that the node will fail if this conversion is not possible e.g. if the selected type is Integer but the new column is of type Double.

Input Ports

Icon
The file system connection.

Output Ports

Icon
The data table read from the Excel file.

Popular Successors

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.