Excel Reader

This node reads Excel files (xlsx, xlsm, xlsb, and xls format). It can read a single or multiple files at the same time, however reading only one sheet per file. The supported Excel types that can be read in are string, number, boolean, date, and time but not pictures, diagrams, etc.
The data is read in and converted to the KNIME types string, integer, long, double, boolean, local date, local time, and local date&time. Upon execution, the node will scan the input file to determine number and types of the columns and output a table with the auto-guessed structure and KNIME types.
Formulas can also be read in and reevaluated if desired.The performance of this node is limited (due to the underlying library of the Apache POI project). Reading large files takes a long time and uses a lot of memory (especially files in xlsx format when formula reevaluation is enabled).

The dialog of this node shows a preview and the file content. While the preview shows the table that will be read in having the settings applied and is updated once a setting is changed, the file content shows the content of the file as it is displayed in Excel. This allows finding row numbers and column names easier to specify different settings.

This node can access a variety of different file systems. More information about file handling in KNIME can be found in the official File Handling Guide.

Options

File and Sheet

Read from
Select a file system which stores the data you want to read. There are four default file system options to choose from:
  • Local File System: Allows you to select a file/folder from your local system.
  • Mountpoint: Allows you to read from a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog as long as you don't select a valid, i.e. known mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow or the current workflow's data area. When selected a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and read timeout in milliseconds. In case it takes longer to connect to the host / read the file, the node fails to execute. Browsing is disabled for this option.
To read from other file systems, click on ... in the bottom left corner of the node icon followed by Add File System Connection port. Afterwards, connect the desired file system connector node to the newly added input port. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.
Mode
Select whether you want to read a single file or multiple files in a folder. When reading files in a folder, you can set filters to specify which files and subfolders to include (see below).
Filter options
Only displayed if the mode Files in folder is selected. Allows to specify which files should be included according to their file extension and/or name. It is also possible to include hidden files. The folder filter options enable you to specify which folders should be included based on their name and hidden status. Note that the folders themselves will not be included, only the files they contain.
Include subfolders
If this option is checked, the node will include all files from subfolders that satisfy the specified filter options. If left unchecked, only the files in the selected folder will be included and all files from subfolders are ignored.
File, Folder or URL
Enter a URL when reading from Custom/KNIME URL, otherwise enter a path to a file or folder. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/MacOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file/folder from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow has been opened. (Re)execute the connector node to enable browsing.
The location can be exposed as or automatically set via a path flow variable.
Select Sheet
  • First with data: The first sheet of the selected file(s) that contains data will be read in. Containing data means not being empty. If all sheets of a file are empty, an empty table is read in.
  • By name: The sheet with the selected name will be read in. If reading multiple files, the sheet names of the first file are shown and the node will fail if any of the other files does not contain a sheet with the selected name.
  • By position: The sheet at the selected position will be read in. If reading multiple files, the node will fail if any of the files does not contain a sheet at the selected position. The position starts at 0, i.e. the first sheet is at position 0. The maximum position that can be selected depends on the number of sheets available in the first read file.
The order of the sheets is the same as displayed in Excel (i.e. not necessarily a lexicographic order).

Data Area

Read Area
  • Whole sheet: All the data contained in the sheet is read in. This includes areas where diagrams, borders, coloring, etc. are placed and could create empty rows or columns.
  • Range by row and column: Only the data in the specified area is read in. Both start and end columns/rows are inclusive. For columns, enter their labels ("A", "B", etc.) or numbers (starting at 1). For rows, enter their numbers (starting at 1). By leaving a field empty, the start or end of the area is not restricted.
Column Names
If 'Use values in row' is checked, a row can be selected that contains the column names. The first row in the sheet has number 1. The file content view can be helpful to find the correct row number when filtering empty or hidden rows or restricting the area that should be read. In case multiple files are read, only the column names of the first file will be used. Rows with the specified number in all other files are not read and will be ignored.
Skip
  • Empty rows: If checked, empty rows of the sheet will be skipped and not displayed in the output.
  • Empty columns: If checked, empty columns of the sheet will be skipped and not displayed in the output. Whether a column is considered empty depends on the Table specification settings. This means that if the cells of a column for all scanned rows were empty, the column is considered empty, even if the sheet contains a non-empty cell after the scanned rows. Removing the limit of scanned rows ensures empty and non-empty columns being detected correctly but also increases the time required for scanning.
  • Hidden rows: If checked, hidden rows of the sheet will be skipped and not displayed in the output.
  • Hidden columns: If checked, hidden columns of the sheet will be skipped and not displayed in the output.

Advanced

File encryption
Allows you to specify a password to decrypt files that have been protected with a password via Excel. Note: only files without password protection can be read. Credentials: use a password set via workflow credentials. Password: specify a password. Files without password protection can also be read if you set a password.
Limit scanned rows
If you limit the number of scanned rows, only the specified number of input rows are used to analyze the file (i.e to determine the column types). This option is recommended for long files where the first n rows are representative for the whole file.
Fail if schemas differ between multiple files
If checked, the node will fail if multiple files are read via the "Files in folder" option and not all files have the same table structure, i.e. the same columns.
When schema in file has changed
Specifies the node behavior if the content of the configured file/folder changes between executions, i.e., columns are added/removed to/from the file(s) or their types change. The following options are available:
  • Fail: If set, the node fails if the data schema in the file has changed.
  • Use new schema: If set, the node will compute a new table specification for the current schema of the file at the time when the node is executed. Note that the node will not output a table specification before execution and that it will not apply transformations, therefore the transformation tab is disabled.
  • Ignore (deprecated): If set, the node tries to ignore the changes and outputs a table with the old table specification. This option is deprecated and should never be selected for new workflows, as it may lead to invalid data in the resulting table. Use one of the other options instead.
Append file path colum
If checked, the node will append a path column with the provided name to the output table. This column contains for each row which file it was read from. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.
Data Area
  • RowID: If the 'Enumerate rows (Row0, Row1, ...)' option is selected RowIDs are generated starting with Row0, Row1, etc. Select the 'Use column' option to select a column that contains the RowIDs. Enter the label ("A", "B", etc.) or the number (starting at 1) of the column. The RowIDs in the selected column must be unique, otherwise the execution fails.
  • Column names: Select the 'Use Excel column names (A, B, C, ...)' option to generate column names based on the column names in Excel. Select the 'Enumerate columns (Col0, Col1, Col2, ...)' option to generate columns with an index and the "Col" prefix.
  • If the row that is selected in the column names section of the 'Data Area' tab contains missing values the empty column name prefix is used. Based on the selection of either the Excel column name or enumerate columns option it creates a column name with this prefix.
Values
  • Reevaluate formulas in all sheets: If checked, formulas are reevaluated and put into the created table instead of using the cached values. This can cause errors when there are functions that are not implemented by the underlying Apache POI library.
    On error either insert the given String value or a Missing value. Inserting a string value causes the entire column to become a string column in case an error occurs. Whereas inserting a missing value is type innocent, but also unobtrusive.
    Note: Files with xlsb format do not support this option. For xlsx and xlsm files, reevaluation requires significantly more memory as the whole file needs to be kept in memory (xls files are anyway loaded completely into memory).
  • Replace empty strings with missing values: If checked, empty strings (i.e. strings with only whitespaces) are replaced with missing values. This option is also applied to formulas that evaluate to strings.
  • Use Excel 15 digits precision: If checked, numbers are read in with 15 digits precision which is the same precision Excel is using to display numbers. This will prevent potential floating point issues. For most numbers, no difference can be observed if this option is unchecked.

Transformation

Transformations
This tab displays every column as a row in a table that allows modifying the structure of the output table. It supports reordering, filtering and renaming columns. It is also possible to change the type of the columns. Reordering is done via drag-and-drop. Just drag a column to the position it should have in the output table. Whether and where to add unknown columns during execution is specified via the special row <any unknown new column>. Note that the positions of columns are reset in the dialog if a new file or folder is selected.
Reset order
Resets the order of columns to the order in the input file/folder.
Reset filter
Clicking this button will reset the filters, i.e. all columns will be included.
Reset names
Resets the names to the names that are read from file or created if the file/folder doesn't contain column names.
Reset types
Resets the output types to the default types guessed from the input file/folder.
Reset all
Resets all transformations.
Enforce types
Controls how columns whose type changes are dealt with. If selected, we attempt to map to the KNIME type you configured and fail if that's not possible. If unselected, the KNIME type corresponding to the new type is used.
Take columns from
Only enabled in "Files in folder" mode. Specifies which set of columns are considered for the output table.
  • Union: Any column that is part of any input file is considered. If a file is missing a column, it's filled up with missing values.
  • Intersection: Only columns that appear in all files are considered for the output table.
NOTE: This setting has special implications if you are controlling the input location with a flow variable. If Intersection is selected any column that moves into the intersection during execution will be considered to be new, even if it was previously part of the union of columns. It is also important to note that the transformation matching during execution is based on name. That means if there was a column [A, Integer] during configuration in the dialog and this column becomes [A, String] during execution, then the stored transformation is applied to it. For filtering, ordering and renaming, this is straight forward. For type mapping the following is done: If there is an alternative converter to the specified KNIME type, then this converter is used, otherwise we default to the default KNIME type for the new type. In our example we might have specified that [A, Integer] should be mapped to Long. For the changed column [A, String] there is no converter to Long, so we default back to String and A becomes a String column in the output table.

Input Ports

Icon
The file system connection.

Output Ports

Icon
File being read with number and types of columns guessed automatically.

Popular Successors

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.