This node reads a spread sheet and provides it at its output port.
It reads only data from one sheet, but it is possible to read multiple files that contain the same sheet name at once.
It can read only numeric, date, boolean and string data but, of course, no diagrams, pictures, or other items.
Currently supported KNIME types are String, TimeAndDate, Double, Boolean, and Int.
It reads in the data from the sheet and sets a type for all columns that
is compatible with the data in that column (in the worst case "String" covers all).
The performance of the reader node is limited (due to the underlying library
of the Apache POI project). Reading large files takes a very long time and uses a
lot of memory (especially files in xlsx format when formula reevaluation is selected).
Options
File or Files in folder
Select whether you want to read a file or multiple files in a folder.
When reading files in a folder, you can set filters to specify which
files to include (see below).
Read from...
Select a file system which stores the data you want to read. There are
three default file system options to choose from:
Local File System: Allows you to select a file/folder from your
local system.
Mountpoint: Allows you to choose a mountpoint. When
selected, a new drop-down menu appears to choose the
mountpoint.
Custom URL: Allows to specify a URL (e.g. file://, http://
or knime:// protocol). Browsing is disabled for this option.
To read from other file systems, click on
... in the bottom left corner of the node icon followed by
Add File System Connection port. Afterwards, connect the desired file system connection
node to the newly added input port. The file system connection will then be shown in the drop
down menu.
File, Folder or URL
Enter a URL when reading from Custom URL, otherwise enter a path to a file or folder. The
required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local
File System on Windows) or "/path/to/file" (Local
File System on Linux/MacOS and Mountpoint). For file systems connected via input port, the node
description of the respective connector node describes the required path format.
You can also choose a previously selected file/folder from the drop-down list, or select a destination from the "Browse..." dialog
(browsing is not available for the Custom URL file system).
Include subfolders
When reading files from a folder, check this to include all matching files from subfolders of the defined folder.
Filter options
Press this button for file filtering options.
File filtering: File extension(s)
Enter file extensions used to filter files.
Additionally, there is an option to choose whether the filter should work in a case sensitive or insensitive manner. Note: A file extension has to be entered without the preceding dot. Multiple extensions have to be separated by a semicolon character (e.g. txt;log).
File filtering: File names
Select the desired filter method (regular expression or wildcard) and enter a valid expression to match the filenames of the files to be read.
Additionally, there is an option to choose whether the filter should work in a case sensitive or insensitive manner.
File filtering options: Filter hidden files
Select if hidden files should be excluded.
Sheet to Read
After selecting a file, you can select the sheet from the
available sheets in the file. If a folder is selected, the sheets of the first file in the folder are displayed.
Connect timeout[s]
Timeout in seconds when reading remote files. 0 means no
timeout. In case it takes longer to connect to the host, the node fails to execute. Note: This option is only active in case that the Custom URL file system is selected. Other file
system don't use a timeout or the timeout is set directly in the specific connector node.
Column Names
If you want to use column names from the spread sheet, check
"Table contains column names" and enter the number
of the row that contains the column names (enter the number
(starting at one), not the index). Attention: In case multiple files are read, only the column names of the first file will be used.
Rows with the specified index in all other files are not read and will be ignored.
If this option is not checked, default names are generated
("Col0", "Col1", etc.).
Row IDs
The option "Generate RowIDs (index incrementing, starting with 'Row0')"
generates row ids starting from Row0 and produces all integer numbers till the end.
The option "Generate RowIDs (index as per sheet content, skipped rows will increment index)"
generates row ids based on sheet row indices starting from Row0,
but numbers might get skipped when the row is skipped.
If you want to use row IDs from the spread sheet, check
"Table contains row IDs" and provide the column that
contains the row ids. Enter the label of the column ("A",
"B", etc.) or the number (starting at one) of the column.
The rowIDs in the sheet must be unique, otherwise the execution fails.
If you check "Make row IDs unique", the node will append
a suffix to duplicates, ensuring row ID uniqueness.
For very large data sets this could cause memory problems.
Select the columns and rows to read
Specify the region of the data sheet that should be read in. If you
check "Read entire data sheet" the region provided by the
data sheet file will be read in. This includes areas where diagrams,
borders, coloring, etc. are placed and could create quite some empty
rows or columns (see skip empty rows and columns options below).
If you want to read in a fixed area, remove the check mark and enter
the first and last column, and first and last row to read. (Last
row and column is optional, causing it to read to the last row or
column provided by the sheet). For columns
you need to enter the label ("A", "B", etc.),
for rows enter the number (one-based).
Skip Empty Rows or Columns
If empty rows or columns should be removed from the result Data Table,
check the appropriate option.
Skip hidden columns
Hidden columns are removed when checked.
Evaluation Error Handling
Specify the data that is inserted in a cell with an error. Not all
formulas supported by all spreadsheet applications are supported by
the XLS Reader Node. If an error occurs during formula evaluation
(or if the spreadsheet contains an error cell), the selected data is
inserted. You can either choose to insert a cell representing a
missing value, or to insert a certain string pattern. A pattern
causes the entire column to become a string column in case an
error occurs. A missing value is type innocent, but also unobtrusive.
Reevaluate formulas
When checked it reads the whole file into memory and reevaluates
the formulas before returning the result. This can cause errors
when there are functions -not implemented by the Apache POI library-
used on the sheet. When unchecked, xls files are still read to the memory,
but the cached values are returned. For xlsx files, only the requested
sheet is loaded when no reevaluation is requested and requires much
less memory to return the cached values. If uncertain leave unchecked unless
there is a concrete reason.
Disable Preview
Preview loading is disabled and no data table configuration is computed, so
changes in the format of the xls(x) file will not generate errors after execution.
The drawback of the enabled state of this option is that subsequent nodes
do not have information about the output table structure.
Preview
The "Preview" tab shows you the output table with the current
settings in the dialog. If the settings are invalid an
error message will be displayed in this tab. The table updates only
when the "refresh" button is clicked. A warning is shown,
if the table content is out of sync with the current settings.
File Content
The "File Content" table shows you the content of the
currently selected sheet. It shows the entire content (no settings
are applied there). The column names here are the ones
that need to be specified in the corresponding setting fields, row numbers are 0-based.
To use this node in KNIME, install
KNIME Excel Support
from the following update site:
KNIME 4.3
A zipped version of the software site can be downloaded
here.
You don't know what to do with this link? Read our
NodePit Product and Node Installation Guide
that explains you in detail how to install nodes to your KNIME Analytics
Platform.
Wait a sec! You want to explore and install nodes even
faster? We highly recommend our
NodePit for KNIME
extension for your KNIME Analytics Platform. Browse NodePit from
within KNIME, install nodes with just one click and share your
workflows with NodePit Space.
Developers
You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.
Contact
Do you have feedback, questions, comments about NodePit, want to support this platform, or want your own nodes or workflows listed here as well?
Do you think, the search results could be improved or something is missing?
Then please get in touch! Alternatively, you can send us an email to mail@nodepit.com,
follow @NodePit on Twitter,
or chat on Gitter!
Please note that this is only about NodePit. We do not provide general support for KNIME — please use the KNIME forums instead.