XML Reader

Reads a well-formed XML document. An XPath query can be specified to read only a portion of the file. In that case, the output contains the nodes in the document that match the XPath query.

This node can access a variety of different file systems. More information about file handling in KNIME can be found in the official File Handling Guide.

Options

Type
The selection mode.
  • File: Select a single file.
  • Files in folders: Select a folder and apply filters to select files within it.
Source
The path to the file or folder to select.
Include subfolders
Whether to include subfolders when selecting multiple files within a folder.
Filter by file extension
Enable filtering files by their extension (e.g. 'xlsx;xlsm').
File extensions
Semicolon-separated list of file extensions to include (e.g. 'xlsx;xlsm;xls'). Case-insensitive unless 'Case sensitive (extensions)' is enabled.
Case sensitive (extensions)
Treat the entered extensions as case sensitive when matching.
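
The extension filter described above can be sketched in Python as follows. This is only an illustration of the semicolon-separated, optionally case-sensitive matching, not the node's actual implementation. The function names parse_extensions and matches_extension are hypothetical, and comparing only the last suffix of the file name is an assumption.

```python
from pathlib import PurePath


def parse_extensions(spec):
    """Split a semicolon-separated list such as 'xlsx;xlsm' into extensions."""
    return [part.strip().lstrip(".") for part in spec.split(";") if part.strip()]


def matches_extension(file_name, spec, case_sensitive=False):
    """Return True if the file's last suffix is one of the listed extensions.

    Assumption: only the final suffix is compared ('a.tar.gz' -> 'gz').
    """
    suffix = PurePath(file_name).suffix.lstrip(".")
    wanted = parse_extensions(spec)
    if not case_sensitive:
        # Case-insensitive by default, mirroring the option text above.
        suffix = suffix.lower()
        wanted = [ext.lower() for ext in wanted]
    return suffix in wanted
```

With the default setting, a file named report.XLSX passes the filter 'xlsx;xlsm'. With case_sensitive=True it does not.
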
Filter by file name
Enable filtering by file name pattern with wildcards or regular expression.
File name filter pattern
Pattern for file name filtering. With type 'Wildcard', use '*' and '?'. With type 'Regex', enter a Java regular expression.
File name filter type
Choose how to interpret the file name pattern: wildcard or regular expression.
Case sensitive (names)
Make file name filtering case sensitive.
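
The difference between the two file name filter types can be sketched as follows. This is a hedged illustration only: it uses Python's re module, whose syntax differs from Java regular expressions in some details, and the function names compile_name_filter and name_matches are hypothetical. Note also that fnmatch treats '[...]' as a character class, which goes beyond the '*' and '?' wildcards named above.

```python
import fnmatch
import re


def compile_name_filter(pattern, filter_type="Wildcard", case_sensitive=False):
    """Compile a file name pattern into a regular expression object."""
    if filter_type == "Wildcard":
        # '*' matches any run of characters, '?' matches exactly one.
        regex = fnmatch.translate(pattern)
    elif filter_type == "Regex":
        regex = pattern
    else:
        raise ValueError("unknown filter type: %r" % (filter_type,))
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(regex, flags)


def name_matches(name, pattern, filter_type="Wildcard", case_sensitive=False):
    """Return True if the whole name matches the pattern."""
    compiled = compile_name_filter(pattern, filter_type, case_sensitive)
    return compiled.fullmatch(name) is not None
```

For example, the wildcard pattern 'report_??.xml' accepts Report_01.XML when matching is case insensitive, and the regular expression 'data-\d{4}\.xml' accepts data-2024.xml.
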
Include hidden files
Include hidden files in the selection.
Include special files
Include special file types (for example, workflows).
Filter by folder name
Enable filtering of folders by name pattern before descending into them.
Folder name pattern
Pattern for folder name filtering. Use '*' and '?' with filter type 'Wildcard'. With type 'Regex', enter a Java regular expression.
Folder name filter type
Choose how to interpret the folder name pattern: wildcard or regular expression.
Case sensitive (folders)
Make folder name filtering case sensitive.
Include hidden folders
Descend into folders that are hidden (if they otherwise pass filters).
Follow symlinks
Follow symbolic links while traversing folders (only relevant when selecting a folder).
Output column name
Name of the output column.
Use XPath filter
Enable XPath filtering to extract specific elements from the XML document.
XPath query
Only nodes of the document that match this XPath query are read. Each matching node is read into a single data cell. Note that XPath requires namespaces to be denoted explicitly. For example, to read only the body of an XHTML document, use the XPath query /dns:html/dns:body, where dns is the prefix of the namespace defined in the Namespaces option. Only a limited XPath syntax is supported: paths to nodes must be absolute. Among the XPath operators, the | operator is supported. It can be used, for example, to read the head and the body of an XHTML document into separate cells: /dns:html/dns:head | /dns:html/dns:body
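
To make the namespace-prefixed absolute paths and the | operator concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not the node's parser. The helper names qualified and select_absolute are hypothetical, and the sketch assumes simple queries made only of '/'-separated element steps joined by '|'.

```python
import xml.etree.ElementTree as ET


def qualified(step, namespaces):
    """Turn 'dns:html' into '{namespace-uri}html' using the prefix mapping."""
    if ":" in step:
        prefix, local = step.split(":", 1)
        return "{%s}%s" % (namespaces[prefix], local)
    return step


def select_absolute(xml_text, query, namespaces):
    """Return the elements matching absolute paths joined by '|'."""
    root = ET.fromstring(xml_text)
    matches = []
    for path in query.split("|"):
        steps = [s for s in path.strip().split("/") if s]
        if not steps:
            continue
        # The first step of an absolute path must name the root element.
        if root.tag != qualified(steps[0], namespaces):
            continue
        if len(steps) == 1:
            matches.append(root)
        else:
            # Remaining steps are resolved relative to the root element.
            matches.extend(root.findall("/".join(steps[1:]), namespaces))
    return matches
```

Called with the mapping {"dns": "http://www.w3.org/1999/xhtml"} and the query /dns:html/dns:head | /dns:html/dns:body on an XHTML document, the function returns the head element followed by the body element, each of which would correspond to one output cell.
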
Fail if XPath not found
If checked, execution fails if no match is found for the given XPath in any of the files. If unchecked and no match is found, the result is an empty table.
Namespaces
The prefixes and the namespaces used in the XPath query. For example, when querying XHTML documents with the XPath query:
//pre:h1
the following namespace must be defined:
Prefix: pre
Namespace: http://www.w3.org/1999/xhtml
  • Prefix: The namespace prefix.
  • Namespace: The namespace URI.
Incorporate namespace of the root element

This option is useful when you do not have the default namespace of your document at hand.

If checked, the namespace of the root element is added to the Namespaces table during runtime.

Prefix of root's namespace

Define a prefix for the root namespace in case it is incorporated.

For the example of XHTML documents, the namespace of the root element is http://www.w3.org/1999/xhtml, so with the root's prefix set to dns you can leave the Namespaces table empty.
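
The idea of incorporating the root element's namespace can be sketched as follows. This is an illustration under stated assumptions, not the node's code: the function names root_namespace and incorporate_root_namespace are hypothetical, and keeping an existing entry when the prefix is already defined is an assumption about collision handling.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified tags as '{namespace-uri}local-name'.
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def incorporate_root_namespace(xml_text, namespaces, root_prefix="dns"):
    """Return a copy of the prefix mapping with the root namespace added."""
    merged = dict(namespaces)
    uri = root_namespace(xml_text)
    if uri is not None:
        # Assumption: an explicitly configured prefix is not overwritten.
        merged.setdefault(root_prefix, uri)
    return merged
```

For an XHTML document and an empty mapping, the result maps dns to http://www.w3.org/1999/xhtml, which is why the Namespaces table can stay empty in that case.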

Skip first data rows
Use this option to skip the specified number of valid data rows. This has no effect on which row will be chosen as a column header. Skipping rows prevents parallel reading of individual files.
Limit number of rows
If enabled, only the specified number of data rows are read. The column header row (if selected) is not taken into account. Limiting rows prevents parallel reading of individual files.
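
The combined effect of skipping and limiting rows can be sketched with itertools.islice. This is a minimal illustration only; the function name skip_and_limit is hypothetical, and applying the skip before the limit is an assumption.

```python
from itertools import islice


def skip_and_limit(rows, skip=0, limit=None):
    """Drop the first `skip` rows, then keep at most `limit` rows."""
    stop = None if limit is None else skip + limit
    return list(islice(rows, skip, stop))
```

For instance, skipping 2 rows and limiting to 3 on ten rows numbered 0 to 9 keeps rows 2, 3, and 4.
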
Append file path column
Select this box if you want to add a column containing the path of the file from which the row is read. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.

Input Ports

The file system connection.

Output Ports

The complete XML document in a single data cell or the nodes matching the XPath query if XPath filtering is checked.

Views

This node has no views.
