XML Reader

Reads a well-formed XML document. An XPath query can be specified to read only a portion of the file, in which case the output consists of the nodes in the document that match the XPath query.

This node can access a variety of different file systems. More information about file handling in KNIME can be found in the official File Handling Guide.

Options

Settings

Read from
Select a file system which stores the data you want to read. There are four default file system options to choose from:
  • Local File System: Allows you to select a file/folder from your local system.
  • Mountpoint: Allows you to read from a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog until you select a valid (i.e. known) mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow or the current workflow's data area. When selected, a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows you to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and read timeout in milliseconds. If connecting to the host or reading the file takes longer than this timeout, the node fails to execute. Browsing is disabled for this option.
To read from other file systems, click on ... in the bottom left corner of the node icon followed by Add File System Connection port. Afterwards, connect the desired file system connector node to the newly added input port. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.
Mode
Select whether you want to read a single file or multiple files in a folder. When reading files in a folder, you can set filters to specify which files and subfolders to include (see below).
Filter options
Only displayed if the mode Files in folder is selected. Allows you to specify which files should be included according to their file extension and/or name. It is also possible to include hidden files. The folder filter options enable you to specify which folders should be included based on their name and hidden status. Note that the folders themselves will not be included, only the files they contain.
Include subfolders
If this option is checked, the node will include all files from subfolders that satisfy the specified filter options. If left unchecked, only the files in the selected folder will be included and all files from subfolders are ignored.
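The combined effect of the file filter and the Include subfolders option can be sketched outside KNIME as follows. This is only an illustration, not the node's implementation: the function name list_files and its parameters are hypothetical, and treating dot-prefixed names as hidden is an assumption that follows the Unix convention.

```python
from pathlib import Path


def list_files(folder, extensions, include_subfolders=False, include_hidden=False):
    """Return the files under `folder` whose extension is in `extensions`.

    Hypothetical helper: `extensions` is a set of lowercase extensions
    without the leading dot, e.g. {"xml"}.
    """
    root = Path(folder)
    # rglob descends into subfolders, glob stays in the selected folder.
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    selected = []
    for path in candidates:
        if not path.is_file():
            continue  # folders themselves are never part of the result
        if path.suffix.lower().lstrip(".") not in extensions:
            continue
        # Assumption: a dot-prefixed file or folder name counts as hidden.
        relative_parts = path.relative_to(root).parts
        if not include_hidden and any(part.startswith(".") for part in relative_parts):
            continue
        selected.append(path)
    return sorted(selected)
```

For example, list_files("/path/to/folder", {"xml"}, include_subfolders=True) would collect the XML files in the folder and in all of its non-hidden subfolders.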
File, Folder or URL
Enter a URL when reading from Custom/KNIME URL, otherwise enter a path to a file or folder. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/macOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file/folder from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow has been opened. (Re)execute the connector node to enable browsing.
The location can be exposed as or automatically set via a path flow variable.
Output column name
Name of the output column
XPath query

Only nodes of the document which match this XPath query will be read. Each matching node is read in a single data cell.

Note that XPath requires namespaces to be denoted explicitly. For example, to read only the body of an XHTML document, you can use the XPath query:
/dns:html/dns:body
where dns is the prefix of the namespace defined in the Namespaces table.

A limited XPath syntax is supported. Only absolute paths to nodes can be defined. Among the XPath operators, the | operator is supported. It can be used, for example, to read the head and the body of an XHTML document, each into its own cell:
/dns:html/dns:head | /dns:html/dns:body
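To illustrate what such a query selects, here is a minimal Python sketch that evaluates a union of absolute, prefix-qualified paths with the standard library's xml.etree.ElementTree. It is not the node's implementation: the helper select and the sample document are hypothetical, only plain element steps are handled, and matches are returned in the order of the paths in the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document in the XHTML namespace.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)


def select(root, query, namespaces):
    """Evaluate a '|'-separated union of absolute element paths."""
    matches = []
    for path in query.split("|"):
        steps = [step for step in path.strip().split("/") if step]
        if not steps:
            continue
        # Resolve 'prefix:name' to ElementTree's '{uri}name' form.
        qnames = []
        for step in steps:
            prefix, _, local = step.rpartition(":")
            qnames.append(f"{{{namespaces[prefix]}}}{local}" if prefix else local)
        # The first step of an absolute path must match the root element.
        current = [root] if root.tag == qnames[0] else []
        for qname in qnames[1:]:
            current = [child for node in current for child in node if child.tag == qname]
        matches.extend(current)
    return matches


root = ET.fromstring(XHTML)
namespaces = {"dns": "http://www.w3.org/1999/xhtml"}
cells = select(root, "/dns:html/dns:head | /dns:html/dns:body", namespaces)
for cell in cells:
    # Each match corresponds to one output cell.
    print(ET.tostring(cell, encoding="unicode"))
```

In this sketch the unprefixed query /html/body matches nothing, because the elements of the sample document live in the XHTML namespace.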

Fail if XPath not found
If checked, execution will fail if no match is found for the given XPath in any of the files. If unchecked and no match is found, the result will be an empty table.
Namespaces
The prefixes and the namespaces used in the XPath query. For the example in XPath query, the following namespace must be defined:
Prefix: dns
Namespace: http://www.w3.org/1999/xhtml
Incorporate namespace of the root element

This option is useful when you do not have the default namespace of your document at hand.

If checked, the namespace of the root element is added to the Namespaces table during runtime. Please define a prefix for this namespace in Prefix of root element's namespace.

For the example of XHTML documents, the namespace of the root element is http://www.w3.org/1999/xhtml, so with a root prefix of dns you can leave the Namespaces table empty.
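The idea behind this option can be sketched in Python with xml.etree.ElementTree: the namespace is read from the root element at runtime and registered under the chosen prefix. This is a hedged illustration rather than the node's code. The helper root_namespace and the sample document are hypothetical, and the sketch relies on ElementTree storing qualified tags as {uri}name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document whose default namespace is not known in advance.
DOCUMENT = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)


def root_namespace(root):
    """Return the namespace URI of the root element, or None if it has none."""
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


root = ET.fromstring(DOCUMENT)
namespaces = {}  # stands in for an empty Namespaces table
uri = root_namespace(root)
if uri is not None:
    namespaces["dns"] = uri  # "dns" plays the role of the root element's prefix

# The prefix can now be used in queries without declaring the URI by hand.
body = root.find("dns:body", namespaces)
print(body.tag)
```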

Append file path column
If checked, the node will append a column of type Path with the provided name to the output table. For each row, this column contains the path of the file it was read from. The node will fail if adding the column with the provided name causes a name collision with any of the columns in the read table.

Limit Rows

Skip first data rows
If enabled, the specified number of valid data rows are skipped.
Limit data rows
If enabled, only the specified number of data rows are read.
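How the two row limits combine can be sketched with itertools.islice. This assumes that skipping is applied before limiting, which the option descriptions suggest but do not state outright. The function limit_rows is a hypothetical name.

```python
from itertools import islice


def limit_rows(rows, skip=0, limit=None):
    """Drop the first `skip` rows, then keep at most `limit` rows."""
    # Assumption: the limit counts rows after skipping, not from the start.
    stop = None if limit is None else skip + limit
    return list(islice(rows, skip, stop))


# Rows 0..9 stand in for the matched XML nodes, one per row.
print(limit_rows(range(10), skip=2, limit=3))
```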

Input Ports

The file system connection.

Output Ports

The complete XML document in a single data cell or the nodes matching the XPath query if XPath filtering is checked.

Views

This node has no views
