Regex Split

Streamable

KNIME Base Nodes version 4.1.0.v201912041211 by KNIME AG, Zurich, Switzerland

This node splits the string content of a selected column into logical groups using a regular expression. A group is identified by a pair of parentheses, and the pattern inside the parentheses is itself a regular expression. The content of each group is appended as an individual column.

A short introduction to groups and capturing is given in the Java API. Some examples are given below:

Parsing Patent Numbers

Patent identifiers such as "US5443036-X21" consist of a country code of at most two letters ("US"), a patent number ("5443036"), and possibly an application code ("X21") that is separated from the number by a dash or a space character. Such identifiers can be split with the expression ([A-Za-z]{1,2})([0-9]*)[ \-]*(.*$). Each of the parenthesized terms corresponds to one of the aforementioned parts.
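The grouping can be tried outside KNIME. The following is a minimal sketch that uses Python's re module as a stand-in for the Java regular expression engine; it does not call the node itself. The function name split_patent is hypothetical, and the sketch assumes that the pattern has to match the entire string.

```python
import re

# Pattern copied from the text: country code, patent number, optional application code.
PATENT_PATTERN = re.compile(r"([A-Za-z]{1,2})([0-9]*)[ \-]*(.*$)")


def split_patent(identifier):
    """Return the three groups as a tuple, or None if the string does not match.

    Assumption: the whole string must match the pattern (fullmatch).
    """
    match = PATENT_PATTERN.fullmatch(identifier)
    return match.groups() if match else None


print(split_patent("US5443036-X21"))  # ('US', '5443036', 'X21')
print(split_patent("US5443036"))      # ('US', '5443036', '')
print(split_patent("5443036"))        # None, because the country code is missing
```

Each element of the returned tuple corresponds to one appended column in the node's output.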

Strip File URLs

Splitting file URLs is particularly useful when this node is used to parse the file URL of a file reader node (the URL is exposed as a flow variable and then exported to a table using a Variable to Table node). The format of such URLs is similar to "file:c:\some\directory\foo.csv". The pattern [A-Za-z]*:(.*[/\\])(([^\.]*)\.(.*$)) generates four groups, numbered by counting the opening parentheses from left to right. The first group, "(.*[/\\])", identifies the directory. It consumes all characters up to and including the final slash or backslash; in the example this is "c:\some\directory\". The second group represents the file name and encloses the third and fourth groups. The third group, "([^\.]*)", consumes all characters after the directory that are not a dot '.' ("foo" in the above example). The pattern then expects a single dot, which belongs to neither the third nor the fourth group, followed by the fourth group, "(.*$)", which reads until the end of the string and captures the file suffix ("csv"). The groups for the above example are:

  1. c:\some\directory\
  2. foo.csv
  3. foo
  4. csv
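The four groups can be reproduced with a short sketch. As before, Python's re module stands in for the Java regular expression engine, the function name split_file_url is hypothetical, and the sketch assumes the pattern must match the entire string.

```python
import re

# Pattern copied from the text; groups are numbered by their opening parentheses.
FILE_URL_PATTERN = re.compile(r"[A-Za-z]*:(.*[/\\])(([^\.]*)\.(.*$))")


def split_file_url(url):
    """Return (directory, file name, base name, suffix), or None if there is no match.

    Assumption: the whole string must match the pattern (fullmatch).
    """
    match = FILE_URL_PATTERN.fullmatch(url)
    return match.groups() if match else None


# A raw string keeps the backslashes of the Windows-style path literal.
groups = split_file_url(r"file:c:\some\directory\foo.csv")
for number, value in enumerate(groups, start=1):
    print(number, value)
```

Note that the first group keeps the trailing backslash, because the final slash or backslash is part of "(.*[/\\])".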

Options

Ignore Case
Enables case-insensitive matching.
Multiline
Enables multiline mode, i.e. when selected, the expressions ^ and $ match at the start and end of each line instead of only at the start and end of the entire input string. This option only matters if the input string has line breaks.
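The effect of both options can be illustrated with a small sketch. It assumes that the options behave like the Java flags Pattern.CASE_INSENSITIVE and Pattern.MULTILINE and uses the corresponding Python flags re.IGNORECASE and re.MULTILINE as stand-ins. The helper names whole_lines and split_code, and the pattern (us)([0-9]+), are made up for illustration.

```python
import re


def whole_lines(text, multiline):
    """Find matches of ^\\w+$ with or without multiline mode."""
    flags = re.MULTILINE if multiline else 0
    return re.findall(r"^\w+$", text, flags)


def split_code(text, ignore_case):
    """Split a lowercase-only pattern against the text, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.fullmatch(r"(us)([0-9]+)", text, flags)
    return match.groups() if match else None


# Without multiline mode, ^ and $ anchor only at the ends of the whole input.
print(whole_lines("first\nsecond", multiline=False))  # []
print(whole_lines("first\nsecond", multiline=True))   # ['first', 'second']

# The pattern is written in lowercase, the input in uppercase.
print(split_code("US5443036", ignore_case=False))     # None
print(split_code("US5443036", ignore_case=True))      # ('US', '5443036')
```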

Input Ports

Input table with string column to be split.

Output Ports

Input table with additional columns appended, one for each pattern group.

Installation

To use this node in KNIME, install KNIME Core from the following update site:

KNIME 4.1