String Manipulation (Multi Column)

The String Manipulation (Multi Column) node is an extension of the String Manipulation node, applying the provided expression to every selected column in turn. This can be used to search and replace, compare, or concatenate strings in multiple columns at once.

In each iteration, one column name from the selection will be used as replacement for the $$CURRENTCOLUMN$$ placeholder in the expression. For instance, when selecting columns first_name and last_name, the expression capitalize($$CURRENTCOLUMN$$) would first evaluate capitalize($first_name$) and then capitalize($last_name$).

Simple Examples

Search and Replace

replace($$CURRENTCOLUMN$$, "Old", "New") searches for "Old" and replaces with "New" in all selected columns.

Concatenate Strings

join($$CURRENTCOLUMN$$, $Unit$) appends the string in the Unit column to each string in the selected columns. When selecting the Freeze and Boil columns and appending the results as new columns, the following table could result:

Freeze Boil Unit Freeze with Unit Boil with Unit
0 100 ˚C 0˚C 100˚C
32 212 F 32F 212F
273.15 373.15 K 273.15K 373.15K

Advanced Examples

Use Columns or Flow Variables to Control String Manipulation

You can use the value of an integer column to control integer parameters of a string manipulator: substr($$CURRENTCOLUMN$$, $Start Position$) extracts the substring of $$CURRENTCOLUMN$$ that starts at offset k, where k is the integer value in the column Start Position in the current row. For example, when the selected column is Customer and the results are appended as Customer Name Only, the expression substr($$CURRENTCOLUMN$$, $Start Position$) could be used to generate the following ouput:

Customer Start Position Customer Name Only
Mr. Jones 3 Jones
Prof. Desmond 5 Desmond

Flow variables can also be used to control the parameter of a string manipulation function. For instance, if there is an integer flow variable start-position available, the above example could be rewritten as substr($$CURRENTCOLUMN$$, $${Istart-position}$$).

Convert Strings to Missing Values

toNull(replace($$CURRENTCOLUMN$$, "N.A.", "")) searches for "N.A." in all selected columns, replaces them with "" and returns a missing value, if the resulting string is empty (toNull converts empty strings to a missing value). For multiple strings to replace, a regular expression separating the target strings with | can be used: regexReplace($$CURRENTCOLUMN$$, "N.A.|-", ""). This replaces both occurrences of "N.A." and "-" with empty strings.

Options

String Manipulation

Column Filter
The columns that the node should loop over. These columns can be dynamically referenced in the expression using $$CURRENTCOLUMN$$. Dynamically referenced means that $$CURRENTCOLUMN$$ refers to different columns, depending on the columns that are selected in the column filter. It is allowed to include non-string columns from the column filter, but their values will be converted to string before substituting them for $$CURRENTCOLUMN$$.
Column List
Available columns in the input data table. They can be statically referenced in an expression, meaning that they always refer to the same column in the input table, no matter which columns are selected to loop over. For instance, the column age can be referenced as $age$. Values will not automatically be converted to string. If necessary, convert to string, e.g., string($age$).
Flow Variable List
Available flow variables, which can be used as variables in the expression. Like statically referenced columns, their values are not automatically converted to string. This makes it easier to use them to control non-string parameters of string manipulation functions, like in the substring example above.
Category
The string function categories.
Function
List of string functions in the selected category.
Description
Description and example for the selected function.
Expression
The expression editor.
Append Selected Columns with Suffix
If selected, the computed values columns will be appended as new columns, using the original column name with the specified suffix.
Replace Selected Columns
If selected, the computed values will replace the values in the columns that were selected to loop over.
Insert missing values as null
When any of the referenced columns (statically or dynamically, see above) contain missing values, the expression can either be evaluated with the missing values set to null or be skipped entirely, simply returning a missing value for the output.
Fail if expression can not be evaluated
Whether to stop node execution when an expression can not be evaluated. This could be the case if an input column contains values that the expression can not handle (e.g., toInt() on a column that contains "1", "2", "zebra", "4", "5" would fail on the third row). The default behavior is to stop execution, but disabling this option will output missing values instead.

Input Ports

Icon
Input data with multiple columns to proces.

Output Ports

Icon
Input data with replaced or appended columns.

Popular Predecessors

Views

This node has no views

Workflows

Links

Developers

You want to see the source code for this node? Click the following button and we’ll use our super-powers to find it for you.