Character Mapper (Labs)

This node normalizes text by transforming characters according to a configurable mapping. Users can restrict the allowed alphabet and apply rules such as case folding, accent removal, umlaut expansion, whitespace unification, currency symbol mapping, and transliteration of non-Latin characters.
The node produces two outputs:

  • A mapped string column, where each input string is transformed into its normalized form.
  • A TeraMap object, which captures the applied mapping rules and can be passed into the Approximate String Matcher node to ensure that Levenshtein-based comparisons are consistent with the normalization applied here.
This preprocessing step can improve recall and precision in approximate matching, because spelling variants, diacritics, case differences, and spacing irregularities are collapsed into a common canonical form before any distances are computed.
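The effect on Levenshtein-based comparison can be illustrated with a minimal Python sketch. It is not the node's implementation: `levenshtein` is a textbook edit distance, and `simple_normalize` is a hypothetical stand-in that only strips accents, uppercases, and unifies spaces.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def simple_normalize(text: str) -> str:
    """Hypothetical stand-in for the node: strip accents, uppercase, unify spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.upper().split())


# "José  García" (accents, capitals, double space) vs. a plain variant.
raw_a = "Jos\u00e9  Garc\u00eda"
raw_b = "jose garcia"

print(levenshtein(raw_a, raw_b))                                      # 5
print(levenshtein(simple_normalize(raw_a), simple_normalize(raw_b)))  # 0
```

Without normalization the two spellings are five edits apart. After normalization they are identical, so a distance threshold in a downstream matcher treats them as the same value.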

Options

Select Column to Map
Selects the single input column to normalize. Only columns that can be converted to strings are offered.
Mapped Characters
Defines the allowed/target character set (e.g., a-zA-Z0-9). Characters outside this set are either replaced according to the selected rules or removed.
Map Upper
Converts all lowercase characters to uppercase for consistent matching.
Deaccentuate
Removes diacritics from accented characters (e.g., é → e, ç → c).
Expand Umlauts
Converts German umlauts and the sharp s (ß) into their two-letter equivalents (e.g., ä → ae, ö → oe, ü → ue, ß → ss).
Map Currency
Normalizes currency symbols into codes (e.g., € → EUR, $ → USD).
Map Spaces
Collapses runs of multiple spaces into one and standardizes spacing, so that whitespace differences do not affect matching.
Any to Latin
Transliterates non-Latin characters into Latin equivalents (e.g., Спутник → Sputnik).
Append or Replace Column
Lets the user choose whether to keep the original column and add a normalized version, or overwrite the original column directly.
Output Column Suffix
Suffix appended to the original column name when a new mapped column is created. Useful when normalizing multiple columns.
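As a rough illustration of how these options might combine, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `map_characters`, the keyword names, the order of the steps, and the small lookup tables are hypothetical and do not reflect the node's actual implementation. Any to Latin is left out because it would need a transliteration library.

```python
import re
import unicodedata

# Illustrative tables only; the node's real mappings are not documented here.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
CURRENCY = {"€": "EUR", "$": "USD"}


def map_characters(text, allowed="A-Z0-9 ", expand_umlauts=True, deaccentuate=True,
                   map_currency=True, map_upper=True, map_spaces=True):
    """Hypothetical sketch of the option pipeline; the step order is an assumption."""
    # Compose characters first so that the table lookups below see single code points.
    text = unicodedata.normalize("NFC", text)
    if expand_umlauts:
        # Runs before accent removal; otherwise ä would become a instead of ae.
        text = "".join(UMLAUTS.get(ch, ch) for ch in text)
    if map_currency:
        text = "".join(CURRENCY.get(ch, ch) for ch in text)
    if deaccentuate:
        # Decompose, then drop the combining marks (é becomes e, ç becomes c).
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if map_upper:
        text = text.upper()
    if map_spaces:
        text = re.sub(r"\s+", " ", text).strip()
    # Remove anything still outside the allowed character set.
    return re.sub(f"[^{allowed}]", "", text)


print(map_characters("Müller  café 5€"))                 # MUELLER CAFE 5EUR
print(map_characters("Müller", expand_umlauts=False))    # MULLER
```

The sketch applies umlaut expansion before accent removal because the two rules overlap on ä, ö, and ü. Which rule takes precedence in the node itself is not stated in this description.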

Input Ports

Table containing the string column(s) to normalize.

Output Ports

Table with an additional or replaced column containing the mapped/normalized strings.
TeraMap object that encodes the character mapping rules and normalization configuration for use in downstream nodes such as the Approximate String Matcher.

Views

This node has no views
