Character Mapper Overview Example Workflow

<p><strong>Character Mapper Overview</strong></p><p>This workflow demonstrates how to prepare, normalize, and match text data using the <strong>Exorbyte MatchMaker Extension</strong> for KNIME.<br>It provides a complete walkthrough, from <strong>licensing setup</strong> to <strong>data normalization and fuzzy matching</strong>, and shows how the <strong>Character Mapper (Labs)</strong> node improves data consistency and match accuracy across multiple scenarios.</p><p>🔍 <strong>Overview</strong></p><p>Real-world data often contains <strong>accents, umlauts, mixed casing, punctuation, or non-Latin scripts</strong>, making record linkage and search unreliable.<br>The <strong>Character Mapper (Labs)</strong> node standardizes these text fields by applying configurable normalization rules.<br>The workflow shows how to apply this normalization to different types of data and how it integrates with downstream <strong>Approximate Matching</strong> and <strong>Indexing</strong> nodes.</p><p>🧱 <strong>Sections in this Workflow</strong></p><ol><li><p><strong>How to Get Your License</strong><br>Request, register, and activate your Exorbyte MatchMaker license using the <strong>License Requester</strong> and <strong>License Activator</strong> nodes.</p></li><li><p><strong>Normalize Customer Names</strong><br>Unify spelling variants caused by accents or umlauts to improve matching in customer databases.</p></li><li><p><strong>Normalize Addresses</strong><br>Clean whitespace, newlines, and accents for consistent formatting and easier grouping.</p></li><li><p><strong>Normalize Product Names &amp; Currencies</strong><br>Convert symbols and currency signs into standardized formats to align product data across systems.</p></li><li><p><strong>Normalize Supplier Names (Any-to-Latin Transliteration)</strong><br>Transliterate non-Latin scripts (e.g., Cyrillic) and remove diacritics for multilingual data normalization.</p></li><li><p><strong>Character Mapper + Approximate String Matcher</strong><br>Combine normalization with fuzzy matching for accent- and spelling-tolerant comparisons between names.</p></li></ol><p><strong>Contact us</strong><br> Website: https://www.exorbyte.com <br> Email: consulting@exorbyte.com</p>

URL: exorbyte GmbH https://www.exorbyte.com/en

🔐 How to Get Your License

Use the License Requester node to request and register your exorbyte MatchMaker license before running any toolbox nodes.

  1. Choose Demo (30 days) or Production.

  2. Enter your email (and your Customer Token for a Production license).

  3. Execute the node. It sends a secure request to the exorbyte team.

  4. If offline, manually email the request file to knime-node-license@exorbyte.com.

  5. When you receive the .lic file, reopen the node, select Use available license file, and execute it. Then run the License Activator node.

⚠️ Each KNIME installation or Hub environment needs its own license.

👉 See full workflow guide: How to license exorbyte Extension

✨ Normalize Customer Names

Customer names often appear with accents, umlauts, or inconsistent casing, which can make matching unreliable.
The Character Mapper (Labs) node helps you standardize them into a unified, comparable form.

🧩 Example transformations:

  • MÁRÍA GARCÍA → MARIA GARCIA

  • Jörg Müller → JOERG MUELLER

  • François Dupont → FRANCOIS DUPONT

⚙️ Recommended settings:

  • Mapped Characters → "a-zA-Z0-9 "

  • Map Upper → ON

  • Deaccentuate → ON

  • Expand Umlauts → ON

  • Map Spaces → ON

  • Any to Latin → ON

Result: a new column with clean, normalized customer names, ideal for consistent matching, indexing, and deduplication across systems.
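
To make the effect of these settings concrete outside KNIME, here is a minimal Python sketch that reproduces the three example transformations above. It is not the node's implementation: the function name `normalize_name` and the umlaut table are assumptions made for illustration, and the node's own rules may differ in detail.

```python
import unicodedata

# Assumed umlaut expansions (common German convention); the node's table may differ.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def normalize_name(text: str) -> str:
    """Approximate: Expand Umlauts, Deaccentuate, Map Upper, keep A-Z, 0-9 and spaces."""
    # Compose first so that "o" + combining diaeresis is seen as "ö".
    text = unicodedata.normalize("NFC", text)
    text = "".join(UMLAUTS.get(ch, ch) for ch in text)
    # Decompose and drop combining marks to remove the remaining accents.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.upper()
    # Keep only ASCII letters, digits and spaces, then collapse runs of spaces.
    kept = "".join(ch for ch in text if ch == " " or (ch.isascii() and ch.isalnum()))
    return " ".join(kept.split())


for raw in ["MÁRÍA GARCÍA", "Jörg Müller", "François Dupont"]:
    print(raw, "->", normalize_name(raw))
```

Expanding umlauts before stripping accents matters here: done in the other order, "ö" would become "O" rather than "OE".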

🏠 Normalize Addresses

Addresses often contain extra spaces, tabs, newlines, or accents, making them hard to compare or group correctly.
The Character Mapper (Labs) node standardizes these address strings to ensure clean, searchable formats.

📄 Example transformations:

  • "Hauptstr.\t12\n 2.OG" → HAUPTSTR.122.OG

  • "Rue de l'Église 9" → RUEDELEGLISE9

  • "Av. de la Constitución 5" → AV.DELACONSTITUCION5

⚙️ Recommended settings:

  • Mapped Characters → "a-zA-Z0-9 "

  • Map Upper → ON

  • Deaccentuate → ON

  • Expand Umlauts → ON

  • Map Spaces → ON

Addresses become clean and uniformly formatted, allowing better grouping, deduplication, and geocoding in downstream workflows.
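
The address examples above can be approximated with the following Python sketch. It is written to reproduce those three outputs, in which all whitespace is removed and periods are kept; that behavior is inferred from the examples rather than taken from the node's documentation, and `normalize_address` is a hypothetical helper name.

```python
import re
import unicodedata

# Assumed umlaut expansions; the node's own table may differ.
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def normalize_address(text: str) -> str:
    """Approximate the examples: strip accents, uppercase, drop whitespace and quotes."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(UMLAUTS.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.upper()
    # Everything except A-Z, 0-9 and "." is removed, including tabs and newlines.
    return re.sub(r"[^A-Z0-9.]", "", text)


print(normalize_address("Hauptstr.\t12\n 2.OG"))
print(normalize_address("Rue de l'Église 9"))
print(normalize_address("Av. de la Constitución 5"))
```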

💰 Normalize Product Names & Currencies

Product data often mixes symbols, punctuation, and currency formats, which makes it difficult to compare or search product names consistently. The Character Mapper (Labs) node harmonizes these differences by converting symbols and special characters into standardized representations.

📦 Example transformations:

  • "Preis €12,99 – Starter" → PREIS E1299 STARTER

  • "Price $13.50 Starter" → PRICE S1350 STARTER

  • "Café Creme 200g" → CAFE CREME 200G

⚙️ Recommended settings:

  • Map Currency → ON

  • Deaccentuate → ON

  • Map Upper → ON

  • Map Spaces → ON

  • Mapped Characters → "a-zA-Z0-9 -/"
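
As an illustration of the currency handling, the Python sketch below reproduces the three product examples. The mapping of € to E and $ to S is read off those examples; the `CURRENCY` table and the `normalize_product` name are assumptions, and the node may map other currency signs differently.

```python
import re
import unicodedata

# Currency replacements inferred from the examples above (assumption).
CURRENCY = {"€": "E", "$": "S"}


def normalize_product(text: str) -> str:
    """Approximate: Map Currency, Deaccentuate, Map Upper, keep A-Z, 0-9, space, '-' and '/'."""
    text = "".join(CURRENCY.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.upper()
    # Commas, periods and the en dash are not in the kept set, so they are dropped.
    text = re.sub(r"[^A-Z0-9 /-]", "", text)
    # Collapse the double space left behind by removed punctuation.
    return " ".join(text.split())


print(normalize_product("Preis €12,99 – Starter"))
print(normalize_product("Price $13.50 Starter"))
print(normalize_product("Café Creme 200g"))
```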

🌐 Normalize Supplier Names (Any-to-Latin Transliteration)

Use case:
Supplier names may include non-Latin scripts (e.g. Cyrillic) and diacritics/umlauts. The Character Mapper (Labs) converts them into a consistent Latin representation for reliable matching, sorting, and indexing.

Dataset: suppliers.csv
Examples:

  • Müller & Söhne GmbH → MUELLER & SOEHNE GMBH

  • Łódź Export Sp. z o.o. → LODZ EXPORT SP Z OO

Recommended settings:

  • Any to Latin: ON

  • Deaccentuate: ON

  • Expand Umlauts: ON

  • Map Upper: ON

  • Map Spaces: ON

  • Mapped Characters: a-zA-Z0-9 &.,-

  • Append or Replace: Append with suffix _mapped
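
A rough Python sketch of the idea behind Any to Latin follows. Python's standard library has no general transliterator, so the sketch uses a deliberately tiny, hypothetical lookup table (`TO_LATIN`) that covers only the characters in the sample strings; it is not a transliteration standard and not the node's mapping. The sketch follows the example outputs above, in which periods are dropped.

```python
import re
import unicodedata

# Tiny illustrative table (assumption): a few Cyrillic letters plus "Ł",
# which has no Unicode decomposition and must be mapped explicitly.
TO_LATIN = {"И": "I", "в": "v", "а": "a", "н": "n", "о": "o", "Ł": "L", "ł": "l"}
UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def normalize_supplier(text: str) -> str:
    """Approximate: Any to Latin, Expand Umlauts, Deaccentuate, Map Upper."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(TO_LATIN.get(ch, UMLAUTS.get(ch, ch)) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.upper()
    # Keep A-Z, 0-9, space and "&", matching the example outputs.
    text = re.sub(r"[^A-Z0-9 &]", "", text)
    return " ".join(text.split())


print(normalize_supplier("Müller & Söhne GmbH"))
print(normalize_supplier("Łódź Export Sp. z o.o."))
print(normalize_supplier("Иванов"))
```

In practice the lookup table would be replaced by a full transliteration scheme; the point of the sketch is only the order of operations: transliterate, expand umlauts, strip accents, uppercase, then filter.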

🤖 Character Mapper + Approximate String Matcher

Use the Character Mapper (Labs) node to create a character mapping object, then let the matcher node (labeled Term Matcher in this workflow) apply that map to all comparisons it performs.

⚙️ Steps:

  1. Create the map: Add Character Mapper (Labs) and configure rules (deaccentuate, umlauts, case, etc.).

  2. Wire the map: Connect the Character Mapping port from Character Mapper to Approximate String Matcher (Labs).

  3. Provide data: Feed the First Names table and the Search Names table directly into the matcher.

  4. Execute to find accent- and spelling-tolerant matches (e.g., Joerg → Jörg, Maria → MÁRÍA).
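
The steps above can be sketched in Python as "normalize both sides, then compare fuzzily". The sketch uses `difflib.get_close_matches` from the standard library as a stand-in for the exorbyte matcher; it does not reflect the matcher's actual algorithm, and the helper names and the 0.8 cutoff are assumptions.

```python
import difflib
import unicodedata

UMLAUTS = {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def normalize(text: str) -> str:
    """Shared character mapping applied to both tables before comparison."""
    text = unicodedata.normalize("NFC", text)
    text = "".join(UMLAUTS.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text.upper()


def match_names(search_names, reference_names, cutoff=0.8):
    """Return {search name: best reference name, or None if nothing is close enough}."""
    # Index reference names by their normalized form (first occurrence wins).
    by_normalized = {}
    for ref in reference_names:
        by_normalized.setdefault(normalize(ref), ref)
    results = {}
    for name in search_names:
        hits = difflib.get_close_matches(
            normalize(name), list(by_normalized), n=1, cutoff=cutoff
        )
        results[name] = by_normalized[hits[0]] if hits else None
    return results


print(match_names(["Joerg", "Maria", "Xavier"], ["Jörg", "MÁRÍA"]))
```

Because the same mapping is applied to both inputs, "Joerg" and "Jörg" compare as identical strings, and the fuzzy step only has to absorb genuine spelling differences.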

Workflow nodes: License Requester; License Activator; CSV Reader (Customer Names, Addresses, Product Names, Suppliers); Table Creator (First Names, Search Names); Character Mapper (five instances); Term Matcher.
