Explainable Fuzzy Matching - Typo Error Statistics

Explainable Fuzzy Matching on Payee Data

This workflow demonstrates how to use Approximate String Matching to reconcile noisy, user-entered payee names with a clean reference list of canonical entities. Beyond generating similarity scores, the workflow provides explainable error statistics that highlight where and how mismatches occur.

🔹 Steps in the Workflow

1. 📂 Load Data
   - Reference Data: a clean list of canonical payee names.
   - Payee Data with Typos: noisy, real-world names entered by users.
2. 🔍 Approximate String Matching (Levenshtein)
   - Matches each entered payee name against the reference list.
   - Produces a Match Sequence (e.g., oooo=ooo=ox=+) that explains the differences character by character:
     - o → match
     - = → substitution (wrong character)
     - + → insertion (extra character)
     - x → deletion (missing character)
3. 🧮 Error Type Analysis
   - Counts substitutions, insertions, deletions, and matches.
   - Calculates error ratios, edit distance, and match accuracy.
   - Provides explainable quality metrics for each match.
4. 📊 Aggregation & Statistics
   - Groups results by reference payee.
   - Computes the average error profile per entity (e.g., “Deutsche Bank AG entries often miss characters”).
   - Rounds and formats values for readability.
5. 📈 Interactive Dashboard
   - Table of canonical payees with their average match accuracy.
   - Bar chart showing the distribution of error types (substitution, insertion, deletion).
   - Clear insights into where manual review may be needed and which vendors or customers are most error-prone.

🔹 Business Value

- Data Quality Monitoring → Understand how user-entered names deviate from reference data.
- Explainable Matching → Not just similarity scores, but insights into why mismatches occur.
- Operational Efficiency → Identify entities that frequently require manual corrections.
- Compliance Support → Improve accuracy for KYC, AML, and financial reconciliation tasks.
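The Levenshtein matching step can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the exorbyte node's implementation: the function name `match_sequence` is hypothetical, the entered name is treated as the string being compared against the reference (so `+` marks an extra character in it and `x` a reference character it is missing), and ties between equally cheap alignments are broken in favor of match/substitution, so the position reported for an edit may differ from what the node produces.

```python
def match_sequence(entered: str, reference: str) -> str:
    """Return a per-character edit script aligning `entered` to `reference`.

    Symbols (as described in the workflow):
      o  match
      =  substitution (wrong character)
      +  insertion (extra character in the entered name)
      x  deletion (reference character missing from the entered name)
    """
    n, m = len(entered), len(reference)
    # dist[i][j] = Levenshtein distance between entered[:i] and reference[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # i extra characters
    for j in range(m + 1):
        dist[0][j] = j  # j missing characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if entered[i - 1] == reference[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # insertion
                dist[i][j - 1] + 1,         # deletion
            )

    # Walk back from the bottom-right cell to recover one optimal alignment,
    # preferring match/substitution, then insertion, then deletion on ties.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if entered[i - 1] == reference[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                ops.append("o" if cost == 0 else "=")
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("+")
            i -= 1
        else:
            ops.append("x")
            j -= 1
    return "".join(reversed(ops))


if __name__ == "__main__":
    print(match_sequence("Deutshe Bank AG", "Deutsche Bank AG"))
```

For example, `match_sequence("Deutshe Bank AG", "Deutsche Bank AG")` returns `"oooooxoooooooooo"`: five matches, one deletion for the missing `c`, then ten matches. The dynamic-programming table costs O(n·m) time and memory, which is acceptable for short strings such as payee names.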

URL: exorbyte GmbH https://www.exorbyte.com/en

Download

To use this workflow in KNIME, download it from the URL below and open it in KNIME:

Download Workflow

Created by: Ahmad.Varasteh

Created at: 2025-09-03

On NodePit since: 2025-09-04

Last update: 2026-03-11

Created with KNIME version: v5.8.2

Tags: levenshtein, fuzzy-matching, string-matching, data-quality, entity-resolution, finance

Workflow sections: 📂 Load Data → 🔍 Approximate String Matching → 🧮 Calculate Error Types → 📊 Aggregate by Reference → 📈 Error Statistics Dashboard
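The Calculate Error Types and Aggregate by Reference steps can be approximated as follows. This sketch assumes match sequences use the o / = / + / x symbols described above and that match accuracy and the error ratios are taken over the alignment length; the workflow may define them differently. The helper names `error_profile` and `aggregate_by_reference` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from match-sequence symbols to operation names.
OPS = {"o": "matches", "=": "substitutions", "+": "insertions", "x": "deletions"}


def error_profile(seq: str) -> dict:
    """Count each operation in a match sequence and derive per-match metrics."""
    profile = {name: seq.count(symbol) for symbol, name in OPS.items()}
    length = len(seq) or 1  # guard against empty sequences
    # With unit costs, every non-match operation adds 1 to the edit distance.
    profile["edit_distance"] = (
        profile["substitutions"] + profile["insertions"] + profile["deletions"]
    )
    profile["match_accuracy"] = profile["matches"] / length
    for name in ("substitutions", "insertions", "deletions"):
        profile[f"{name}_ratio"] = profile[name] / length
    return profile


def aggregate_by_reference(rows, digits: int = 2) -> dict:
    """Average the error profiles of (reference_name, match_sequence) pairs
    per reference name and round the results for display."""
    grouped = defaultdict(list)
    for reference, seq in rows:
        grouped[reference].append(error_profile(seq))
    return {
        reference: {
            key: round(mean(p[key] for p in profiles), digits)
            for key in profiles[0]
        }
        for reference, profiles in grouped.items()
    }


if __name__ == "__main__":
    rows = [
        ("Deutsche Bank AG", "ooooo" + "x" + "o" * 10),  # one missing character
        ("Deutsche Bank AG", "o" * 16),                   # exact match
    ]
    print(aggregate_by_reference(rows))
```

With these two illustrative rows, the averaged profile for Deutsche Bank AG has 0.5 deletions, an edit distance of 0.5, and a match accuracy of 0.97, which is the kind of per-entity summary the dashboard table and bar chart would display.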