
Approximate Index Matcher (Labs)

The Approximate Index Matcher node queries a prebuilt index object, such as one generated by the Single-Field Indexer, to find values that are similar to a given set of query terms.
The node accepts two inputs:

An Index Object containing the indexed reference values.
A Comparison Table with the query strings to be matched.

For each query value, the node searches the index using configurable string similarity algorithms (e.g., Levenshtein, Positional, or Longest Common Subsequence) and returns either the best match, the top-k matches (not supported yet), or all matches above a given threshold (not supported yet).
The output can be returned as a standalone results table or by appending match-related columns to the original comparison table. Match information may include the best matching index value, match quality score, and optionally the edit script or full list of top-k candidates.
Using the index structure, this node provides efficient, scalable approximate matching even for large datasets. Typical use cases include entity resolution, deduplication, and data cleaning, where inconsistent or noisy text values need to be reconciled with a trusted reference set.

Options


Select Columns in Comparison Input

Select the columns of the comparison input that should be compared with the Reference Terms.

Row Filter Condition

Controls which rows are included in the output.
Options:

Output matching rows - Only rows that meet the similarity criteria are included in the output. A match is defined as a similarity equal to or higher than the Match Quality Threshold.
A row in which at least one column matches counts as a match for the whole row, which corresponds to “OR” behavior.
Output non-matching rows - Only rows that do not meet the matching criteria are forwarded to the output. A single matching column is enough to exclude a row.
No Filtering - All rows are forwarded to the output. Use this option to add only the match information configured in the output section and use that information later.
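The three filter modes can be illustrated with a minimal Python sketch. It is not the node's implementation: the function names, the mode strings, and the assumption that each row already carries one match quality per compared column (on a 0 to 1 scale) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Row = Dict[str, float]  # hypothetical: column name -> match quality in [0, 1]


def row_matches(row: Row, threshold: float) -> bool:
    """"OR" behavior: one column at or above the threshold is enough."""
    return any(quality >= threshold for quality in row.values())


def filter_rows(rows: List[Row], threshold: float, mode: str) -> List[Row]:
    """Apply one of the three row filter modes (mode names are assumptions)."""
    if mode == "none":
        # No Filtering: every row is forwarded unchanged.
        return list(rows)
    if mode not in ("matching", "non-matching"):
        raise ValueError(f"unknown mode: {mode}")
    keep_matching = mode == "matching"
    return [row for row in rows if row_matches(row, threshold) == keep_matching]


rows = [
    {"name": 0.92, "city": 0.40},  # "name" reaches the threshold
    {"name": 0.55, "city": 0.60},  # no column reaches the threshold
    {"name": 0.30, "city": 0.85},  # "city" reaches the threshold
]

matching = filter_rows(rows, 0.8, "matching")          # first and third row
non_matching = filter_rows(rows, 0.8, "non-matching")  # second row only
```

Note that the two filtering modes are exact complements: every row ends up in exactly one of the two results.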

Match Quality Threshold

Sets the minimum Match Quality that a value must reach to count as a match for the Row Filter Condition.
This setting only appears if filtering is enabled in the Row Filter Condition setting.

Add Match Quality Column

Depending on search settings

Add Match Sequence Column

Add Best Match Column

Displays the match word that best fits the input. This is only meaningful if multiple match words are given.

Matching Algorithm Selector

Select the algorithm used to calculate string similarity between reference and comparison inputs.
Options:

Levenshtein - Calculation of the Edit Distance, which is the minimum number of edit operations needed to transform the comparison term into the reference term. Allowed edit operations are:
- Insertion of a character
- Deletion of a character
- Substitution of a character
- Transposition of two adjacent characters (also referred to as Damerau-Levenshtein extension)
By default, the Levenshtein algorithm compares the whole strings from beginning to end. In some situations, you may want to ignore prefixes or suffixes and find the match at any position within the other term.
For this purpose, there are options that let parts of the comparison term or the reference term be ignored at no “cost”, meaning they are not counted as errors. The length of the ignored portion is chosen so as to produce the best possible match.
You can enable two of the four options at the same time. Enabling three or more is not meaningful, since one term could then be ignored completely, which would not produce a relevant match quality.
Options are the following (described in more detail in their own section):
- Ignore Leading Characters in Reference Term
- Ignore Leading Characters in Comparison Term
- Ignore Trailing Characters in Reference Term
- Ignore Trailing Characters in Comparison Term
Positional Matching - Compares characters at fixed character positions (first with first, second with second, and so on). This is useful, for example, for comparing IDs with a fixed format.
Longest Common Subsequence - Finds the longest sequence of characters that appears in both terms in the same order, though not necessarily contiguously. The sequence itself is not necessarily unique; only its length is used.
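The three measures described above can be sketched in plain Python. This is a textbook illustration, not the node's actual code: the function names are hypothetical, the Levenshtein variant shown is the common "optimal string alignment" form of the Damerau-Levenshtein extension, and counting a length difference as additional mismatches in the positional comparison is an assumption.

```python
def edit_distance(comparison: str, reference: str) -> int:
    """Edit distance with insertion, deletion, substitution and
    transposition of two adjacent characters."""
    n, m = len(comparison), len(reference)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters of the comparison prefix
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters of the reference prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if comparison[i - 1] == reference[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (i > 1 and j > 1
                    and comparison[i - 1] == reference[j - 2]
                    and comparison[i - 2] == reference[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]


def positional_mismatches(a: str, b: str) -> int:
    """Compare first with first, second with second, and so on.
    Assumption: surplus characters of the longer term count as mismatches."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))           # 3
print(edit_distance("abcd", "abdc"))                # 1 (one transposition)
print(positional_mismatches("AB-1234", "AB-1243"))  # 2
print(lcs_length("AGGTAB", "GXTXAYB"))              # 4 ("GTAB")
```

How the node converts such raw counts into its match quality score is not stated here, so the sketch deliberately stops at distances and lengths.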

Ignore Leading Characters in Reference Term

Skips leading characters in the reference string during matching until the comparison term starts matching.

Ignore Leading Characters in Comparison Term

Skips leading characters in the comparison string during matching until the reference term starts matching.

Ignore Trailing Characters in Reference Term

Skips trailing characters in the reference string during matching.

Ignore Trailing Characters in Comparison Term

Skips trailing characters in the comparison string during matching.
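One common way to realize such "free" leading and trailing portions is a semi-global variant of the edit distance table. The following Python sketch shows this for the two reference-side options only; it is an illustration under assumptions, not the node's implementation. The function and parameter names are hypothetical, and the plain Levenshtein operations (without transposition) are used to keep the example short.

```python
def edit_distance_free_ends(
    comparison: str,
    reference: str,
    ignore_leading_reference: bool = False,
    ignore_trailing_reference: bool = False,
) -> int:
    """Levenshtein distance where a prefix and/or suffix of the reference
    term may be skipped at no cost."""
    n, m = len(comparison), len(reference)
    # prev[j] = cost of aligning the first i comparison characters with
    # reference[:j]. A free leading skip makes the whole first row zero.
    prev = [0 if ignore_leading_reference else j for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i]  # comparison characters themselves are never skipped here
        for j in range(1, m + 1):
            cost = 0 if comparison[i - 1] == reference[j - 1] else 1
            cur.append(min(
                prev[j] + 1,         # deletion from the comparison term
                cur[j - 1] + 1,      # insertion of a reference character
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    # A free trailing skip lets the alignment end at any reference position,
    # which picks the ignored length that gives the best match.
    return min(prev) if ignore_trailing_reference else prev[m]


reference = "dr. john smith jr."
print(edit_distance_free_ends("smith", reference))              # 13
print(edit_distance_free_ends("smith", reference, True, True))  # 0
print(edit_distance_free_ends("smyth", reference, True, True))  # 1
```

Under the same assumptions, the two comparison-side options would follow by swapping the roles of the two terms. The sketch also shows why enabling too many options is not useful: once enough of a term can be skipped for free, the distance drops to zero regardless of content.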

Input Ports

Provides the prebuilt index of reference values that will be queried.
Table containing the values to be compared with the indexed reference.

Output Ports

Contains the matching results. Depending on configuration, this may be the original comparison table with additional match-related columns (e.g., match sequence, match number, top-k matches).


Views

This node has no views


Installation

To use this node in KNIME, install the exorbyte matchmaker toolbox extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: exorbyte GmbH

Plugin version: 1.1.3

On NodePit since: 2025-10-27

Last update: 2025-12-24

Tags: Streamable, Modern UI

KNIME versions: From v5.5 to v5.5
