Phrase Indexer

The Phrase Indexer node creates a searchable index from a selected string column that contains multi-word phrases.
Before indexing, each phrase is split into individual tokens based on a user-defined delimiter (default: blank space). Each token is then indexed, enabling efficient approximate string matching over multi-word data.
This node is particularly useful for text fields containing full names, product descriptions, or address lines, where indexing based on words rather than entire strings improves recall and match flexibility.
The generated index can be passed to the downstream Approximate Phrase Index Matcher node, enabling rapid retrieval of similar phrases or partial matches from large datasets.
Optionally, an Alias Object can expand the search with synonymous, canonical, or normalized forms (e.g., “NYC” ⇄ “New York City”), improving recall without sacrificing determinism.
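
To illustrate the idea of word-level indexing described above, here is a minimal sketch in Python. It is not the node's implementation: the function name build_token_index, the lowercasing step, and the use of a plain inverted index (token to row ids) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_token_index(phrases, delimiter=" "):
    """Map each token to the set of row ids whose phrase contains it.

    Hypothetical helper: lowercasing is an assumption, not documented
    behavior of the Phrase Indexer node.
    """
    index = defaultdict(set)
    for row_id, phrase in enumerate(phrases):
        for token in phrase.split(delimiter):
            if token:  # skip empty tokens produced by repeated delimiters
                index[token.lower()].add(row_id)
    return dict(index)


phrases = ["John Smith", "Smith Jane", "Acme Steel Pipe"]
index = build_token_index(phrases)

# A single word finds every phrase that contains it, regardless of word order.
print(sorted(index["smith"]))  # [0, 1]
```

Because the lookup key is a word rather than the whole string, "John Smith" and "Smith Jane" are both candidates for a query containing "Smith", which is the recall benefit the description refers to.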

Options

Select Column to Index

Selects the string column from the input table that contains the phrases to be indexed. Only string-convertible columns are available.

Delimiter

Specifies the character or pattern used to split each phrase into words before indexing.
By default, phrases are split by a blank space (" ").
Example delimiters: comma (,), semicolon (;), or custom token separators.
Choose the delimiter that actually separates words in your data: a delimiter that does not occur in the phrases leaves each phrase as a single token, which reduces match quality.
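
The effect of the delimiter choice can be sketched as follows. The helper names tokenize and tokenize_pattern are hypothetical, and whether the node itself accepts regular expressions as a "pattern" is not stated here, so the regex variant is only an assumption about what such a pattern could look like.

```python
import re


def tokenize(phrase, delimiter=" "):
    """Split on a literal delimiter; trim whitespace and drop empty tokens."""
    return [t.strip() for t in phrase.split(delimiter) if t.strip()]


def tokenize_pattern(phrase, pattern):
    """Split on a regular expression, e.g. any of several separators."""
    return [t for t in re.split(pattern, phrase) if t]


record = "Smith;John;Main Street 5"

print(tokenize(record))                     # ['Smith;John;Main', 'Street', '5']
print(tokenize(record, ";"))                # ['Smith', 'John', 'Main Street 5']
print(tokenize_pattern(record, r"[;\s]+"))  # ['Smith', 'John', 'Main', 'Street', '5']
```

With the default blank delimiter, the semicolon-separated fields stay fused into one token, so a search for "John" alone would not find this record in a word-level index.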

Representation of Indexed Strings

Determines how indexed values are represented in the output table of downstream matching nodes.
Options:

Original - Displays strings as they appear in the input column.
Normalized - Displays transformed versions to improve fuzzy-matching precision.
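
The exact transformation behind the Normalized option is not specified on this page. As a rough sketch of what a normalized form can look like, the following example assumes case folding, accent stripping, and whitespace collapsing; the function normalize is hypothetical and may differ from what the node actually does.

```python
import unicodedata


def normalize(text):
    """Assumed normalization: strip accents, case-fold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


original = "  José   MÜLLER "
print(repr(original))             # Original representation, unchanged
print(repr(normalize(original)))  # 'jose muller'
```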

Aliases

Select the alias set you want to apply to the index.
An Alias Object (created by the Alias Creator node) enables deterministic synonym expansion or canonicalization during indexing.
When an alias set is applied:

Terms may be expanded or rewritten according to alias rules
Penalty values influence downstream similarity scoring
Deterministic synonym handling improves recall while preserving explainability

If no alias set is selected, indexing proceeds without synonym expansion.
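
The alias behavior listed above can be pictured with a small sketch. The ALIASES table, the expand function, and the penalty value 0.1 are all invented for illustration; the real Alias Object is created by the Alias Creator node and its format and scoring are not documented on this page.

```python
# Hypothetical alias table: term -> list of (replacement, penalty) pairs.
ALIASES = {
    "nyc": [("new york city", 0.1)],
    "new york city": [("nyc", 0.1)],
}


def expand(phrase, aliases):
    """Return (variant, penalty) pairs: the phrase itself plus alias rewrites."""
    key = phrase.lower()
    variants = [(key, 0.0)]  # the unmodified phrase carries no penalty
    padded = f" {key} "      # pad so aliases match whole words only
    for alias, replacements in aliases.items():
        if f" {alias} " in padded:
            for replacement, penalty in replacements:
                rewritten = padded.replace(f" {alias} ", f" {replacement} ").strip()
                variants.append((rewritten, penalty))
    return variants


print(expand("NYC Taxi", ALIASES))
# [('nyc taxi', 0.0), ('new york city taxi', 0.1)]
```

The same input and alias table always yield the same variants, which is the sense in which alias handling is deterministic; the penalty travels with each rewritten variant so that a downstream scorer could rank alias matches below exact ones.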

Input Ports

: Table containing the text or phrase column to be indexed.
…: A mapping object defined by the Character Mapper node.
…: Optional synonym/canonicalization mappings used to expand or rewrite queries before matching.

Output Ports

: Contains the indexed representation of the tokenized phrases. The object includes metadata about the index (e.g., number of phrases, tokens, unique terms, algorithm parameters) and serves as input for downstream nodes.
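
As a rough picture of the kind of metadata mentioned above, the following sketch computes phrase, token, and unique-term counts for a small input. The IndexSummary class and its field names are hypothetical; the actual structure of the index object is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class IndexSummary:
    """Hypothetical summary record; field names are illustrative only."""
    phrases: int
    tokens: int
    unique_terms: int
    delimiter: str


def summarize(phrases, delimiter=" "):
    """Count phrases, tokens, and distinct tokens for a list of phrases."""
    tokens = [t for p in phrases for t in p.split(delimiter) if t]
    return IndexSummary(len(phrases), len(tokens), len(set(tokens)), delimiter)


summary = summarize(["John Smith", "Jane Smith", "Acme Steel Pipe"])
print(summary)
# IndexSummary(phrases=3, tokens=7, unique_terms=6, delimiter=' ')
```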

Views

This node has no views

Installation

To use this node in KNIME, install the exorbyte matchmaker toolbox extension from its update site, following the NodePit Product and Node Installation Guide.

Plugin provider: exorbyte GmbH

Plugin version: 1.2.2

On NodePit since: 2026-03-10

Last update: 2026-03-11

Tags: Modern UI

KNIME versions: v5.11, v5.9, v5.8