Phrase Indexer

The Phrase Indexer node creates a searchable index from a selected string column that contains multi-word phrases.
Before indexing, each phrase is split into individual tokens based on a user-defined delimiter (default: blank space). Each token is then indexed, enabling efficient approximate string matching over multi-word data.
This node is particularly useful for text fields containing full names, product descriptions, or address lines, where indexing based on words rather than entire strings improves recall and match flexibility.
The generated index can be passed to the downstream Approximate Phrase Index Matcher node, which uses it to retrieve similar phrases or partial matches from large datasets quickly.
Optionally, an Alias Object can expand the search with synonymous, canonical, or normalized forms (e.g., “NYC” ⇄ “New York City”), improving recall without sacrificing determinism.
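The idea of token-level indexing can be illustrated with a small Python sketch. This is a conceptual illustration only, not the node's implementation: the names build_token_index and lookup, the lowercasing step, and the use of difflib's similarity ratio with a 0.8 threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def build_token_index(phrases, delimiter=" "):
    """Map each lowercased token to the set of row ids whose phrase contains it."""
    index = defaultdict(set)
    for row_id, phrase in enumerate(phrases):
        for token in phrase.split(delimiter):
            if token:  # skip empty tokens produced by repeated delimiters
                index[token.lower()].add(row_id)
    return index

def lookup(index, query_token, threshold=0.8):
    """Return row ids that contain a token approximately equal to query_token."""
    hits = set()
    for token, rows in index.items():
        ratio = SequenceMatcher(None, query_token.lower(), token).ratio()
        if ratio >= threshold:
            hits |= rows
    return hits

phrases = ["John Smith", "Jon Smith", "Jane Doe"]
index = build_token_index(phrases)
smith_rows = lookup(index, "Smith")  # rows sharing the token "smith"
john_rows = lookup(index, "John")    # also reaches the near-miss spelling "Jon"
```

Because each word is indexed separately, a query for a single word can reach every phrase that contains it or a close spelling of it, which is what makes word-level indexing more flexible than matching entire strings.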

Options

Select Column to Index
Selects the string column from the input table that contains the phrases to be indexed. Only string-convertible columns are available.
Delimiter
Specifies the character or pattern used to split each phrase into words before indexing.
By default, phrases are split by a blank space (" ").
Example delimiters: comma (,), semicolon (;), or custom token separators.
Choose the delimiter that matches how the words in your data are actually separated. A mismatched delimiter leaves several words fused into a single token, which reduces match quality.
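The effect of the delimiter on tokenization can be seen in a short Python sketch. The tokenize helper is hypothetical and assumes the delimiter is treated as a literal string; whether the node also accepts patterns such as regular expressions is not shown here.

```python
def tokenize(phrase, delimiter=" "):
    """Split a phrase on a literal delimiter, trimming and dropping empty tokens."""
    return [part.strip() for part in phrase.split(delimiter) if part.strip()]

address = "12 Main St,Springfield,IL"

# Splitting on a space leaves the comma-separated tail fused into one token.
by_space = tokenize(address)

# Splitting on a comma yields one token per address component.
by_comma = tokenize(address, ",")
```

With this input, the space delimiter produces the fused token "St,Springfield,IL", while the comma delimiter separates the street, city, and state.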
Representation of Indexed Strings
Determines how indexed values are represented in the output table of downstream matching nodes.
Options:
  • Original - Displays strings as they appear in the input column.
  • Normalized - Displays transformed versions to improve fuzzy-matching precision.
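The difference between the two representations can be sketched in Python. The exact transformations the node applies are not documented here, so the normalize function below is purely an assumption: it lowercases, strips accents, and collapses whitespace, which are common normalization steps for fuzzy matching.

```python
import unicodedata

def normalize(text):
    """Assumed normalization: strip accents, lowercase, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

original = "  José   GARCÍA "
normalized = normalize(original)  # the form a "Normalized" output might resemble
```

Under these assumptions, "Original" would show the string exactly as stored in the input column, while "Normalized" would show the cleaned form "jose garcia".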
Aliases
Selects the alias set to apply to the index.
An Alias Object (created by the Alias Creator node) enables deterministic synonym expansion or canonicalization during indexing.
When an alias set is applied:
  • Terms may be expanded or rewritten according to alias rules
  • Penalty values influence downstream similarity scoring
  • Deterministic synonym handling improves recall while preserving explainability
If no alias set is selected, indexing proceeds without synonym expansion.
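Alias expansion with penalties can be pictured with the following Python sketch. The ALIASES dictionary, the expand and penalized_score functions, the penalty value 0.1, and the subtractive scoring rule are all hypothetical; the actual structure of an Alias Object and the way penalties enter the similarity score are not specified here.

```python
# Hypothetical alias set: term -> (canonical form, penalty applied to alias matches)
ALIASES = {"nyc": ("new york city", 0.1)}

def expand(phrase, aliases=ALIASES):
    """Return (variant, penalty) pairs: the phrase itself plus alias rewrites."""
    variants = [(phrase, 0.0)]
    tokens = phrase.lower().split(" ")
    for term, (canonical, penalty) in sorted(aliases.items()):
        if term in tokens:
            rewritten = " ".join(canonical if t == term else t for t in tokens)
            variants.append((rewritten, penalty))
    return variants

def penalized_score(similarity, penalty):
    """Assumed scoring rule: subtract the alias penalty, never going below zero."""
    return max(0.0, similarity - penalty)

variants = expand("NYC Marathon")
```

Because the same alias rules always produce the same variants, a match found through an alias can be traced back to the rule that produced it, and the penalty keeps it ranked slightly below an exact match.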

Input Ports

Table containing the text or phrase column to be indexed.
A mapping object defined by the Character Mapper node.
Optional synonym/canonicalization mappings used to expand or rewrite queries before matching.

Output Ports

Contains the indexed representation of the tokenized phrases. The object includes metadata about the index (e.g., number of phrases, tokens, unique terms, algorithm parameters) and serves as input for downstream nodes.
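The kind of metadata mentioned above can be illustrated with a small Python sketch. The index_stats function and its key names are hypothetical and only show how the counts of phrases, tokens, and unique terms relate to each other; the real index object's fields may differ.

```python
def index_stats(phrases, delimiter=" "):
    """Count phrases, total tokens, and distinct tokens for a list of phrases."""
    tokens = [t for phrase in phrases for t in phrase.split(delimiter) if t]
    return {
        "phrases": len(phrases),
        "tokens": len(tokens),
        "unique_terms": len(set(tokens)),
    }

stats = index_stats(["red apple", "green apple", "red pear"])
```

Here three phrases yield six tokens but only four unique terms, since "red" and "apple" each occur twice.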

Views

This node has no views
