Approximate Phrase Matcher (Labs)

The Approximate Phrase Matcher (Labs) node performs approximate phrase-level similarity matching between two text inputs: a Reference Table and a Comparison Table.
It calculates a keyword-based similarity between each comparison phrase and the reference phrases using subword alignment, detecting overlaps, inclusions, and extended forms. The node supports three configurable algorithms that define the logical direction and context of matching:

  • Subword – Detects partial overlaps or local matches between phrases by analyzing shared character subwords.
  • Subset – Determines whether the comparison phrase is contained within the reference phrase.
  • Superset – Determines whether the comparison phrase contains or extends the reference phrase.
This node is well suited to fuzzy matching of sentences, reviews, product names, and entity phrases, and supports downstream filtering, labeling, and duplicate detection tasks.
It uses exorbyte’s deterministic subword-matching engine to compute fuzzy similarity between multi-word strings.
It performs normalization, subword extraction, and local alignment similar to Levenshtein distance but optimized for phrase structures rather than isolated strings.
By combining subword decomposition with directional matching logic (Subset/Superset), it enables context-aware comparisons useful for entity normalization, phrase clustering, and sentiment pattern detection.
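As a rough illustration of how the three directional modes differ, the sketch below approximates them with sets of character n-grams. It is not exorbyte's engine: the function names `subwords` and `directional_score`, the use of trigrams, and the scoring formulas are assumptions made only for this example.

```python
def subwords(phrase, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized phrase."""
    text = " ".join(phrase.lower().split())
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def directional_score(comparison, reference, mode):
    """Return a score in [0, 1] for one of the modes (hypothetical approximation)."""
    comp, ref = subwords(comparison), subwords(reference)
    if not comp or not ref:
        return 0.0
    shared = len(comp & ref)
    if mode == "subword":
        # Symmetric partial overlap (Jaccard index of the two n-gram sets).
        return shared / len(comp | ref)
    if mode == "subset":
        # How much of the comparison phrase is contained in the reference.
        return shared / len(comp)
    if mode == "superset":
        # How much of the reference phrase is contained in the comparison.
        return shared / len(ref)
    raise ValueError(f"unknown mode: {mode}")
```

With this sketch, `directional_score("red apple", "fresh red apple pie", "subset")` reaches the maximum score because every trigram of the comparison phrase also occurs in the reference, while the same pair scores lower in `"superset"` mode.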
This node may only be used for private and non-commercial purposes. Commercial use requires a valid license from exorbyte GmbH. All rights reserved.
For more information contact consulting@exorbyte.com.

Options

Select Settings Group
Lets you navigate between the sections of the configuration options:
  • Input
  • Search
  • Output
Select Column in Reference Input
Select the column of the Reference Input Table to use as the list of Reference Terms.
Select Columns in Comparison Input
Select the columns of the Comparison Input Table to compare against the Reference Terms.
Add Column with Numeric Matching Value
Appends a column showing the calculated similarity score (character count or percentage).
Add Column with Character Match Sequence
Adds a symbolic alignment string visualizing matching and mismatching characters.
  • '=' - Match
  • 'x' - Mismatch
  • '+' - Insertion
  • '/' or '\' - Transition
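To give a feel for what such an alignment string looks like, here is a minimal sketch built on Python's standard `difflib.SequenceMatcher`. It is only an approximation of the idea, not the node's algorithm: the function name `alignment_string` is hypothetical, characters that exist only in the reference are simply skipped, and the transition symbols are not modelled because their exact meaning is not specified here.

```python
from difflib import SequenceMatcher


def alignment_string(reference, comparison):
    """Return one symbol per comparison character: '=' match, 'x' mismatch, '+' insertion."""
    matcher = SequenceMatcher(None, reference, comparison, autojunk=False)
    symbols = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        span = j2 - j1  # number of comparison characters covered by this opcode
        if tag == "equal":
            symbols.append("=" * span)
        elif tag == "replace":
            symbols.append("x" * span)
        elif tag == "insert":
            symbols.append("+" * span)
        # "delete" opcodes cover reference-only characters and are skipped (assumption).
    return "".join(symbols)


print(alignment_string("apple", "apples"))  # =====+
```

Because every opcode contributes exactly as many symbols as it covers comparison characters, the result always has the same length as the comparison phrase in this sketch.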
Add Column with Hit Characters Sequence
Appends a column showing which parts of the reference phrase were matched by the comparison.
Add Column with Best Reference Match
Appends the most similar reference phrase for each comparison row, identifying the best match candidate.
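Conceptually, picking the best reference match means scoring the comparison phrase against every reference phrase and keeping the highest score. The sketch below shows that selection step with `difflib.SequenceMatcher.ratio()` standing in for the node's own similarity measure; the helper name `best_reference` and the tie-breaking rule (first reference wins) are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_reference(comparison, references):
    """Return (best_phrase, score) for the most similar reference, or (None, 0.0) if there are none."""
    best_phrase, best_score = None, 0.0
    for reference in references:
        # Case-insensitive stand-in similarity in [0, 1]; not the node's actual metric.
        score = SequenceMatcher(
            None, reference.lower(), comparison.lower(), autojunk=False
        ).ratio()
        if best_phrase is None or score > best_score:
            best_phrase, best_score = reference, score
    return best_phrase, best_score


phrase, score = best_reference("red aple", ["green pear", "red apple", "blue plum"])
print(phrase)  # red apple
```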
Matching Algorithm Selector
Specifies the algorithm used to calculate phrase similarity between reference and comparison inputs.
Each algorithm defines a specific containment relationship and subword-level comparison strategy.
Options:
  • Subword - Detects overlapping character subsequences (subwords) between phrases. Suitable for partial or flexible overlap detection.
  • Subset - Checks if the comparison phrase is contained in the reference phrase.
  • Superset - Checks if the comparison phrase contains the reference phrase.
Case Sensitivity
Determines whether the matching process should treat uppercase and lowercase characters as distinct.
Options:
  • Case Sensitive - Maintains exact letter casing during comparison.
  • Case Insensitive - Normalizes all text to lowercase before matching.
Numeric Matching Value
Defines how similarity between phrases is measured numerically.
Options:
  • Number of Matching Characters - Returns the total number of identical characters between the comparison and reference phrase.
  • Similarity in Percent - Returns a normalized percentage value (0–100%) representing relative similarity.
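The difference between the two numeric values can be sketched as follows. This is an assumption-laden illustration: matching characters are counted with `difflib.SequenceMatcher`, and the percentage is normalized by the length of the longer phrase, which may differ from the normalization the node actually applies. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def matching_characters(comparison, reference):
    """Count characters that fall inside matching blocks of the two phrases."""
    matcher = SequenceMatcher(None, reference, comparison, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def similarity_percent(comparison, reference):
    """Normalize the match count to 0-100 by the longer phrase (assumed denominator)."""
    longest = max(len(comparison), len(reference))
    if longest == 0:
        return 0.0  # two empty phrases: treated as no similarity in this sketch
    return 100.0 * matching_characters(comparison, reference) / longest


print(matching_characters("apples", "apple"))  # 5
```

An absolute count favors long phrases, whereas a percentage makes scores comparable across phrases of different lengths, which is usually what a threshold needs.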
Row Filter Condition
Controls which rows are included in the node output based on the match result.
Options:
  • Output matching rows - Only outputs rows that meet or exceed the similarity threshold.
  • Output non-matching rows - Only outputs rows that do not meet the threshold.
  • No Filtering - Outputs all rows with match metadata for analysis.
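The three filter conditions amount to a simple threshold test on the numeric matching value. A minimal sketch, assuming rows are represented as `(phrase, score)` pairs and using hypothetical mode names `"matching"`, `"non-matching"`, and `"none"`:

```python
def filter_rows(scored_rows, threshold, mode="matching"):
    """Filter (phrase, score) pairs according to the chosen row filter condition."""
    rows = list(scored_rows)
    if mode == "none":
        # No filtering: keep every row together with its score.
        return rows
    if mode == "matching":
        # Keep rows that meet or exceed the threshold.
        return [(phrase, score) for phrase, score in rows if score >= threshold]
    if mode == "non-matching":
        # Keep rows that stay below the threshold.
        return [(phrase, score) for phrase, score in rows if score < threshold]
    raise ValueError(f"unknown mode: {mode}")


rows = [("red aple", 88.9), ("blue plum", 22.2), ("red apple", 100.0)]
print(filter_rows(rows, 80.0, "matching"))
```

The scores in the usage example are made-up values chosen only to show the filtering behavior.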
Matching Value Threshold - Minimal Number of Matches
Sets the minimum number of matching characters a row must reach to count as a match.
This setting appears only if filtering is enabled under Row Filter Condition.
It applies when Numeric Matching Value is set to Number of Matching Characters.
Matching Value Threshold - Minimal Matching Percentage
Sets the minimum similarity, in percent, a row must reach to count as a match.
This setting appears only if filtering is enabled under Row Filter Condition.
It applies when Numeric Matching Value is set to Similarity in Percent.

Input Ports

Mapping
Contains canonical or reference phrases to match against.
Contains phrases to be compared with the reference input.

Output Ports

Comparison rows enriched with numeric match values, alignment sequences, and the best matching reference phrase.

Views

This node has no views
