The Approximate Phrase Matcher (Labs) node performs approximate phrase-level
similarity matching between two text inputs: a Reference Table and a Comparison Table.
It calculates a keyword-based similarity between each comparison phrase and the reference
phrases using subword alignment, detecting overlaps, inclusions, and extended forms.
The node supports three configurable algorithms that define the logical direction and
context of matching:
Subword - Detects partial overlaps or local matches between phrases by
analyzing shared character subwords.
Subset - Determines whether the comparison phrase is contained within the
reference phrase.
Superset - Determines whether the comparison phrase contains or extends the
reference phrase.
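The three directions can be illustrated with a minimal Python sketch. This is not exorbyte's engine: it approximates subwords with the matching character blocks reported by the standard library's `difflib.SequenceMatcher`, and it uses exact substring containment as a stand-in for the node's approximate containment. The function names `subword_overlap`, `is_subset`, and `is_superset` are hypothetical.

```python
from difflib import SequenceMatcher


def subword_overlap(reference: str, comparison: str) -> int:
    """Subword direction (toy): count characters in shared matching blocks."""
    matcher = SequenceMatcher(None, reference, comparison, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing.
    return sum(block.size for block in matcher.get_matching_blocks())


def is_subset(reference: str, comparison: str) -> bool:
    """Subset direction (toy): the comparison is contained in the reference."""
    return comparison in reference


def is_superset(reference: str, comparison: str) -> bool:
    """Superset direction (toy): the comparison contains or extends the reference."""
    return reference in comparison


if __name__ == "__main__":
    print(is_subset("acme steel pipe", "steel pipe"))
    print(is_superset("steel pipe", "steel pipe 20mm"))
    print(subword_overlap("steel pipe", "steel tube"))
```

Subset and Superset are mirror images of each other: swapping the reference and the comparison phrase turns one check into the other.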
This node is well suited to fuzzy matching of sentences, reviews, product names, and entity
phrases, and supports downstream filtering, labeling, and duplicate detection tasks.
This node uses exorbyte’s deterministic subword-matching engine to compute fuzzy similarity
between multi-word strings.
It performs normalization, subword extraction, and local alignment similar to Levenshtein
distance but optimized for phrase structures rather than isolated strings.
By combining subword decomposition with directional matching logic
(Subset/Superset), it enables context-aware comparisons useful for entity normalization,
phrase clustering, and sentiment pattern detection.
This node may only be used for private and non-commercial purposes. Commercial use
requires a valid license from exorbyte GmbH. All rights reserved.
For more information, contact consulting@exorbyte.com.
Options
Select Settings Group
Lets you navigate between the sections of the configuration options:
Input
Search
Output
Select Column in Reference Input
Select a column of the Reference Input Table to be used as list of Reference Terms
Select Columns in Comparison Input
Select the columns of the Comparison Input Table to compare against the Reference Terms
Add Column with Numeric Matching Value
Appends a column showing the calculated similarity score (character count or percentage).
Add Column with Character Match Sequence
Adds a symbolic alignment string visualizing matching and mismatching characters.
'=' -> Match
'x' -> Mismatch
'+' -> Insertion
'/' or '\' -> Transition
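As a rough illustration of how such a symbol string can be read, the following Python sketch builds one by simple position-by-position comparison. This is an assumption made for illustration only: the node performs a real alignment, and the transition symbols are not modeled here because their exact semantics are not described. The function name `match_sequence` is hypothetical.

```python
def match_sequence(reference: str, comparison: str) -> str:
    """Build a toy symbol string by comparing characters position by position."""
    symbols = []
    for index, char in enumerate(comparison):
        if index >= len(reference):
            symbols.append("+")  # comparison character beyond the reference end
        elif char == reference[index]:
            symbols.append("=")  # same character at the same position
        else:
            symbols.append("x")  # different character at the same position
    return "".join(symbols)


if __name__ == "__main__":
    print(match_sequence("steel pipe", "steel tube"))  # ======xxx=
    print(match_sequence("pipe", "pipes"))             # ====+
```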
Add Column with Hit Characters Sequence
Appends a column showing which parts of the reference phrase were matched by
the comparison.
Add Column with Best Reference Match
Appends the most similar reference phrase for each comparison row, identifying
the best match candidate.
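Conceptually, selecting the best reference match means scoring the comparison phrase against every reference phrase and keeping the highest-scoring one. The sketch below shows this in Python, using `difflib.SequenceMatcher.ratio()` as a stand-in score. The node's own scoring differs, and the names `score` and `best_reference_match` are hypothetical.

```python
from difflib import SequenceMatcher


def score(reference: str, comparison: str) -> float:
    """Stand-in similarity in [0, 1]; not the node's actual metric."""
    return SequenceMatcher(None, reference, comparison, autojunk=False).ratio()


def best_reference_match(comparison: str, references: list[str]) -> str:
    """Return the reference phrase with the highest score for this comparison.

    On ties, max() keeps the first candidate in list order.
    """
    return max(references, key=lambda reference: score(reference, comparison))


if __name__ == "__main__":
    references = ["copper wire", "steel pipe", "wooden plank"]
    print(best_reference_match("steel pipes", references))
```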
Matching Algorithm Selector
Specifies the algorithm used to calculate phrase similarity between reference and comparison inputs.
Each algorithm defines a specific containment relationship and subword-level comparison strategy.
Options:
Subword - Detects overlapping character subsequences (subwords) between phrases.
Suitable for partial or flexible overlap detection.
Subset - Checks if the comparison phrase is contained in the reference phrase.
Superset - Checks if the comparison phrase contains the reference phrase.
Case Sensitivity
Determines whether the matching process should treat uppercase and lowercase characters
as distinct.
Options:
Case Sensitive - Maintains exact letter casing during comparison.
Case Insensitive - Normalizes all text to lowercase before matching.
Numeric Matching Value
Defines how similarity between phrases is measured numerically.
Options:
Number of Matching Characters - Returns the total number of identical characters
between the comparison and reference phrase.
Similarity in Percent - Returns a normalized percentage value (0–100%)
representing relative similarity.
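The difference between the two numeric values, and the effect of the Case Sensitivity setting, can be sketched in Python as follows. The character count comes from `difflib.SequenceMatcher` matching blocks. Normalizing by the length of the longer phrase is an assumption of this sketch, since the node's exact normalization is not described here. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def matching_characters(reference: str, comparison: str,
                        case_sensitive: bool = True) -> int:
    """Count characters in the matching blocks shared by both phrases."""
    if not case_sensitive:
        # Case Insensitive: normalize both phrases to lowercase first.
        reference, comparison = reference.lower(), comparison.lower()
    matcher = SequenceMatcher(None, reference, comparison, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def similarity_percent(reference: str, comparison: str,
                       case_sensitive: bool = True) -> float:
    """Scale the count to 0-100 (assumption: divide by the longer length)."""
    longest = max(len(reference), len(comparison))
    if longest == 0:
        return 100.0  # two empty phrases are treated as identical
    matched = matching_characters(reference, comparison, case_sensitive)
    return 100.0 * matched / longest


if __name__ == "__main__":
    print(matching_characters("Steel Pipe", "steel pipe", case_sensitive=False))
    print(similarity_percent("abcd", "abxx"))
```

A raw count favors long phrases, while a percentage makes scores comparable across phrases of different lengths.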
Row Filter Condition
Controls which rows are included in the node output based on the match result.
Options:
Output matching rows - Only outputs rows that meet or exceed the similarity
threshold.
Output non-matching rows - Only outputs rows that do not meet the threshold.
No Filtering - Outputs all rows with match metadata for analysis.
Matching Value Threshold - Minimal Number of Matches
Sets the threshold used by the Row Filter Condition, interpreted according to the selected Numeric Matching Value.
This setting appears only if filtering is enabled in the Row Filter Condition setting.
If the algorithm-specific matching value (Number of Matching Characters) is selected,
the threshold is the minimum number of matching characters.
Matching Value Threshold - Minimal Matching Percentage
Sets the threshold used by the Row Filter Condition, interpreted according to the selected Numeric Matching Value.
This setting appears only if filtering is enabled in the Row Filter Condition setting.
If Similarity in Percent is selected, the threshold is the minimum similarity
percentage.
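The interaction between the Row Filter Condition and the threshold can be sketched in Python as follows. Rows are modeled as `(phrase, value)` pairs, and the mode strings `"matching"`, `"non-matching"`, and `"none"` are hypothetical labels for the three options, not identifiers used by the node.

```python
def filter_rows(scored_rows, threshold, mode="matching"):
    """Filter (phrase, value) pairs the way the three filter options describe."""
    if mode == "none":
        return list(scored_rows)  # No Filtering: keep every row
    if mode == "matching":
        # Output matching rows: value meets or exceeds the threshold
        return [row for row in scored_rows if row[1] >= threshold]
    if mode == "non-matching":
        # Output non-matching rows: value stays below the threshold
        return [row for row in scored_rows if row[1] < threshold]
    raise ValueError(f"unknown filter mode: {mode!r}")


if __name__ == "__main__":
    rows = [("steel pipes", 95.0), ("copper wire", 40.0), ("steel tube", 70.0)]
    print(filter_rows(rows, threshold=70.0, mode="matching"))
    print(filter_rows(rows, threshold=70.0, mode="non-matching"))
```

The same function works for either threshold type, because the threshold is always compared against whichever numeric matching value was computed.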
Input Ports
Mapping
Reference Table - Contains canonical or reference phrases to match against.
Comparison Table - Contains phrases to be compared with the reference input.
Output Ports
Comparison rows enriched with numeric match values, alignment sequences,
and the best matching reference phrase.