StringSimilarity

fastpath function to compare 2 strings.

Options

Accumulate
Specifies the names of input table columns to be copied to the output table
Casesensitive (booleans seperated by new line)
Specifies whether string comparison is case-sensitive. The default value is 'false'. You can specify either one value for all pairs or one value for each pair. If you specify one value for each pair, then the ith value applies to the ith pair.
ComparisonColumnPairs (strings seperated by new line)
Specifies pairs of input table columns that contain strings to be compared (column1 and column2), how to compare them (comparison_type), and (optionally) a constant and the name of the output column for their similarity (output_column). The similarity is a value in the range [0, 1]. For comparison_type, use one of these values: 'jaro': Jaro distance, 'jaro_winkler': Jaro-Winkler distance (1 for an exact match, 0 otherwise). If you specify this comparison type, you can specify the value of factor p with constant. 0 ≤ p ≤ 0.25. Default: p = 0.1, 'n_gram': N-gram similarity. If you specify this comparison type, you can specify the value of N with constant. 'LD': Levenshtein distance (the number of edits needed to transform one string into the other, where edits include insertions, deletions, or substitutions of individual characters), 'LDWS': Levenshtein distance without substitution. Number of edits needed to transform one string into the other using only insertions or deletions of individual characters. 'OSA': Optimal string alignment distance. Number of edits needed to transform one string into the other. Edits are insertions, deletions, substitutions, or transpositions of characters. A substring can be edited only once. 'DL': Damerau-Levenshtein distance. Like OSA, except that a substring can be edited any number of times. 'hamming': Hamming distance. For strings of equal length, number of positions where corresponding characters differ (that is, minimum number of substitutions needed to transform one string into the other). For strings of unequal length, -1. 'LCS': Longest common substring. Length of longest substring common to both strings. 'jaccard': Jaccard indexed-based comparison. 'cosine': Cosine similarity. 'soundexcode': Only for English strings: -1 if either string has a non-English character; otherwise, 1 if their soundex codes are the same and 0 otherwise. You can specify a different comparison_type for every pair of columns. The default output_column is 'sim_i', where i is the sequence number of the column pair.
Output Schema
Output Schema, if Volatile is true then use user login as the schema.
Output Table
Output Table
VAL Location
VAL Location
Volatile
Specifies whether the table should be a VOLATILE table. If true, then the table is automatically deleted, otherwise it is users responsibility to remove or clean it up for space.

Input Ports

Icon
Connection to a Teradata Database Instance
Icon
The relation that contains the string pairs to be compared.

Output Ports

Icon
output of StringSimilarity

Nodes

Extensions

Links