Similarity Search

This node takes each row in the query table (Port 0) and searches the reference table (Port 1) for a number of rows matching the specified similarity/distance criteria. If multiple results are requested, the query result row is duplicated for each subsequent match.

Options

Distance function
Choose the method used to calculate the distance (or similarity) for the query. This panel appears only if no distance measure is connected (Port 2).

Euclidean Distance
Requires 1 or more numeric columns.

Manhattan Distance
Requires 1 or more numeric columns.

Distance Vector
Requires one column of the Distance Vector type, as generated by the Distance Matrix Calculate node.

Tanimoto Similarity
Requires a bit-vector fingerprint.

Tanimoto Similarity (old)
A deprecated version of Tanimoto similarity. Requires a deprecated bit vector type.

Cosine Similarity
Requires 1 or more numeric columns.

Cosine Bitvector Similarity
Requires a bit-vector fingerprint.

Dice's Coefficient
Requires a bit-vector fingerprint.

Levenshtein (absolute)
Requires one string column. The distance is the absolute number of edits between the two strings (can be greater than 1).

Levenshtein (normalized)
Requires one string column. The distance is the absolute number of edits between the two strings, normalized by the length of the longer string (at most 1).
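The two Levenshtein options can be illustrated with a minimal Python sketch. This is not the node's implementation; the function names are hypothetical, and returning 0 for two empty strings is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Absolute edit distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_normalized(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # assumption: two empty strings count as identical
    return levenshtein(a, b) / longest


absolute = levenshtein("kitten", "sitting")               # 3 edits
normalized = levenshtein_normalized("kitten", "sitting")  # 3 / 7
```

Because no more edits than the length of the longer string are ever needed, the normalized value stays between 0 and 1, while the absolute value grows with string length.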

Column Selection
Choose which columns to use in the calculation. Unusable columns are ignored. This panel appears only if no distance measure is connected (Port 2).

Coefficient Type
Determines how the output is represented. It does not affect the calculation. Note that this is only meaningful with the Tanimoto Similarity metric.

Distance
More different rows have a smaller index.

Similarity
More similar rows have a smaller index.

Neighbor Selection
Choose whether more similar or more distant results match the query.

Range Filtering
Specify a similarity/distance range for query hits. For example, a search using Tanimoto Similarity with a range filter of 0 to 0.9999999 would return the nearest non-identical matches to the query row.

Output column name prefix
This string is used when constructing the output column names.

Representative Column
The column used to identify the entries in the reference table (Port 1) that match the query criteria.

RowID Suffix Separator
When multiple search results are requested, this delimiter separates the original Row ID from the index of the result. For example, if the Row ID is RowN, the delimiter is set to "_", and the node is configured to find 3 neighbors, then the resulting Row IDs are "RowN_1", "RowN_2", and "RowN_3".
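How Neighbor Selection and Range Filtering interact can be sketched in a few lines of Python. This is an illustration only, not the node's code: the function `select_hits` and its parameters are hypothetical, and treating both range bounds as inclusive is an assumption.

```python
def select_hits(scores, k, lower, upper, prefer_larger):
    """Return up to k (reference_index, score) pairs inside [lower, upper].

    prefer_larger=True ranks the highest scores first (nearest neighbors
    for a similarity); False ranks the lowest scores first.
    """
    # Assumption: both bounds of the range filter are inclusive.
    in_range = [(i, s) for i, s in enumerate(scores) if lower <= s <= upper]
    in_range.sort(key=lambda pair: pair[1], reverse=prefer_larger)
    return in_range[:k]


# Made-up Tanimoto similarities of one query row against five reference rows.
scores = [1.0, 0.82, 0.4, 0.91, 1.0]

# Range 0 to 0.9999999 drops the identical rows (score 1.0).
hits = select_hits(scores, k=2, lower=0.0, upper=0.9999999, prefer_larger=True)
# hits is [(3, 0.91), (1, 0.82)]
```

Filtering before ranking is what lets the example range exclude exact duplicates while still returning the requested number of neighbors.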

Input Ports

Port 0: Each row is used as a query for similar (or dissimilar) entries in the reference table.
Port 1: Data set in which to search for nearest/farthest neighbors.
Port 2: Optional distance measure, which replaces the distance configuration.

Output Ports

The input data set with three additional columns: (i) the neighbor index, (ii) the neighbor (the Row ID or some other representative column), and (iii) the distance/similarity value. The 2nd, 3rd, and further neighbors are represented by additional rows.
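The shape of the output table can be sketched as follows. The column names ("Neighbor Index", "Neighbor", "Score"), the function name, and the sample values are hypothetical, and the sketch assumes the suffix is always appended, which the node may not do when only one neighbor is requested.

```python
def build_output_rows(row_id, query_row, hits, separator="_"):
    """Duplicate the query row once per hit and append three columns."""
    rows = []
    for rank, (neighbor, score) in enumerate(hits, start=1):
        out_id = f"{row_id}{separator}{rank}"  # e.g. RowN_1, RowN_2, ...
        out_row = dict(query_row)              # copy of the query columns
        out_row["Neighbor Index"] = rank       # hypothetical column name
        out_row["Neighbor"] = neighbor         # Row ID or representative value
        out_row["Score"] = score               # distance/similarity value
        rows.append((out_id, out_row))
    return rows


# Made-up query row and three hits from the reference table.
rows = build_output_rows(
    "RowN",
    {"name": "aspirin"},
    [("Ref7", 0.91), ("Ref2", 0.82), ("Ref5", 0.40)],
)
row_ids = [out_id for out_id, _ in rows]
# row_ids is ["RowN_1", "RowN_2", "RowN_3"]
```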

Views

This node has no views.
