
Similarity Search

KNIME Distance Matrix Extension version 3.7.0.v201809280949 by KNIME AG, Zurich, Switzerland

This node takes each row in the query table (Port 0) and searches the reference table (Port 1) for a number of rows matching the specified similarity/distance criteria. If multiple results are requested, the query result row is duplicated for each subsequent match.

Options

Distance function
Choose which method is used to calculate the distance (or similarity) for the query. This panel appears only if no distance measure is connected (Port 2).

Euclidean Distance
Requires 1 or more numeric columns.

Manhattan Distance
Requires 1 or more numeric columns.
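
As an illustration of the two numeric distances above, here is a minimal Python sketch using their standard textbook definitions. It is not the node's implementation; the function names and sample vectors are made up for this example, and each vector stands for the selected numeric columns of one row.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared differences across columns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences across columns."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical query row and reference row with two numeric columns each.
query = [1.0, 2.0]
reference = [4.0, 6.0]

print(euclidean(query, reference))  # 5.0
print(manhattan(query, reference))  # 7.0
```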

Distance Vector
Requires one column of the Distance Vector type as is generated by the Distance Matrix Calculate node.

Tanimoto Similarity
Requires a bit-vector fingerprint.

Tanimoto Similarity (old)
A deprecated version of Tanimoto similarity. Requires a deprecated bit vector type.

Cosine Similarity
Requires 1 or more numeric columns.

Cosine Bitvector Similarity
Requires a bit-vector fingerprint.

Dice's Coefficient
Requires a bit-vector fingerprint.
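
The three bit-vector measures (Tanimoto, cosine, Dice) can be sketched in Python with their standard set-based definitions. This is only an illustration: each fingerprint is modeled as a set of "on" bit positions rather than a KNIME bit-vector cell, the function names are hypothetical, and returning 0.0 for two empty fingerprints is an assumption of this sketch.

```python
import math


def tanimoto(a, b):
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Twice the shared on-bits divided by the total on-bits of both."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine_bits(a, b):
    """Shared on-bits divided by the geometric mean of the on-bit counts."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Hypothetical fingerprints, given as the positions of their set bits.
fp_query = {0, 2, 5, 7}
fp_ref = {0, 2, 3, 7}

print(tanimoto(fp_query, fp_ref))     # 0.6
print(dice(fp_query, fp_ref))         # 0.75
print(cosine_bits(fp_query, fp_ref))  # 0.75
```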

Levenshtein (absolute)
Requires one string column. The value is the absolute number of edits needed to turn one string into the other (can be greater than 1).

Levenshtein (normalized)
Requires one string column. The value is the number of edits between the two strings divided by the length of the longer string (at most 1).
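
The difference between the absolute and normalized Levenshtein variants can be seen in a short Python sketch. It uses the classic dynamic-programming edit distance and is not taken from the node's source; the function names are hypothetical, and returning 0.0 for two empty strings is an assumption of this sketch.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_normalized(s, t):
    """Edit distance divided by the length of the longer string."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


print(levenshtein("kitten", "sitting"))             # 3
print(levenshtein_normalized("kitten", "sitting"))  # 3 / 7, about 0.43
```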

Column Selection
Choose which columns to use in the calculation. Unusable columns will be ignored. This panel appears only if no distance measure is connected (Port 2).

Coefficient Type
Determines how the output is represented. It does not affect the calculation. Note that this is only meaningful with the Tanimoto Similarity metric.

Distance
More different rows have a smaller index.

Similarity
More similar rows have a smaller index.

Neighbor Selection
Choose whether more similar or more distant results match the query.

Range Filtering
Specify a similarity/distance range that query hits must fall within. For example, a search using Tanimoto Similarity with a range filter of 0 to 0.9999999 would return the nearest non-identical matches to the query row.

Output column name prefix
This string will be used in the construction of output column names.

Representative Column
The column used to identify the entries in the reference table (Port 1) that match the query criteria.

RowID Suffix Separator
When multiple search results are requested, this delimiter separates the original Row ID from the index of the result. For example, if the Row ID is RowN, the delimiter is set to "_", and the node is configured to find 3 neighbors, then the resulting Row IDs would be "RowN_1", "RowN_2", and "RowN_3".
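
To make the Range Filtering and RowID Suffix Separator examples concrete, the following Python sketch filters precomputed Tanimoto similarities to the range 0 to 0.9999999, keeps the three most similar hits, and builds the suffixed Row IDs. The function name, the reference IDs, and the similarity values are invented for illustration, and treating both range bounds as inclusive is an assumption, not documented behavior.

```python
def range_filtered_hits(row_id, similarities, lower, upper, k, separator="_"):
    """Keep hits whose similarity lies in [lower, upper], order them from
    most to least similar, and label the first k with suffixed Row IDs."""
    in_range = [(ref, s) for ref, s in similarities.items()
                if lower <= s <= upper]
    in_range.sort(key=lambda pair: pair[1], reverse=True)
    return [(f"{row_id}{separator}{index}", ref, s)
            for index, (ref, s) in enumerate(in_range[:k], start=1)]


# Hypothetical similarities of one query row to four reference rows.
# RefA is identical to the query (similarity 1.0) and falls outside the range.
sims = {"RefA": 1.0, "RefB": 0.8, "RefC": 0.35, "RefD": 0.6}

for hit in range_filtered_hits("RowN", sims, 0.0, 0.9999999, k=3):
    print(hit)
# ('RowN_1', 'RefB', 0.8)
# ('RowN_2', 'RefD', 0.6)
# ('RowN_3', 'RefC', 0.35)
```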

Input Ports

Port 0: Query table. Each row is used as a query for similar (or non-similar) entries in the reference table.
Port 1: Reference table. The data set in which to search for nearest/farthest neighbors.
Port 2: Optional distance measure, which replaces the distance configuration.

Output Ports

The input data set with three additional columns: (i) the neighbor index, (ii) the neighbor (the Row ID or some other representative column), and (iii) the distance/similarity value. The 2nd, 3rd, and subsequent neighbors are represented by additional rows.
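
The shape of this output table can be sketched in Python. The example below is a simplified model, not the node's code: it assumes Euclidean distance, a nearest-neighbor search, an "_" separator, and hypothetical column names (`neighbor_index`, `neighbor`, `distance`), and it always appends the index suffix to the Row ID, which may differ from the node's behavior when only one neighbor is requested.

```python
import math


def similarity_search(query_rows, reference_rows, k):
    """For each query row, emit one output row per neighbor: the query
    values plus neighbor index, neighbor ID, and distance."""
    output = []
    for query_id, query_vec in query_rows.items():
        # Rank every reference row by its distance to the query row.
        ranked = sorted((math.dist(query_vec, ref_vec), ref_id)
                        for ref_id, ref_vec in reference_rows.items())
        for index, (distance, ref_id) in enumerate(ranked[:k], start=1):
            output.append({
                "row_id": f"{query_id}_{index}",
                "values": query_vec,      # the query row, duplicated per match
                "neighbor_index": index,
                "neighbor": ref_id,
                "distance": distance,
            })
    return output


# Hypothetical query table (Port 0) and reference table (Port 1).
queries = {"Row0": (0.0, 0.0)}
references = {"Ref0": (3.0, 4.0), "Ref1": (1.0, 0.0), "Ref2": (0.0, 2.0)}

for row in similarity_search(queries, references, k=2):
    print(row["row_id"], row["neighbor_index"], row["neighbor"], row["distance"])
# Row0_1 1 Ref1 1.0
# Row0_2 2 Ref2 2.0
```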

Update Site

To use this node in KNIME, install KNIME Distance Matrix Extension from the following update site:
