Similarity Search

This node takes each row in the query table (Port 0) and searches the reference table (Port 1) for a number of rows matching the specified similarity/distance criteria. If multiple results are requested, the query result row is duplicated for each subsequent match.
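The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the node's implementation: the function names, the dictionary-based tables, the default Euclidean distance, and the 1-based neighbor index are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity_search(queries, references, k=1, distance=euclidean):
    """For each query row, return the k closest reference rows.

    queries, references: dicts mapping a row ID to a numeric vector
    (a hypothetical stand-in for the two input tables).
    Returns (query_id, neighbor_index, reference_id, distance) tuples;
    a query with several matches appears once per match.
    """
    results = []
    for query_id, query_vec in queries.items():
        # Rank every reference row by its distance to the query row.
        ranked = sorted(
            (distance(query_vec, ref_vec), ref_id)
            for ref_id, ref_vec in references.items()
        )
        # Keep the k nearest; the query row is repeated for each hit.
        for index, (dist, ref_id) in enumerate(ranked[:k], start=1):
            results.append((query_id, index, ref_id, dist))
    return results

queries = {"Row0": [0.0, 0.0]}
references = {"A": [3.0, 4.0], "B": [1.0, 0.0], "C": [0.0, 2.0]}
hits = similarity_search(queries, references, k=2)
print(hits)  # [('Row0', 1, 'B', 1.0), ('Row0', 2, 'C', 2.0)]
```

With k=2, the single query row yields two result rows, one per neighbor, which mirrors the duplication of the query row for each subsequent match.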

Options

Distance function: Choose which method is used to calculate the distance (or similarity) for the query. This panel appears only if no distance measure is connected (Port 2).

Euclidean Distance
Requires 1 or more numeric columns.

Manhattan Distance
Requires 1 or more numeric columns.

Distance Vector
Requires one column of the Distance Vector type, as generated by the Distance Matrix Calculate node.

Tanimoto Similarity
Requires a bit-vector fingerprint.

Tanimoto Similarity (old)
A deprecated version of Tanimoto similarity. Requires a deprecated bit vector type.

Cosine Similarity
Requires 1 or more numeric columns.

Cosine Bitvector Similarity
Requires a bit-vector fingerprint.

Dice's Coefficient
Requires a bit-vector fingerprint.

Levenshtein (absolute)
Requires one string column. The value is the absolute number of edits between the two strings (can be greater than 1).

Levenshtein (normalized)
Requires one string column. The value is the number of edits between the two strings divided by the length of the longer string (at most 1).
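
As a rough guide to what some of the measures listed above compute, the sketch below gives textbook definitions in Python. It is not taken from the KNIME source: fingerprints are modelled as sets of on-bit positions, all function names are hypothetical, and the return values chosen for empty inputs are assumptions.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity: shared on-bits divided by all on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # empty case: assumption

def dice(a, b):
    """Dice's coefficient: twice the shared on-bits over the summed sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0  # empty case: assumption

def cosine_bits(a, b):
    """Cosine similarity of two bit vectors given as sets of on-bits."""
    if not a or not b:
        return 0.0  # empty case: assumption
    return len(a & b) / math.sqrt(len(a) * len(b))

def levenshtein(s, t):
    """Absolute edit distance (insertions, deletions, substitutions)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def levenshtein_normalized(s, t):
    """Edit distance divided by the length of the longer string."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0

fp_a, fp_b = {1, 2, 3, 4}, {3, 4, 5, 6}
print(tanimoto(fp_a, fp_b))     # 2 shared bits / 6 bits in the union
print(dice(fp_a, fp_b))         # 0.5
print(cosine_bits(fp_a, fp_b))  # 0.5
print(levenshtein("kitten", "sitting"))  # 3
```

The normalized Levenshtein value for the same pair of strings is 3/7, since the longer string has seven characters.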

Column Selection: Choose which columns to use in the calculation. Unusable columns are ignored. This panel appears only if no distance measure is connected (Port 2).
Coefficient Type: Determines how the output is represented. It has no effect on the calculation. Note that this is only meaningful with the Tanimoto Similarity metric.

Distance
More different rows have a smaller index.

Similarity
More similar rows have a smaller index.

Neighbor Selection: Choose whether more similar or more distant results match the query.
Range Filtering: Specify a similarity/distance range query for query hits. For example, a search using Tanimoto Similarity with a range filter of 0 to 0.9999999 would return the nearest non-identical matches to the query row.
Output column name prefix: This string will be used in the construction of output column names.
Representative Column: The column used to identify the entries in the reference table (the lower input, Port 1) that match the query criteria.
RowID Suffix Separator: When multiple search results are requested, this delimiter separates the original Row ID from the index of the result. For example, if the Row ID is RowN, the delimiter is set to "_" and the node is configured to find 3 neighbors, then the resulting Row IDs would be "RowN_1", "RowN_2", and "RowN_3".
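
The range filter and Row ID examples above can be reproduced with the following sketch. Both helper functions are hypothetical, the inclusive range bounds are an assumption, and the case of a single requested result is not modelled.

```python
def filter_by_range(hits, lower, upper):
    """Keep (neighbor, value) pairs whose value lies in [lower, upper].

    Inclusive bounds are an assumption made for this sketch.
    """
    return [(name, value) for name, value in hits if lower <= value <= upper]

def result_row_ids(row_id, separator, neighbor_count):
    """Build one Row ID per neighbor: original ID, separator, result index."""
    return [f"{row_id}{separator}{i}" for i in range(1, neighbor_count + 1)]

# Tanimoto similarities of three reference rows to one query row (made-up values).
hits = [("ref1", 1.0), ("ref2", 0.82), ("ref3", 0.40)]

# A range of 0 to 0.9999999 drops the identical match (similarity 1.0).
non_identical = filter_by_range(hits, 0.0, 0.9999999)
print(non_identical)  # [('ref2', 0.82), ('ref3', 0.4)]

ids = result_row_ids("RowN", "_", 3)
print(ids)  # ['RowN_1', 'RowN_2', 'RowN_3']
```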

Input Ports

Port 0: Each row is used as a query for similar (or dissimilar) entries in the reference table.
Port 1: Data set in which to search for nearest/farthest neighbors.
Port 2: Optional distance measure, which replaces the distance configuration.

Output Ports

The input data set with three additional columns: (i) the neighbor index, (ii) the neighbor (the Row ID or another representative column), and (iii) the distance/similarity value. The 2nd, 3rd, and further neighbors are represented by additional rows.

Views

This node has no views.

Installation

To use this node in KNIME, install the KNIME Distance Matrix extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191418

On NodePit since: 2025-07-02

Last update: 2025-08-12

KNIME versions: Since v3.6
