Reference Fragments to MMPs

This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs in a dataset. The node takes two input tables of fragments generated by MMP Molecule Fragment nodes and generates an output table of matched molecular pairs (MMPs).

In this implementation, pairs are only created between rows of the query and reference tables (the 'forwards' direction is from the 'Left' query row to the 'Right' reference row). Both tables must have the same structure.
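
The cross-table pairing described above can be sketched in a few lines of Python. This is a minimal illustration, not the node's actual code: the function name, the (ID, key, value) row layout, the `[*:1]` attachment point notation and the `left>>right` transform string are all assumptions made for the example, as is the choice to skip rows whose values are identical.

```python
from collections import defaultdict

def cross_table_pairs(query_rows, reference_rows):
    """Pair query (Left) rows with reference (Right) rows that share a key.

    Each row is a hypothetical (mol_id, key, value) tuple. SMILES strings are
    compared as plain text, so they are assumed to be canonical already.
    """
    # Index the reference table by key so each query row needs one lookup.
    reference_by_key = defaultdict(list)
    for ref_id, key, value in reference_rows:
        reference_by_key[key].append((ref_id, value))

    pairs = []
    for query_id, key, left_value in query_rows:
        for ref_id, right_value in reference_by_key.get(key, []):
            # Assumption: identical values describe no change, so no pair.
            if left_value != right_value:
                pairs.append((query_id, ref_id, key, f"{left_value}>>{right_value}"))
    return pairs

query = [("Q1", "[*:1]c1ccccc1", "[*:1]Cl")]
reference = [
    ("R1", "[*:1]c1ccccc1", "[*:1]F"),   # same key, different value: a pair
    ("R2", "[*:1]c1ccccc1", "[*:1]Cl"),  # same key, same value: skipped
    ("R3", "[*:1]C1CCCCC1", "[*:1]F"),   # different key: never compared
]
pairs = cross_table_pairs(query, reference)
```

Only Q1 and R1 form a pair here. Rows within the same table are never compared with each other, which is the behaviour the paragraph above describes.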

The node requires two SMILES input columns, representing the 'key' (unchanging atoms) and 'value', and a string column containing the ID. The node will attempt to auto-guess these column selections based on the default names for the columns output by the fragment node.

The input tables can contain fragmentations from differing numbers of cuts, in which case this will be reflected in the output table.

The table will be pre-sorted by key and then by value during execution, unless the 'Incoming table is sorted by Keys and Values?' option is selected. If this option is selected and the correct sorting has not been applied, then pairs may be missed (incorrect key sorting) or be non-canonical in their direction (incorrect value sorting).
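
Why incorrect key sorting can lose pairs is easiest to see with a single-pass grouping. The sketch below uses Python's `itertools.groupby`, which only groups consecutive rows; whether the node processes its tables in a comparable single pass is an assumption, and the row layout and function name are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def count_key_groups(rows):
    """Count the runs of consecutive rows that share a key (row[0])."""
    return sum(1 for _ in groupby(rows, key=itemgetter(0)))

rows = [
    ("keyA", "v2"),
    ("keyB", "v1"),
    ("keyA", "v1"),
]

# Unsorted: the two keyA rows are not adjacent, so they land in separate
# groups and would never be compared with each other.
unsorted_groups = count_key_groups(rows)

# Sorted by key, then by value: all rows for a key are adjacent.
sorted_rows = sorted(rows, key=itemgetter(0, 1))
sorted_groups = count_key_groups(sorted_rows)
```

In the unsorted table keyA is split into two runs, giving three groups instead of two. Sorting by value within each key then gives every group a fixed, reproducible order.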

Incoming columns can be passed through unchanged (Left, Right or both). For numeric columns (Integer, Long, Double and Complex Number), differences (L - R or R - L) can be calculated. For Double columns only, ratios (L / R or R / L) can also be calculated.
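
As a small worked example of the difference and ratio columns, the sketch below computes all four quantities for one numeric property of a pair. The function name and dictionary keys are illustrative, and returning NaN for a zero denominator is an assumption rather than documented node behaviour.

```python
import math

def property_changes(left, right):
    """Return L - R, R - L, L / R and R / L for one numeric property."""
    def safe_ratio(numerator, denominator):
        # Assumption: a zero denominator yields NaN instead of an error.
        return numerator / denominator if denominator != 0 else math.nan

    return {
        "L - R": left - right,
        "R - L": right - left,
        "L / R": safe_ratio(left, right),
        "R / L": safe_ratio(right, left),
    }

changes = property_changes(8.0, 2.0)
```

For a Left value of 8.0 and a Right value of 2.0, the differences are 6.0 and -6.0, and the ratios are 4.0 and 0.25.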

Transforms can be filtered on the value attachment point graph distances calculated during fragmentation, using one of the following options:

None - No filtering
Max total graph distance change - the sum of all graph distance changes
Max single graph distance change - the maximum tolerated change in any single distance
Tanimoto - the vector Tanimoto similarity
Dice - the vector Dice similarity
Cosine - the vector Cosine similarity
Euclidean - the vector Euclidean distance
Hamming - the vector Hamming (Manhattan or City-block) distance
Soergel - the vector Soergel distance
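
For orientation, the vector measures listed above can be written out using their common textbook definitions for count vectors. This is a sketch under that assumption; the node's exact formulas are not given here and may differ, for example in how all-zero vectors are handled. The function name and the example vectors are hypothetical.

```python
import math

def vector_measures(a, b):
    """Common count-vector similarity and distance measures.

    Assumes a and b have equal length and are not all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    manhattan = sum(abs(x - y) for x, y in zip(a, b))
    return {
        "tanimoto": dot / (aa + bb - dot),
        "dice": 2 * dot / (aa + bb),
        "cosine": dot / math.sqrt(aa * bb),
        "euclidean": math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
        "hamming": manhattan,  # Manhattan / city-block form for counts
        "soergel": manhattan / sum(max(x, y) for x, y in zip(a, b)),
    }

# Hypothetical graph distance count vectors for the Left and Right values.
measures = vector_measures([1, 2], [2, 1])
```

Tanimoto, Dice and Cosine are similarities, where higher values mean more alike. Euclidean, Hamming and Soergel are distances, where lower values mean more alike, so a cutoff acts in the opposite sense for the two groups.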

Filtering can also be performed based on the change in heavy atom count (HAC) during the transformation.
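
A heavy atom count filter of this kind is directional, which the short sketch below illustrates using the -2 to +4 range mentioned under 'Filter by HAC Change'. The function name, the sign convention (Right count minus Left count) and the inclusive bounds are assumptions made for the example.

```python
def passes_hac_filter(left_hac, right_hac, min_change, max_change):
    """True if the heavy atom count change of Left>>Right is within range."""
    # Assumption: change is Right minus Left, and both bounds are inclusive.
    change = right_hac - left_hac
    return min_change <= change <= max_change

# A transform that loses 3 heavy atoms in the forwards direction...
forward_ok = passes_hac_filter(5, 2, -2, 4)
# ...gains 3 heavy atoms when reversed.
reverse_ok = passes_hac_filter(2, 5, -2, 4)
```

The forwards transform is rejected (-3 is below -2) while its reverse is accepted (+3 is within range), so such a pair appears in one direction only.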

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).

Options

Select the Fragment Key column: Select the column containing the fragment 'keys'
Select the Fragment Value column: Select the column containing the fragment 'values'
Incoming table is sorted by Keys and Values?: Use this option if the input table is pre-sorted by 'keys', then by 'values'. See above for details
Select the ID column: Select the column containing the parent molecule IDs
Allow self-transforms: Allows two regioisomeric fragmentations of an input molecule resulting in identical keys but differing values to provide a 'self-transform' between the fragmentations
Filter by HAC Change: Should the transform be filtered by the change in heavy atom count (delta HAC)? NB This filter is asymmetric, so with the 'Show reverse-direction transforms' option selected some pairs will appear in one direction only, e.g. if the range is set from -2 to +4 then a transform losing 3 heavy atoms in the forwards direction will only show in the reverse direction
HAC Change Range: The range of acceptable HAC changes
Show HAC change in output table: Should the HAC change be shown in the output table
Graph Distance Similarity: If a fragmentation value attachment point graph distance fingerprint was calculated during fragmentation, then that can be used to restrict the transforms generated according to various similarity or distance cut-off functions (see above)
Cutoff (Double): The cutoff threshold for doubles
Cutoff (Integer): The cutoff threshold for integers
Graph Distance fingerprint column: The column containing the counts fingerprint for the graph distances between attachment points
Include distance/similarity in output: Should the calculated graph distance or similarity be included in the output table

Pass-through columns

Left Columns to pass through unchanged: The columns from the left molecule of the transform to pass through unchanged
Right Columns to pass through unchanged: The columns from the right molecule of the transform to pass through unchanged

Difference columns

Left - Right: Those numeric (int, double, long, complex number) columns for which the L-R difference should be calculated
Right - Left: Those numeric (int, double, long, complex number) columns for which the R-L difference should be calculated

Ratio columns

Left / Right: Those numeric double columns for which the L/R ratio should be calculated
Right / Left: Those numeric double columns for which the R/L ratio should be calculated

Output Settings

Remove Explicit H's from output: Explicit hydrogens will be removed from the output if selected
Show unchanging portion: A SMILES cell will be included showing the 'key' resulting in the fragmentation pattern
Show number of changing atoms: The number of heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Show ratio of constant / changing heavy atoms: The ratio of constant / changing heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Show reverse-direction transforms: The transformations will be duplicated in the 'reverse' direction, e.g. A-->B and B-->A
Include Reactions SMARTS: In addition to the SMIRKS representation of the transformation, the transform is shown in an rSMARTS representation with atom mappings

Input Ports

Fragmented molecule key-value pairs (The 'Right' part of pair in forwards direction)
Fragmented molecule key-value pairs (The 'Left' part of pair in forwards direction)

Output Ports

Matched pair transformations

Views

This node has no views

Workflows

04_Databased_MMP_Example (KNIME Hub)

Installation

To use this node in KNIME, install the extension Vernalis KNIME Nodes from the below update site following our NodePit Product and Node Installation Guide:

v5.5

Plugin provider: Vernalis Research, UK

Plugin version: 1.38.2.v202504171302

On NodePit since: 2025-07-02

Last update: 2025-07-26

KNIME versions: v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6