Reference Fragments to MMPs

This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs in a dataset. The node takes two input tables of fragments generated by MMP Molecule Fragment nodes and generates an output table of matched molecular pairs (MMPs).

In this implementation, pairs are only created between rows of the query and reference tables (the 'forwards' direction is from the 'Left' query row to the 'Right' reference row). Both tables must have the same structure.
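The restriction to query-to-reference pairs can be pictured as a join on the key column. The sketch below is a minimal illustration, not the node's implementation: it uses a dictionary (hash) join in place of the sort-based grouping of the Hussain and Rea algorithm, and the `Fragment` tuple, the function name and the `[*:1]` attachment-point notation in the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Fragment(NamedTuple):
    # One row of a fragment table (hypothetical layout): parent ID, key, value
    id: str
    key: str
    value: str


def pair_query_to_reference(query, reference):
    """Yield (left, right) pairs of rows that share an identical key.

    Left rows always come from the query table and right rows from the
    reference table, so query-query or reference-reference pairs never arise.
    """
    by_key = defaultdict(list)  # index the reference table by key
    for ref in reference:
        by_key[ref.key].append(ref)
    for left in query:
        for right in by_key.get(left.key, []):
            if left.value != right.value:  # identical values: no transform
                yield left, right


query = [
    Fragment("q1", "[*:1]c1ccccc1", "[*:1]Cl"),
    Fragment("q1", "[*:1]Cl", "[*:1]c1ccccc1"),
]
reference = [
    Fragment("r1", "[*:1]c1ccccc1", "[*:1]Br"),
    Fragment("r2", "[*:1]c1ccccc1", "[*:1]Cl"),
]
for left, right in pair_query_to_reference(query, reference):
    print(left.id, "->", right.id, left.value, ">>", right.value)
```

Only the q1/r1 pair is produced: q1 and r2 share both key and value, and the second query key has no reference partner.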

The node requires two SMILES input columns, representing the 'key' (unchanging atoms) and 'value' (changing atoms), and a string column containing the ID. The node will attempt to auto-guess these column selections based on the default names for the columns output by the fragment node.

The input tables can contain fragmentations from differing numbers of cuts, in which case this will be reflected in the output table.

The table will be pre-sorted by key followed by value during execution, unless the 'Incoming table is sorted by Keys and Values?' option is selected. If this option is selected but the input is not correctly sorted, pairs may be missed (incorrect key sorting) or be non-canonical in their direction (incorrect value sorting).
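Why incorrect key sorting loses pairs can be seen with a single-pass scan that, like a sort-based algorithm, only groups rows whose keys are adjacent. This is a minimal sketch, assuming rows are hypothetical `(key, value, source)` tuples; `assume_sorted` merely mimics the option above and is not a real parameter of the node.

```python
from itertools import groupby
from operator import itemgetter


def count_pairs(rows, assume_sorted=False):
    """Count query/reference pairs that share a key but differ in value."""
    if not assume_sorted:
        rows = sorted(rows, key=itemgetter(0, 1))  # sort by key, then value
    n = 0
    # groupby only merges *consecutive* rows with equal keys
    for _key, group in groupby(rows, key=itemgetter(0)):
        members = list(group)
        queries = [r for r in members if r[2] == "query"]
        refs = [r for r in members if r[2] == "reference"]
        n += sum(1 for q in queries for r in refs if q[1] != r[1])
    return n


rows = [("K1", "V1", "query"), ("K2", "V9", "query"), ("K1", "V2", "reference")]
print(count_pairs(rows))                      # 1: rows are sorted first
print(count_pairs(rows, assume_sorted=True))  # 0: the two K1 rows are not adjacent
```

When the input is wrongly declared as sorted, the two `K1` rows fall into separate groups and their pair is silently missed.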

Incoming columns can be passed through unchanged (Left, Right or both). Numeric columns (Integer, Long, Double and Complex Number) can have differences calculated (L - R or R - L), and Double columns can also have ratios calculated (L / R or R / L).
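The arithmetic itself is simple; the point of interest is which column types each operation accepts. A minimal sketch follows, assuming Python `int`, `float` and `complex` stand in for the KNIME column types. Returning `None` for a zero denominator is an assumption standing in for a missing cell; the node's actual behaviour is not described above.

```python
def difference(left, right, direction="L-R"):
    """Return L - R or R - L; works for int, float and complex values."""
    if direction == "L-R":
        return left - right
    if direction == "R-L":
        return right - left
    raise ValueError(f"unknown direction: {direction}")


def ratio(left, right, direction="L/R"):
    """Return L / R or R / L, for float (Double) values only.

    Assumption: a zero denominator yields None, standing in for a missing cell.
    """
    if not (isinstance(left, float) and isinstance(right, float)):
        raise TypeError("ratios are only calculated for Double columns")
    if direction not in ("L/R", "R/L"):
        raise ValueError(f"unknown direction: {direction}")
    num, den = (left, right) if direction == "L/R" else (right, left)
    return None if den == 0 else num / den


print(difference(5, 3))               # 2
print(difference(5, 3, "R-L"))        # -2
print(difference(1 + 2j, 0.5 + 1j))   # (0.5+1j)
print(ratio(6.0, 3.0))                # 2.0
print(ratio(6.0, 3.0, "R/L"))         # 0.5
```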

Transforms can be filtered on the value attachment point graph distance fingerprint calculated during fragmentation, using one of the following options:

  • None - No filtering
  • Max total graph distance change - the sum of all graph distance changes
  • Max single graph distance change - the maximum tolerated change in any single distance
  • Tanimoto - the vector Tanimoto similarity
  • Dice - the vector Dice similarity
  • Cosine - the vector Cosine similarity
  • Euclidean - the vector Euclidean distance
  • Hamming - the vector Hamming (Manhattan or City-block) distance
  • Soergel - the vector Soergel distance
Filtering can also be performed based on the change in heavy atom count (HAC) during the transformation.
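The vector measures in the list above have standard definitions for count vectors, sketched below for two equal-length graph distance fingerprints. This is an illustration under assumptions, not the node's code: "total change" is read as the sum of absolute per-distance changes (which then coincides with the Hamming distance) and "single change" as the largest absolute change; the exact definitions used by the node may differ. Which cutoff (Double or Integer) pairs with which measure, and the comparison direction in `passes`, are also assumptions.

```python
import math


def vector_measures(a, b):
    """Textbook similarity/distance values for two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    diffs = [abs(x - y) for x, y in zip(a, b)]
    max_sum = sum(max(x, y) for x, y in zip(a, b))
    return {
        "total_change": sum(diffs),             # assumed reading, see lead-in
        "max_single_change": max(diffs, default=0),
        "tanimoto": dot / (aa + bb - dot) if aa + bb - dot else 1.0,
        "dice": 2 * dot / (aa + bb) if aa + bb else 1.0,
        "cosine": dot / math.sqrt(aa * bb) if aa and bb else 0.0,
        "euclidean": math.sqrt(sum(d * d for d in diffs)),
        "hamming": sum(diffs),                  # Manhattan / city-block
        "soergel": sum(diffs) / max_sum if max_sum else 0.0,
    }


SIMILARITIES = {"tanimoto", "dice", "cosine"}


def passes(measure, value, cutoff):
    # Assumption: similarities must reach the cutoff, distances must not exceed it
    return value >= cutoff if measure in SIMILARITIES else value <= cutoff


m = vector_measures([2, 3, 4], [2, 5, 4])
print(m["tanimoto"], m["euclidean"], m["soergel"])
print(passes("tanimoto", m["tanimoto"], 0.8))  # True
```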

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com.

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).

Options

Select the Fragment Key column
Select the column containing the fragment 'keys'
Select the Fragment Value column
Select the column containing the fragment 'values'
Incoming table is sorted by Keys and Values?
Use this option if the input table is pre-sorted by 'keys', then by 'values'. See above for details
Select the ID column
Select the column containing the parent molecule IDs
Allow self-transforms
Allows two regioisomeric fragmentations of an input molecule, resulting in identical keys but differing values, to provide a 'self-transform' between the fragmentations
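The self-transform option amounts to one extra rule when deciding whether a same-key pair is kept. A minimal sketch, assuming a pair is identified by parent IDs and values only; the function name and the placeholder IDs and values are hypothetical.

```python
def keep_pair(left_id, left_value, right_id, right_value, allow_self_transforms):
    """Decide whether two rows that already share a key form a transform."""
    if left_value == right_value:
        return False  # identical values: nothing changes
    if left_id == right_id:
        # Same parent, same key, different values: two regioisomeric
        # fragmentations of one molecule (a 'self-transform')
        return allow_self_transforms
    return True


print(keep_pair("mol1", "V1", "mol1", "V2", allow_self_transforms=False))  # False
print(keep_pair("mol1", "V1", "mol1", "V2", allow_self_transforms=True))   # True
```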
Filter by HAC Change
Should the transform be filtered by the change in heavy atom count (delta HAC)? NB: This filter is asymmetric, so the 'Show reverse-direction transforms' option will not always show both directions of a pair. For example, if the range is set from -2 to +4, then a transform losing 3 heavy atoms in the forwards direction will only be shown in the reverse direction
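The asymmetry follows from applying the same range to each direction separately, because the reverse direction simply flips the sign of delta HAC. A minimal sketch of the example above, assuming delta HAC is HAC(right value) - HAC(left value) and that both range bounds are inclusive; the function names are hypothetical.

```python
def in_hac_range(delta, low, high):
    # Assumption: both bounds are inclusive
    return low <= delta <= high


def directions_shown(left_hac, right_hac, low, high, show_reverse):
    """List the directions of one pair that survive the HAC filter."""
    forward = right_hac - left_hac
    shown = []
    if in_hac_range(forward, low, high):
        shown.append("forward")
    if show_reverse and in_hac_range(-forward, low, high):
        shown.append("reverse")
    return shown


# Left value has 5 heavy atoms, right value has 2: forwards delta is -3
print(directions_shown(5, 2, -2, 4, show_reverse=True))   # ['reverse']
print(directions_shown(5, 2, -2, 4, show_reverse=False))  # []
```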
HAC Change Range
The range of acceptable HAC changes
Show HAC change in output table
Should the HAC change be shown in the output table?
Graph Distance Similarity
If a value attachment point graph distance fingerprint was calculated during fragmentation, then it can be used to restrict the transforms generated according to various similarity or distance cut-off functions (see above)
Cutoff (Double)
The cutoff threshold for doubles
Cutoff (Integer)
The cutoff threshold for integers
Graph Distance fingerprint column
The column containing the counts fingerprint for the graph distances between attachment points
Include distance/similarity in output
Should the calculated graph distance or similarity be included in the output table?

Pass-through columns

Left Columns to pass through unchanged
The columns from the left molecule of the transform to pass through unchanged
Right Columns to pass through unchanged
The columns from the right molecule of the transform to pass through unchanged

Difference columns

Left - Right
Those numeric (int, double, long, complex number) columns for which the L-R difference should be calculated
Right - Left
Those numeric (int, double, long, complex number) columns for which the R-L difference should be calculated

Ratio columns

Left / Right
Those numeric double columns for which the L/R ratio should be calculated
Right / Left
Those numeric double columns for which the R/L ratio should be calculated

Output Settings

Remove Explicit H's from output
Explicit hydrogens will be removed from the output if selected
Show unchanging portion
A SMILES cell will be included showing the 'key' resulting in the fragmentation pattern
Show number of changing atoms
The number of changing heavy atoms (not including 'A', the attachment point) will be included for the Left and Right fragments
Show ratio of constant / changing heavy atoms
The ratio of constant / changing heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
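Both outputs rest on a heavy atom count that skips the attachment point. The sketch below is a simplified regex-based SMILES tokenizer for illustration only (a cheminformatics toolkit would normally be used). It assumes attachment points are written with `*` (e.g. `[*:1]`), which this page refers to as 'A', and returning NaN when there are no changing heavy atoms is likewise an assumption.

```python
import re

# Bracket atoms, then organic-subset atoms (two-letter symbols first), then '*'
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|\*")
# Element symbol inside a bracket atom, after an optional isotope number
_BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2}|\*)")


def heavy_atom_count(smiles):
    """Count heavy atoms, ignoring hydrogens and '*' attachment points."""
    count = 0
    for tok in _TOKEN.findall(smiles):
        if tok.startswith("["):
            m = _BRACKET_ELEMENT.match(tok)
            element = m.group(1) if m else ""
        else:
            element = tok
        if element not in ("", "H", "*"):
            count += 1
    return count


def constant_to_changing_ratio(key, value):
    """Heavy atoms in the unchanging key divided by those in the changing value."""
    changing = heavy_atom_count(value)
    # Assumption: no changing heavy atoms (e.g. an H value) gives NaN
    return heavy_atom_count(key) / changing if changing else float("nan")


print(heavy_atom_count("[*:1]C(=O)O"))                              # 3
print(constant_to_changing_ratio("c1ccccc1[*:1]", "[*:1]C(=O)O"))   # 2.0
```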
Show reverse-direction transforms
The transformations will be duplicated in the 'reverse' direction, e.g. A-->B and B-->A
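Duplicating a transform in the reverse direction swaps the Left and Right sides of each row. A minimal sketch, assuming a hypothetical `Transform` record; its `smirks` property is a simplified `value>>value` string, not the atom-mapped SMIRKS the node actually writes.

```python
from typing import NamedTuple


class Transform(NamedTuple):
    left_id: str
    right_id: str
    left_value: str
    right_value: str

    @property
    def smirks(self):
        # Simplified display form only: left value >> right value
        return f"{self.left_value}>>{self.right_value}"


def with_reverse(transforms):
    """Return each transform followed by its reverse-direction copy."""
    out = []
    for t in transforms:
        out.append(t)
        out.append(Transform(t.right_id, t.left_id, t.right_value, t.left_value))
    return out


for t in with_reverse([Transform("q1", "r1", "[*:1]Cl", "[*:1]Br")]):
    print(t.left_id, "->", t.right_id, t.smirks)
```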
Include Reactions SMARTS
In addition to the SMIRKS representation of the transformation, the transform is shown in an rSMARTS representation with atom mappings

Input Ports

Fragmented molecule key-value pairs (the 'Right' part of the pair in the forwards direction)
Fragmented molecule key-value pairs (the 'Left' part of the pair in the forwards direction)

Output Ports

Matched pair transformations

Views

This node has no views
