This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs (MMPs) in a dataset. It takes two input tables of fragments generated by MMP Molecule Fragment nodes and produces an output table of MMPs.

In this implementation, pairs are only created between rows of the query and reference tables (the 'forwards' direction is from the 'Left' query row to the 'Right' reference row). Both tables must have the same structure.
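The pairing idea can be sketched as follows. This is a minimal illustration, not the node's actual code: the `(mol_id, key, value)` row layout, the function name `find_pairs`, the placeholder SMILES strings, and the choice to skip rows whose values are identical are all assumptions made for the example.

```python
from collections import defaultdict


def find_pairs(query_rows, reference_rows):
    """Pair query ('Left') and reference ('Right') rows that share a key.

    Each row is a hypothetical (mol_id, key, value) tuple of strings.
    """
    # Index the reference table by key so each query row is matched
    # only against reference rows with the same unchanging part.
    by_key = defaultdict(list)
    for ref_id, key, value in reference_rows:
        by_key[key].append((ref_id, value))

    pairs = []
    for left_id, key, left_value in query_rows:
        for right_id, right_value in by_key.get(key, []):
            if left_value == right_value:
                continue  # assumption: identical values give no transform
            transform = f"{left_value}>>{right_value}"
            pairs.append((left_id, right_id, key, transform))
    return pairs


# Placeholder fragment strings, for illustration only.
query = [("q1", "[*:1]c1ccccc1", "[*:1]Cl")]
reference = [
    ("r1", "[*:1]c1ccccc1", "[*:1]F"),
    ("r2", "[*:1]CC", "[*:1]F"),
    ("r3", "[*:1]c1ccccc1", "[*:1]Cl"),
]
print(find_pairs(query, reference))
```

Only `r1` pairs with `q1` here: `r2` has a different key, and `r3` has the same key and the same value.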

The node requires two SMILES input columns, representing the 'key' (unchanging atoms) and 'value', and a string column containing the ID. The node will attempt to auto-guess these column selections based on the default names for the columns output by the fragment node.

The input table can contain fragmentations from differing numbers of cuts, in which case this will be reflected in the output table.

The table will be pre-sorted by key, then by value, during execution, unless the 'Incoming table is sorted by Keys and Values?' option is selected. If this option is selected but the table is not correctly sorted, pairs may be missed (incorrect key sorting) or be non-canonical in their direction (incorrect value sorting).
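Before selecting the pre-sorted option, it may be worth checking the ordering outside the node. The sketch below assumes plain string ordering on hypothetical `(mol_id, key, value)` tuples; the comparator the node actually uses is not stated here and may differ.

```python
def is_sorted_by_key_then_value(rows):
    """Return True if rows are ordered by key, with ties ordered by value.

    Assumes plain string comparison, which may not match the node's ordering.
    """
    sort_keys = [(key, value) for _mol_id, key, value in rows]
    return all(a <= b for a, b in zip(sort_keys, sort_keys[1:]))


def sort_by_key_then_value(rows):
    """Return a new list sorted by key, then by value."""
    return sorted(rows, key=lambda row: (row[1], row[2]))
```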

Incoming columns can be passed through unchanged (Left, Right or both). For numeric columns (Integer, Long, Double and Complex Number), differences (L - R or R - L) can be calculated. For Double columns only, ratios (L / R or R / L) can also be calculated.
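As a small worked illustration of the difference and ratio columns, the sketch below computes all four quantities for one pair of property values. The function names and the use of NaN for a zero denominator are assumptions of this example; how the node itself handles division by zero is not stated here.

```python
import math


def difference_columns(left, right):
    """Differences between the Left and Right values of a numeric column."""
    return {"L - R": left - right, "R - L": right - left}


def ratio_columns(left, right):
    """Ratios between the Left and Right values of a Double column."""
    def safe_div(a, b):
        # Assumption: a zero denominator yields NaN in this sketch.
        return a / b if b != 0 else math.nan

    return {"L / R": safe_div(left, right), "R / L": safe_div(right, left)}
```

For example, Left and Right values of 8.0 and 2.0 give differences of 6.0 and -6.0, and ratios of 4.0 and 0.25.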

Transforms can be filtered based on the value attachment point graph distances calculated during fragmentation, using one of the following options:

- None - No filtering
- Max total graph distance change - the sum of all graph distance changes
- Max single graph distance change - the maximum tolerated change in any single distance
- Tanimoto - the vector Tanimoto similarity
- Dice - the vector Dice similarity
- Cosine - the vector Cosine similarity
- Euclidean - the vector Euclidean distance
- Hamming - the vector Hamming (Manhattan or City-block) distance
- Soergel - the vector Soergel distance
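The vector measures in the list above can be sketched using their common textbook definitions for count vectors. This is an illustration under stated assumptions, not the node's implementation: the exact formulas used by the node may differ, the function names are hypothetical, and returning 0.0 for a zero denominator is a convention chosen only for this sketch.

```python
import math


def dot(a, b):
    """Dot product of two equal-length count vectors."""
    return sum(x * y for x, y in zip(a, b))


def tanimoto(a, b):
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0


def dice(a, b):
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


def cosine(a, b):
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance, listed above under 'Hamming'."""
    return sum(abs(x - y) for x, y in zip(a, b))


def soergel(a, b):
    denom = sum(max(x, y) for x, y in zip(a, b))
    return manhattan(a, b) / denom if denom else 0.0
```

With `a = [1, 2, 0]` and `b = [1, 1, 1]`, these definitions give a Tanimoto similarity of 0.6, a Dice similarity of 0.75, a Manhattan distance of 2 and a Soergel distance of 0.5. Note that the first three are similarities (larger means more alike) while the last three are distances (smaller means more alike), so a cutoff acts in opposite directions.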

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

1. J. Hussain and C. Rea, "*Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets*", *J. Chem. Inf. Model.*, 2010, **50**, 339-348 (DOI: 10.1021/ci900450m).

- Select the Fragment Key column
- Select the column containing the fragment 'keys'
- Select the Fragment Value column
- Select the column containing the fragment 'values'
- Incoming table is sorted by Keys and Values?
- Use this option if the input table is pre-sorted by 'keys', then by 'values'. See above for details
- Select the ID column
- Select the column containing the parent molecule IDs
- Allow self-transforms
- Allows two regioisomeric fragmentations of an input molecule resulting in identical keys but differing values to provide a 'self-transform' between the fragmentations
- Filter by HAC Change
- Should the transform be filtered by delta HAC? NB This is asymmetric so the 'Show reverse-direction transforms' option will not show pairs in some cases, e.g. if the range is set from -2 to +4 then a transform losing 3 heavy atoms in the forwards direction will only show in the reverse direction
- HAC Change Range
- The range of acceptable HAC changes
- Show HAC change in output table
- Should the HAC change be shown in the output table
- Graph Distance Similarity
- If a value attachment point graph distance fingerprint was calculated during fragmentation, then it can be used to restrict the transforms generated according to various similarity or distance cut-off functions (see above)
- Cutoff (Double)
- The cutoff threshold for doubles
- Cutoff (Integer)
- The cutoff threshold for integers
- Graph Distance fingerprint column
- The column containing the counts fingerprint for the graph distances between attachment points
- Include distance/similarity in output
- Should the calculated graph distance or similarity be included in the output table
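The asymmetry of the HAC change filter described in the options above can be illustrated with a short sketch. It assumes the change is computed as Right minus Left heavy atom count, which matches the example given for the 'Filter by HAC Change' option; the function name and signature are hypothetical.

```python
def passes_hac_filter(left_hac, right_hac, min_change, max_change):
    """Return True if the heavy atom count change lies within the range.

    Assumption: the change is Right minus Left, so a transform that
    loses atoms in the forwards direction has a negative change.
    """
    delta = right_hac - left_hac
    return min_change <= delta <= max_change


# Range -2 to +4, transform losing 3 heavy atoms in the forwards direction.
forwards = passes_hac_filter(left_hac=10, right_hac=7, min_change=-2, max_change=4)
reverse = passes_hac_filter(left_hac=7, right_hac=10, min_change=-2, max_change=4)
print(forwards, reverse)
```

The forwards transform has a change of -3 and is rejected, while its reverse has a change of +3 and is accepted, so the pair appears in only one direction.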

- Left Columns to pass through unchanged
- The columns from the left molecule of the transform to pass through unchanged
- Right Columns to pass through unchanged
- The columns from the right molecule of the transform to pass through unchanged

- Left - Right
- Those numeric (int, double, long, complex number) columns for which the L-R difference should be calculated
- Right - Left
- Those numeric (int, double, long, complex number) columns for which the R-L difference should be calculated

- Left / Right
- Those numeric double columns for which the L/R ratio should be calculated
- Right / Left
- Those numeric double columns for which the R/L ratio should be calculated

- Remove Explicit H's from output
- Explicit hydrogens will be removed from the output if selected
- Show unchanging portion
- A SMILES cell will be included showing the 'key' resulting in the fragmentation pattern
- Show number of changing atoms
- The number of heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
- Show ratio of constant / changing heavy atoms
- The ratio of constant / changing heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
- Show reverse-direction transforms
- The transformations will be duplicated in the 'reverse' direction, e.g. A-->B and B-->A
- Include Reactions SMARTS
- In addition to the SMIRKS representation of the transformation, the transform is shown in an rSMARTS representation with atom mappings

- This node has no views



