MMP Molecule Fragment (RDKit)

This node implements the molecule fragmentation part of the Hussain and Rea algorithm (Ref 1) for finding Matched Molecular Pairs in a dataset. The resulting key-value fragmentation pairs can be stored in a database for later recall, or passed directly to a subsequent pair-finding node. The user can specify the number of cuts to be made (1 - 10), and whether hydrogens should be added (1 cut only).
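
As a rough illustration of why the fragmentations are stored as key-value pairs, the sketch below groups fragmentations by their 'key' and pairs up molecules that share a key but differ in 'value'. It is not the code of this node or of any pair-finding node: the molecule IDs, the hand-written fragment SMILES and the function name find_pairs are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations: (molecule ID, key, value).
# These are illustrative values, not output produced by the node.
FRAGMENTATIONS = [
    ("toluene", "[*:1]c1ccccc1", "[*:1]C"),
    ("chlorobenzene", "[*:1]c1ccccc1", "[*:1]Cl"),
    ("toluene", "[*:1]C", "[*:1]c1ccccc1"),
    ("chlorobenzene", "[*:1]Cl", "[*:1]c1ccccc1"),
]


def find_pairs(fragmentations):
    """Pair molecules that share a key but have different values."""
    by_key = defaultdict(list)
    for mol_id, key, value in fragmentations:
        by_key[key].append((mol_id, value))

    pairs = []
    for key, entries in by_key.items():
        for (id_a, val_a), (id_b, val_b) in combinations(entries, 2):
            if id_a != id_b and val_a != val_b:
                # The transformation is written as 'left value >> right value'.
                pairs.append((id_a, id_b, key, f"{val_a}>>{val_b}"))
    return pairs


for pair in find_pairs(FRAGMENTATIONS):
    print(pair)
```

Only the shared phenyl key has more than one entry here, so a single toluene/chlorobenzene pair is found; keys that occur for one molecule only produce no pairs.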

A variety of fragmentation options are included:

"All acyclic single bonds" - Any acyclic single bonds between any two atoms will be broken. This is the most exhaustive approach, which can generate a large number of pairs (rSMARTS: [*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
"Only acyclic single bonds to rings" - Single acyclic bonds between any atoms will be broken, as long as at least one atom is in a ring (rSMARTS: [*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
"Only acyclic single bonds to either rings or to double bonds exocyclic to rings" - Single acyclic bonds between any atoms will be broken, as long as 1 atom is either in a ring, or in a double bond exocyclic to a ring, with the other end in the ring (rSMARTS: [*:1]!@!=!#[*;!R0,$(*=!@[*!R0]):2]>>[*:1]-[*].[*:2]-[*])
"Only single bonds to a heteroatom" - Single acyclic bonds between any two atoms, at least one of which is not carbon, will be broken. Included to mirror the C-X bond breaking chemistry prevalent in modern drug discovery (e.g. SNAr, reductive aminations, amide formations etc. See Ref. 2) (rSMARTS: [!#6:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
"Non-functional group single bonds" - This reproduces the fragmentation pattern used in the original Hussain/Rea paper (see footnote 24, Ref. 1), which is also used in the RDKit Python implementation (Ref 3) (rSMARTS: [#6+0;!$(*=,#[!#6]):1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
"Matsy (One atom in ring, or a non-sp2 C atom bonded to a non-C atom)" - This reproduces the fragmentation pattern used by NextMove's 'Matsy', i.e. single acyclic bonds between either a ring atom and any other atom, or a heteroatom bonded to a non-sp2 C atom, as described in the Matched Series paper (Ref 4) (rSMARTS: [$([#6!^2]-!@[!#6]),$([*;R]-!@[*]):1]-!@[$([!#6]-!@[#6!^2]),$([*]-!@[*;R]):2]>>[*:1]-[*].[*:2]-[*])
"Peptide Sidechains" - Acyclic single bonds from Cα to Cβ will be broken. C-H will only be broken for Glycine, and only when explicit H are present (both CH bonds will be broken in this case) (rSMARTS: [C;$(CC(=O)[O,N]);$(CN):1]-!@[$([C]-!@C(C(=O)[N,O])N),$([#1]-!@[CH2](C(=O)[N,O])N):2]>>[*:1]-[*].[*:2]-[*])
"Nucleic Acid Sidechains" - Acyclic single bonds in the anomeric position between the aromatic base N and sugar will be broken. The minimum requirement is N(Ar)CO(CO)CO to allow for open chain analogues (rSMARTS: [n:1]-!@[$(COC(CO)CO):2]>>[*:1]-[*].[*:2]-[*])
"User defined" - The user needs to provide their own (r)SMARTS fragmentation definition, following the guidelines below

Guidelines for Custom (r)SMARTS Definition
An rSMARTS is no longer required, but may still be specified if preferred, for backwards compatibility. If one is specified, it must comply with the following rules. Otherwise, simply provide a SMARTS match for two atoms separated by a single, acyclic bond

'>>' is required to separate reactants and products
Products require '[*]' to occur twice, for the attachment points (the node will handle the tagging of these)
Reactants and products require exactly two atom mappings, e.g. :1] and :2] (other values could be used).
The atom mappings must be two different values
The same atom mappings must be used for reactants and products
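
The rules above can be checked with plain string handling before a definition is pasted into the node. The following is a minimal sketch only: check_rsmarts and ATOM_MAP are hypothetical names, the check uses a simple regular expression rather than a SMARTS parser, and it does not verify that the pattern is chemically valid or that it matches a single, acyclic bond.

```python
import re
from typing import List

# Matches an atom-map suffix such as ":1]" at the end of a bracketed atom.
ATOM_MAP = re.compile(r":(\d+)\]")


def check_rsmarts(rsmarts: str) -> List[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    if rsmarts.count(">>") != 1:
        return ["'>>' must separate reactants and products"]
    reactants, products = rsmarts.split(">>")

    problems = []
    if products.count("[*]") != 2:
        problems.append("products must contain '[*]' exactly twice")

    reactant_maps = ATOM_MAP.findall(reactants)
    product_maps = ATOM_MAP.findall(products)
    for side, maps in (("reactants", reactant_maps), ("products", product_maps)):
        if len(maps) != 2:
            problems.append(f"{side} must contain exactly two atom mappings")
        elif maps[0] == maps[1]:
            problems.append(f"{side} atom mappings must be two different values")

    if sorted(reactant_maps) != sorted(product_maps):
        problems.append("reactants and products must use the same atom mappings")
    return problems


# The built-in 'All acyclic single bonds' definition passes every check.
print(check_rsmarts("[*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]"))
# Re-using the same mapping value on both atoms is reported.
print(check_rsmarts("[*:1]!@!=!#[*:1]>>[*:1]-[*].[*:1]-[*]"))
```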

The algorithm is implemented using the RDKit toolkit

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI:10.1021/ci900450m)

2. S. D. Roughley and A. M. Jordan, "The Medicinal Chemist's Toolbox: An Analysis of Reactions Used in the Pursuit of Drug Candidates", J. Med. Chem., 2011, 54, 3451-3479 (DOI:10.1021/jm200187y)

3. G. Landrum, "An Overview of RDKit" (http://www.rdkit.org/docs/Overview.html#the-contrib-directory), section entitled 'mmpa'

4. N. M. O'Boyle, J. Bostrom, R. A. Sayle and A. Gill, "Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity", J. Med. Chem., 2014, 57, 2704-2713 (DOI:10.1021/jm500022q)

Options

Molecule & Fragmentation Options

Select Molecule column: Select the column containing the molecules
Select Molecule IDs column: Select the column containing the molecule IDs
Allow HiLiting: If selected, then hiliting between the incoming molecules and fragments is preserved. WARNING - this can result in significant memory use for big tables or large numbers of fragmentations!
Select the Fragmentation Type: Select the required fragmentation option
User SMARTS: The optional user-defined (r)SMARTS (see above for details)
Number of cuts: Select the number of cuts (1-10)
Allow 2 cuts along single bond giving a single bond as 'value'?: If selected, for the 2 cuts case, 1 bond can be cut twice, allowing a 'value' of [*:1]-[*:2] (i.e. a 'bond') to be formed
Explicit Hydrogens: Options for handling explicit hydrogens during fragmentation. Users are strongly advised to retain the recommended default settings. To understand these options, note that fragmentation is performed using either one or two 'fragmentation factories' for each incoming molecule (two if explicit H's are added for 1 cut). The factory with explicit H's added is only used for 1 cut, and only for those cuts of bonds to 'H'. All other bond breaks are performed using the 2nd factory. If you wish to achieve specific effects using settings other than the defaults, we suggest trialling them with a simple example molecule containing an explicit 'H', e.g. '[H]c1c([Cl])cccc1' or 'C/C([H])=C/C'
Add H's prior to fragmentation: If checked, pairs with -H as a substituent will be included. This is recommended when the number of cuts is 1, and is unavailable for other values
Remove Added Explicit H's from output: Explicit hydrogens added for a single cut will be removed from the output if selected (Only available when 'Add H's prior to fragmentation' is selected and enabled)
Incoming explicit H's treatment: If incoming molecules contain any explicit H's, how should they be treated? The preferred option is to remove them prior to fragmentation, otherwise seemingly spurious results may follow.

Fragmentation Filtering Settings

Limit by Complexity: If checked, this option will skip molecules likely to have a very large number of fragmentations, based on the number of possible fragmentable bond combinations (different bond combinations leading to identical fragmentations are not discounted)
Maximum Fragmentations: The limit of predicted fragmentations
Treat no undefined chiral centres as chiral: In molecules with explicit chiral centres, newly created stereocentres are given defined chirality. Molecules with only undefined possible stereocentres (e.g. CC(F)Br) will not have explicit stereochemistry assigned to newly created centres. Molecules with neither explicit nor undefined stereocentres will have explicit chirality set at newly created centres if this option is selected; otherwise it will not be set.
Filter by maximum number of changing heavy atoms?: If checked, the user can specify a maximum number of heavy atoms which are allowed to change between Matched Pairs
Maximum Number of variable heavy atoms: The maximum number of heavy atoms which are allowed to change between pairs
Filter by fixed heavy atoms?: If checked, the user can specify a minimum number of heavy atoms which are present in the 'Key'
Min Fixed ('Key') Heavy Atoms: The minimum number of heavy atoms which are required in the fixed 'Key' part of the pair
Filter by ratio of changing / unchanging atoms?: If checked, the user can specify a maximum ratio of changing to unchanging heavy atoms during fragmentation
Minimum ratio of changing to unchanging heavy atoms: The ratio threshold of changing to unchanging heavy atoms applied by the filter above (which describes it as a maximum)
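
The filtering settings above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's implementation: the function and parameter names are hypothetical, the complexity estimate assumes that the predicted count is simply the number of ways of choosing the requested number of cuts from the breakable bonds, and the ratio is treated as a maximum, as in the description of the filter option.

```python
from math import comb


def predicted_fragmentations(n_breakable_bonds, n_cuts):
    """Assumed estimate: number of bond combinations for exactly n_cuts cuts.

    Combinations that would give identical fragmentations are counted
    separately, as the 'Limit by Complexity' description indicates.
    """
    return comb(n_breakable_bonds, n_cuts)


def passes_filters(key_heavy_atoms, value_heavy_atoms,
                   max_changing=None, min_fixed=None, max_ratio=None):
    """Apply the three optional heavy-atom filters; None disables a filter."""
    if max_changing is not None and value_heavy_atoms > max_changing:
        return False
    if min_fixed is not None and key_heavy_atoms < min_fixed:
        return False
    if max_ratio is not None:
        # Assumption: an empty key cannot satisfy a ratio filter.
        if key_heavy_atoms == 0:
            return False
        if value_heavy_atoms / key_heavy_atoms > max_ratio:
            return False
    return True


# 10 breakable bonds and 2 cuts give 45 bond combinations.
print(predicted_fragmentations(10, 2))
# 5 changing and 20 fixed heavy atoms (ratio 0.25) pass all three filters.
print(passes_filters(20, 5, max_changing=10, min_fixed=10, max_ratio=0.5))
```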

Output Settings

Show number of changing atoms: The number of heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Show ratio of constant / changing heavy atoms: The ratio of constant / changing heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Add failure reasons to 2nd output table: If checked, the reason the molecule could not be fragmented is added to the second output table
Render Fragmentation: Should the fragmentation be rendered?
Show breaking bonds: Should breaking bonds be highlighted in the rendering?
Breaking bond colour: The colour to highlight the breaking bond(s)
Show key: Should the atoms/bonds forming the 'key' be highlighted in the rendering?
Key Colour: The colour to highlight the 'key'
Show value: Should the atoms/bonds forming the 'value' be highlighted in the rendering? If the fragmentation is a double cut to 1 bond, then the breaking bond is also the value, and will be shown as such
Value Colour: The colour to highlight the 'value'
Incoming columns to keep: Select incoming data columns to keep. The ID column will always be present in the output, regardless of the setting here, with the name 'ID'. Fragmentation columns will be left-most in the table, and incoming columns may be renamed by the addition of a suffix, e.g. '(#1)', to avoid duplicate names

Fingerprints

Show 'Value' attachment point graph distances fingerprint: Include a graph distance fingerprint showing the graph distance between each attachment point in the fragmentation 'value'. The fingerprint is the number of bonds between each pair of attachment points, so '[*:1]-[*:2]' is {1}, '[*:1]c1c([*:2])cc([*:3])cc1' is {3,5,4} etc.
Add Attachment Point Fingerprints: If checked, then attachment point fingerprints are added. See above for further details. One column is added for each attachment point
Fingerprint Length: The number of bits in the fingerprints
Morgan Radius: The radius of the Morgan fingerprint
Use Bond Types: Should the bond types be included in the fingerprint generation?
Use chirality: Should chirality be included in the fingerprint generation?
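
The 'Value' attachment point graph distance fingerprint described above can be reproduced by hand for the two examples given. The sketch below is not the node's code: the bond lists are transcribed by hand from the SMILES (atoms numbered in the order they appear), the function names are hypothetical, and the pair ordering (1-2, 1-3, 2-3) is an assumption that happens to agree with the {3,5,4} example.

```python
from collections import deque
from itertools import combinations

# Hand-transcribed bonds for '[*:1]c1c([*:2])cc([*:3])cc1', atoms indexed in
# SMILES order: 0=[*:1], 3=[*:2], 6=[*:3]; (8, 1) is the ring closure bond.
BONDS = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (5, 7), (7, 8), (8, 1)]
ATTACHMENT_POINTS = [0, 3, 6]


def bond_distance(bonds, start, end):
    """Shortest path length, in bonds, between two atoms (breadth-first search)."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if atom == end:
            return distance[atom]
        for nxt in neighbours.get(atom, []):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    raise ValueError("atoms are not connected")


def attachment_distances(bonds, attachment_points):
    """Distances for each pair of attachment points, in pair order 1-2, 1-3, 2-3."""
    return [bond_distance(bonds, a, b)
            for a, b in combinations(attachment_points, 2)]


print(attachment_distances(BONDS, ATTACHMENT_POINTS))  # [3, 5, 4]
print(attachment_distances([(0, 1)], [0, 1]))          # '[*:1]-[*:2]' gives [1]
```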

Input Ports

Molecules for fragmenting

Output Ports

Key-value fragmentation pairs
Input rows for which the molecule could not be parsed in RDKit, or which could not be fragmented according to the options specified

Views

Fragmentation Progress: The view shows the proportion of the table completely processed, the proportion of the queue currently filled, and the proportion of the allocated threads currently active. The size of the queue and the number of threads can be controlled in the preferences - a bigger queue may use more memory, but is more likely to keep all parallel threads active, resulting in shorter processing times

Installation

To use this node in KNIME, install the extension Vernalis KNIME Nodes, following the NodePit Product and Node Installation Guide.

Plugin provider: Vernalis Research, UK

Plugin version: 1.38.2.v202504171302

On NodePit since: 2025-08-15

Last update: 2025-08-20

Tags: Streamable

KNIME versions: v5.6, v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6
