Matched Molecular Pairs (RDKit)

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

This node implements the Hussain and Rea algorithm for finding Matched Molecular Pairs in a dataset (see Ref. 1). The user can specify the number of cuts to be made (1-10) and whether hydrogens should be added.
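
The pairing step of a fragment-and-index approach like this can be sketched in plain Python. This is only an illustration, not the node's implementation: the fragmentations are written by hand rather than generated, and the function name find_pairs and the (molecule ID, key, value) tuple layout are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def find_pairs(fragmentations):
    """Group fragmentations by unchanging 'key' and pair up the differing 'values'.

    fragmentations: iterable of (mol_id, key, value) tuples (hypothetical layout).
    Returns a sorted list of (id_left, id_right, key, value_left, value_right).
    """
    index = defaultdict(list)
    for mol_id, key, value in fragmentations:
        index[key].append((mol_id, value))

    pairs = []
    for key, entries in index.items():
        for (id_a, val_a), (id_b, val_b) in combinations(entries, 2):
            # Skip pairs from the same molecule or with identical substituents
            if id_a != id_b and val_a != val_b:
                pairs.append((id_a, id_b, key, val_a, val_b))
    return sorted(pairs)

# Hand-written single-cut fragmentations; '[*]' marks the attachment point
frags = [
    ("mol1", "[*]c1ccccc1", "[*]C"),
    ("mol2", "[*]c1ccccc1", "[*]Cl"),
    ("mol3", "[*]C1CCCCC1", "[*]C"),
]
pairs = find_pairs(frags)
print(pairs)
```

Only mol1 and mol2 share a key here, so they form the single pair; mol3 has a different unchanging portion and is not matched.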

A variety of fragmentation options are included:

  1. "All acyclic single bonds" - Any acyclic single bond between any two atoms will be broken. This is the most exhaustive approach, but can generate a large number of pairs (rSMARTS: [*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]).
  2. "Only acyclic single bonds to rings" - Single acyclic bonds between any atoms will be broken, as long as at least one atom is in a ring (rSMARTS: [*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]).
  3. "Only single bonds to a heteroatom" - Single acyclic bonds between any two atoms, at least one of which is not carbon, will be broken. Included to mirror C-X bond breaking chemistry prevalent in modern drug discovery (e.g. SNAr, reductive aminations, amide formations etc.; see Ref. 2) (rSMARTS: [!#6:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]).
  4. "Non-functional group single bonds" - This reproduces the fragmentation pattern used in the original Hussain/Rea paper (see footnote 24, Ref. 1), which is also used in the RDKit Python implementation (Ref. 3) (rSMARTS: [#6+0;!$(*=,#[!#6]):1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]).
  5. "User defined" - The user needs to provide their own rSMARTS fragmentation definition, following the guidelines below.

Guidelines for Custom rSMARTS Definition

  • '>>' is required to separate reactants and products
  • Products require '[*]' to occur twice, for the attachment points (the node will handle the tagging of these)
  • Reactants and products each require exactly two atom mappings, e.g. ':1]' and ':2]' (other values can be used)
  • The two atom mappings must have different values
  • The same atom mappings must be used for reactants and products

rSMARTS not conforming to these guidelines will be rejected during node configuration.
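
The guidelines above can be approximated with a plain text check. The following is a rough sketch, not the node's own validation: the function name check_rsmarts is hypothetical, "occur twice" is read as "exactly twice", and the string is not parsed as SMARTS, so a chemically invalid pattern could still pass.

```python
import re

# Matches atom-map suffixes such as ':1]' and captures the map number
MAP_RE = re.compile(r":(\d+)\]")

def check_rsmarts(rsmarts):
    """Return a list of guideline violations; an empty list means none were found."""
    if rsmarts.count(">>") != 1:
        return ["'>>' must separate reactants and products"]
    reactants, products = rsmarts.split(">>")
    problems = []
    # Assumption: "occur twice" is interpreted as exactly two '[*]' atoms
    if products.count("[*]") != 2:
        problems.append("products must contain '[*]' twice")
    maps = {
        "reactants": MAP_RE.findall(reactants),
        "products": MAP_RE.findall(products),
    }
    for side, found in maps.items():
        if len(found) != 2:
            problems.append(f"{side} must contain exactly two atom mappings")
        elif found[0] == found[1]:
            problems.append(f"{side} atom mappings must be two different values")
    if set(maps["reactants"]) != set(maps["products"]):
        problems.append("reactants and products must use the same atom mappings")
    return problems

# The built-in "Only acyclic single bonds to rings" definition passes
ok = check_rsmarts("[*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*]")
# Re-using map number 1 on the reactant side is reported
bad = check_rsmarts("[*:1]!@!=!#[*:1]>>[*:1]-[*].[*:2]-[*]")
print(ok)
print(bad)
```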

Optionally, when only a single cut is made or connectivity tracking is enabled, context fingerprints can be generated (one for each attachment point). The fingerprints are RDKit Morgan fingerprints, rooted at the attachment point(s) of the unchanging portion.

The algorithm is implemented using the RDKit toolkit.

This node was developed by Vernalis Research. For feedback and more information, please contact

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI: 10.1021/ci900450m).

2. S. D. Roughley and A. M. Jordan, "The Medicinal Chemist’s Toolbox: An Analysis of Reactions Used in the Pursuit of Drug Candidates", J. Med. Chem., 2011, 54, 3451-3479 (DOI: 10.1021/jm200187y).

3. G. Landrum, "An Overview of RDKit" (section entitled 'mmpa').


Select Molecule column
Select the column containing the molecules
Select Molecule IDs column
Select the column containing the molecule IDs
Select the Fragmentation Type
Select the required fragmentation option
The optional user-defined rSMARTS (see above for details)
Number of cuts
Select the number of cuts (1-10). NB: large values can result in slow processing times.
Track Connectivity?
When more than one bond is being cut, tracking connectivity ensures that substituents on core replacements have the correct regiochemistry, as described in Hussain and Rea. Unsetting this option loses this regiochemistry information, but may be useful in a broader 'ideas generation' context.
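
The warning about slow processing for large values of 'Number of cuts' follows from simple combinatorics. As a rough illustration, the number of ways to choose k bonds from n cuttable bonds is the binomial coefficient C(n, k). Treating this as an upper bound on the fragmentations per molecule is an assumption of this sketch (the node may discard combinations that do not give a usable key and value), and the function name max_cut_combinations is hypothetical.

```python
from math import comb  # available in Python 3.8+

def max_cut_combinations(n_bonds, n_cuts):
    """Upper bound on the ways to choose n_cuts bonds out of n_bonds cuttable bonds."""
    return comb(n_bonds, n_cuts)

# A molecule with 12 cuttable bonds (illustrative figure)
counts = {k: max_cut_combinations(12, k) for k in (1, 2, 3, 4)}
print(counts)  # {1: 12, 2: 66, 3: 220, 4: 495}
```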

Advanced Settings

Add H's prior to fragmentation
If checked, pairs with -H as a substituent will be included. This is recommended when the number of cuts is 1, and is unavailable for other values
Remove Explicit H's from output
Explicit hydrogens will be removed from the output if selected (Only available when 'Add H's prior to fragmentation' is selected and enabled)
Filter by maximum number of changing heavy atoms?
If checked, the user can specify a maximum number of heavy atoms which are allowed to change between Matched Pairs
Maximum Number of variable heavy atoms
The maximum number of heavy atoms which are allowed to change between pairs
Filter by ratio of changing / unchanging atoms?
If checked, the user can specify a maximum ratio of changing to unchanging heavy atoms during fragmentation
Minimum ratio of changing to unchanging heavy atoms
The minimum ratio of changing to unchanging heavy atoms
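
The two heavy-atom filters can be pictured as simple predicates on the atom counts of a fragmentation. The sketch below is illustrative only: the function and parameter names are hypothetical, and it reads the ratio setting as an upper limit on changing / unchanging heavy atoms, as in the checkbox description above. The direction of the comparison used by the node itself is an assumption here.

```python
def passes_filters(n_changing, n_unchanging, max_changing=None, ratio_limit=None):
    """Return True if a fragmentation survives the optional heavy-atom filters.

    n_changing / n_unchanging are heavy-atom counts excluding attachment points.
    ratio_limit is treated as an upper limit on changing / unchanging (assumption).
    """
    if max_changing is not None and n_changing > max_changing:
        return False
    if ratio_limit is not None:
        # Guard against an empty unchanging portion before dividing
        if n_unchanging == 0 or n_changing / n_unchanging > ratio_limit:
            return False
    return True

# (changing, unchanging) heavy-atom counts for three made-up fragmentations
candidates = [(1, 12), (6, 10), (9, 3)]
kept = [c for c in candidates if passes_filters(*c, max_changing=8, ratio_limit=0.5)]
print(kept)  # [(1, 12)]
```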

Output Settings

Show unchanging portion
A SMILES cell will be included showing the 'key' resulting in the fragmentation pattern
Show number of changing atoms
The number of heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Show ratio of constant / changing heavy atoms
The ratio of constant / changing heavy atoms (not including 'A', the attachment point) will be included for Left and Right fragments
Show reverse-direction transforms
The transformations will be duplicated in the 'reverse' direction, e.g. A-->B and B-->A
Include Reactions SMARTS
In addition to the SMIRKS representation of the transformation, the transform is shown in an rSMARTS representation with atom mappings. Using this option without the 'Track Connectivity' option selected will produce nonsense rSMARTS!

Attachment Point Fingerprints

Add Attachment Point Fingerprints
If checked, then attachment point fingerprints are added. See above for further details. One column is added for each attachment point
Fingerprint Length
The number of bits in the fingerprints
Morgan Radius
The radius of the Morgan fingerprint
Use Bond Types
Whether bond types are included in the fingerprint generation
Use chirality
Whether chirality is included in the fingerprint generation

Input Ports

Molecules for fragmenting to find matched pairs

Output Ports

Matched pair transformations
Input rows for which the molecule could not be parsed in RDKit


This node has no views

