
MMP Molecule Filter (RDKit)

Streamable

Vernalis Chemistry Matched-Molecular Pairs extension for KNIME Workbench version 1.27.2.v202010191232 by Vernalis (R&D), UK

This node pre-filters molecules by viability in the specified MMP schema. The user can specify the number of cuts to be made (1 - 10), and whether hydrogens should be added (for 1 cut only).

A variety of fragmentation options are included:

  1. "All acyclic single bonds" - Any acyclic single bonds between any two atoms will be broken. This is the most exhaustive approach, which can generate a large number of pairs (rSMARTS: [*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
  2. "Only acyclic single bonds to rings" - Single acyclic bonds between any atoms will be broken, as long as at least one atom is in a ring (rSMARTS: [*;R:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
  3. "Only acyclic single bonds to either rings or to double bonds exocyclic to rings" - Single acyclic bonds between any atoms will be broken, as long as 1 atom is either in a ring, or in a double bond exocyclic to a ring, with the other end in the ring (rSMARTS: [*:1]!@!=!#[*;!R0,$(*=!@[*!R0]):2]>>[*:1]-[*].[*:2]-[*])
  4. "Only single bonds to a heteroatom" - Single acyclic bonds between any two atoms, at least one of which is not carbon, will be broken. Included to mirror the C-X bond breaking chemistry prevalent in modern drug discovery (e.g. SNAr, reductive aminations, amide formations etc. See Ref. 2) (rSMARTS: [!#6:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
  5. "Non-functional group single bonds" - This reproduces the fragmentation pattern used in the original Hussain/Rea paper (see footnote 24, Ref. 1), and also used in the RDKit Python implementation (Ref. 3) (rSMARTS: [#6+0;!$(*=,#[!#6]):1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*])
  6. "Matsy (One atom in ring, or a non-sp2 C atom bonded to a non-C atom)" - This reproduces the fragmentation pattern used by NextMove's 'Matsy', i.e. single acyclic bonds between either a ring atom and any other atom, or a heteroatom bonded to a non-sp2 C atom, as described in the Matched Series paper (Ref 4) (rSMARTS: [$([#6!^2]-!@[!#6]),$([*;R]-!@[*]):1]-!@[$([!#6]-!@[#6!^2]),$([*]-!@[*;R]):2]>>[*:1]-[*].[*:2]-[*])
  7. "Peptide Sidechains" - Acyclic single bonds from Cα to Cβ will be broken. C-H will only be broken for Glycine, and only when explicit H are present (both CH bonds will be broken in this case) (rSMARTS: [C;$(CC(=O)[O,N]);$(CN):1]-!@[$([C]-!@C(C(=O)[N,O])N),$([#1]-!@[CH2](C(=O)[N,O])N):2]>>[*:1]-[*].[*:2]-[*])
  8. "Nucleic Acid Sidechains" - Acyclic single bonds in the anomeric position between the aromatic base N and sugar will be broken. The minimum requirement is N(Ar)CO(CO)CO to allow for open chain analogues (rSMARTS: [n:1]-!@[$(COC(CO)CO):2]>>[*:1]-[*].[*:2]-[*])
  9. "User defined" - The user needs to provide their own (r)SMARTS fragmentation definition, following the guidelines below
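
The idea of pre-filtering by fragmentation pattern can be sketched in Python with RDKit. This is a simplified illustration, not the node's implementation: it assumes a molecule is "viable" when it contains at least as many matching bonds as the requested number of cuts, and it ignores the hydrogen-addition and double-cut options. The names count_cuttable_bonds and is_viable are hypothetical; the two SMARTS strings are the reactant halves of the rSMARTS for options 1 and 2 above.

```python
from rdkit import Chem

# Reactant halves of two of the built-in rSMARTS quoted above.
ALL_ACYCLIC = "[*:1]!@!=!#[*:2]"
TO_RINGS = "[*;R:1]!@!=!#[*:2]"


def count_cuttable_bonds(mol, bond_smarts):
    """Count distinct bonds in `mol` matched by a two-atom bond SMARTS."""
    pattern = Chem.MolFromSmarts(bond_smarts)
    if pattern is None:
        raise ValueError(f"Could not parse SMARTS: {bond_smarts}")
    # Each match is a pair of atom indices; a frozenset makes (a, b) and
    # (b, a) count as the same bond.
    return len({frozenset(m) for m in mol.GetSubstructMatches(pattern)})


def is_viable(smiles, bond_smarts, n_cuts=1):
    """Assumed viability rule: at least `n_cuts` matching bonds."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # molecules RDKit cannot read fail the filter
        return False
    return count_cuttable_bonds(mol, bond_smarts) >= n_cuts
```

Under these assumptions, benzene (c1ccccc1) has no acyclic single bond between heavy atoms and is rejected, while toluene (Cc1ccccc1) has exactly one and passes for a single cut only.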

Guidelines for Custom (r)SMARTS Definition
An rSMARTS is no longer required, but one may still be specified for backwards compatibility. If specified, it must comply with the following rules. Otherwise, simply provide a SMARTS that matches two atoms separated by a single, acyclic bond.

  • '>>' is required to separate reactants and products
  • Products require '[*]' to occur twice, for the attachment points (the node will handle the tagging of these)
  • Reactants and products require exactly two atom mappings, e.g. :1] and :2] (other values could be used).
  • The atom mappings must be two different values
  • The same atom mappings must be used for reactants and products
  • rSMARTS not conforming to these guidelines will be rejected during node configuration.
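
The rules above can be checked with plain string handling before a pattern is entered in the node dialog. The following is a rough sketch only: check_rsmarts is a hypothetical helper, it is not the node's own validation code, and it assumes atom mappings always appear as ":<number>]" at the end of a bracket atom. It does not check that the SMARTS itself is chemically valid.

```python
import re
from typing import List

# Matches an atom-map label at the end of a bracket atom, e.g. ":1]" -> "1".
_MAP_RE = re.compile(r":(\d+)\]")


def check_rsmarts(rsmarts: str) -> List[str]:
    """Return a list of guideline violations (empty if none were found)."""
    parts = rsmarts.split(">>")
    if len(parts) != 2:
        return ["'>>' must separate reactants and products exactly once"]
    reactants, products = parts

    problems = []
    # Products need two unmapped '[*]' attachment points.
    if products.count("[*]") != 2:
        problems.append("products must contain '[*]' exactly twice")

    r_maps = _MAP_RE.findall(reactants)
    p_maps = _MAP_RE.findall(products)
    if len(r_maps) != 2 or len(p_maps) != 2:
        problems.append("reactants and products each need exactly two atom mappings")
    elif len(set(r_maps)) != 2 or len(set(p_maps)) != 2:
        problems.append("the two atom mappings must be different values")
    if sorted(r_maps) != sorted(p_maps):
        problems.append("reactants and products must use the same atom mappings")
    return problems
```

For example, the built-in pattern [*:1]!@!=!#[*:2]>>[*:1]-[*].[*:2]-[*] yields an empty list, whereas a string without '>>' is reported immediately.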

The algorithm is implemented using the RDKit toolkit

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com

1. J. Hussain and C. Rea, "Computationally efficient algorithm to identify matched molecular pairs (MMPs) in large datasets", J. Chem. Inf. Model., 2010, 50, 339-348 (DOI:10.1021/ci900450m)

2. S. D. Roughley and A. M. Jordan, "The Medicinal Chemist's Toolbox: An Analysis of Reactions Used in the Pursuit of Drug Candidates", J. Med. Chem., 2011, 54, 3451-3479 (DOI:10.1021/jm200187y)

3. G. Landrum, "An Overview of RDKit" (http://www.rdkit.org/docs/Overview.html#the-contrib-directory) (section entitled 'mmpa')

4. N. M. O'Boyle, J. Bostrom, R. A. Sayle and A. Gill, "Using Matched Molecular Series as a Predictive Tool To Optimize Biological Activity", J. Med. Chem., 2014, 57, 2704-2713 (DOI:10.1021/jm500022q)


Select Molecule column
Select the column containing the molecules
Select the Fragmentation Type
Select the required fragmentation option
The optional user-defined (r)SMARTS (see above for details)
Number of cuts
Select the number of cuts (1-10)
Add H's prior to fragmentation
If checked, pairs with -H as a substituent will be included. This is recommended when the number of cuts is 1, and is unavailable for other values
Allow 2 cuts along single bond giving a single bond as 'value'?
If selected, for the 2 cuts case, 1 bond can be cut twice, allowing a 'value' of [*:1]-[*:2] (i.e. a 'bond') to be formed

Input Ports

Molecules for filtering

Output Ports

Molecules which can be fragmented according to the schema. Molecules that RDKit cannot read are also filtered out
