
03_Simple_MMP_Example

Workflow

Simple Matched Molecular Pairs (MMP) Example

This workflow provides a simple example of generating matched molecular pairs (MMPs) from a set of compounds and using them to predict molecules with improved properties; in this case, reduced CYP3A4 inhibition, using ChEMBL data.

The MMP Molecule Fragment node is configured to make 1 cut, using the original Hussain/Rea scheme. As we have not pre-filtered the incoming molecule table, we limit by complexity to 5000 cut combinations, and also filter the fragmentations by the ratio and by the minimum number of unchanging atoms.
We do not calculate graph distance fingerprints in this example (1 cut will always return an empty fingerprint), but we do calculate attachment point fingerprints in case we want to restrict the MMP by its molecular context later. We have passed through all the data columns and, for illustrative purposes, elected to render the fragmentation so we can see what is happening. We are using the ChEMBL Parent ID as the ID (note therefore that this column appears in the output as 'ID' and not under its incoming name, even though we select it in the pass-through table).

With the fragmentation performed, MMPs are generated. We define a number of ratios (R/L) and differences (R-L; for log properties) and a few pass-through properties, including the attachment point fingerprint. NB: we also restrict transforms by the change in heavy atom count. The first stage of the node execution is sorting the input table by the 'Key' column, after which pair generation is parallelised.
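The key-based pairing described above can be sketched in plain Python. This is only an illustration of the general idea (group fragmentations by their unchanging 'Key', then pair rows whose changing 'Value' differs), not the node's actual implementation. The rows, the SMILES strings, the pCHEMBL values and the function name `generate_pairs` are all hypothetical, and the sketch emits each pair in one direction only.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fragmentation rows: (molecule ID, key = unchanging part,
# value = changing fragment, pCHEMBL value). All values are made up.
fragments = [
    ("M1", "c1ccccc1[*:1]", "[*:1]Cl", 6.5),
    ("M2", "c1ccccc1[*:1]", "[*:1]F", 5.9),
    ("M3", "c1ccccc1[*:1]", "[*:1]OC", 5.2),
    ("M4", "C1CCNCC1[*:1]", "[*:1]Cl", 7.0),
]


def generate_pairs(rows):
    """Sort rows by key, group them, and pair rows that share a key."""
    by_key = defaultdict(list)
    # Sorting by key mirrors the pre-sort step; the sort is stable, so rows
    # with the same key keep their input order.
    for mol_id, key, value, pchembl in sorted(rows, key=lambda r: r[1]):
        by_key[key].append((mol_id, value, pchembl))

    pairs = []
    for members in by_key.values():
        for left, right in combinations(members, 2):
            if left[1] == right[1]:
                continue  # identical changing fragment: not a transform
            pairs.append({
                "transform": f"{left[1]}>>{right[1]}",
                "left_id": left[0],
                "right_id": right[0],
                # Difference (R-L), as used for log-scale properties
                "pchembl_r_minus_l": round(right[2] - left[2], 3),
            })
    return pairs


pairs = generate_pairs(fragments)
for pair in pairs:
    print(pair)
```

M4 has no partner sharing its key, so it contributes no pairs; the three phenyl rows give three pairs.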

Next, some filtering is performed. We could apply a simplistic filter, keeping any transform which has a negative value (we want less active compounds against CYP3A4!) in the 'PCHEMBL_VALUE (R-L)' column, but that would give us transforms which sometimes improve matters but generally don't. Instead, we use GroupBy to give the mean and standard deviation for each transform. We only want transforms where there are at least 3 examples (the final column in the grouping table) and where the mean pCHEMBL change is at least 1 standard deviation below 0.0. Sorting the table gives the biggest changes first, and we can also look at the effect of the transform on ALOGP and PSA.
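As a rough sketch of that grouping and filtering logic, the following plain-Python snippet aggregates the (R-L) differences per transform and keeps only those with at least 3 examples whose mean lies at least one standard deviation below zero. The data are made up, the function name `filter_transforms` is hypothetical, and the use of the sample standard deviation is an assumption about how the aggregation is configured.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (transform, 'PCHEMBL_VALUE (R-L)') rows; values are made up.
pair_rows = [
    ("[*:1]Cl>>[*:1]F", -0.8), ("[*:1]Cl>>[*:1]F", -1.0), ("[*:1]Cl>>[*:1]F", -1.2),
    ("[*:1]C>>[*:1]OC", -0.9), ("[*:1]C>>[*:1]OC", 0.7), ("[*:1]C>>[*:1]OC", -0.4),
    ("[*:1]Br>>[*:1]N", -2.0), ("[*:1]Br>>[*:1]N", -1.8),
]


def filter_transforms(rows, min_examples=3):
    """Keep transforms whose mean change is at least 1 std dev below 0.0."""
    deltas = defaultdict(list)
    for transform, delta in rows:
        deltas[transform].append(delta)

    kept = []
    for transform, values in deltas.items():
        if len(values) < min_examples:
            continue  # too few examples to trust the statistics
        m = mean(values)
        s = stdev(values)  # sample standard deviation (assumption)
        if m + s <= 0.0:  # mean is at least one std dev below zero
            kept.append((transform, m, s, len(values)))

    # Most negative mean first, i.e. the biggest reductions at the top
    return sorted(kept, key=lambda row: row[1])


result = filter_transforms(pair_rows)
for row in result:
    print(row)
```

Here the second transform is dropped because its changes are inconsistent (one of three examples gets worse), and the third because it has only 2 examples, even though both of them look favourable.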

Finally, we apply the transforms with and without filtering. The Apply Transforms node pre-sorts the transforms table, and then applies each transform to the entire molecules table. Two node views show progress, either in a simple form or in a more informative format showing the currently 'active' transforms. If we try to use the 'Filter by attachment point fingerprint' option, we will get a warning at this point, because the group/ungroup sequence has lost the fingerprint column properties which tell the node how to generate the fingerprint.

The lower part of the workflow contains the workaround for fingerprint similarity filtering: use a Joiner to attach the properties and fingerprints back to the required set of transforms.
Note that in this case a low Tanimoto similarity threshold is required to get any matching transforms.
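To see why the threshold matters, here is a minimal sketch of the Tanimoto similarity between two fingerprints, each represented as a set of 'on' bit positions. The bit sets are invented for illustration and are not real attachment point fingerprints. Returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention for two empty fingerprints (assumption)
    return len(a & b) / union


# Hypothetical on-bit positions for a transform environment and a candidate site
transform_env = {1, 4, 7, 9, 12}
site_env = {1, 4, 8, 15}

score = tanimoto(transform_env, site_env)  # 2 shared bits out of 7 distinct bits
print(f"Tanimoto = {score:.3f}")
print("passes 0.25 threshold:", score > 0.25)
print("passes 0.40 threshold:", score > 0.4)
```

With these made-up bits the environments share only 2 of 7 distinct bits, so the site passes a 0.25 threshold but fails at 0.4.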

Notes
1 - The transform will be applied if any of the rows containing it pass the similarity threshold - although the transform is the same, the environments from the molecules it was created from could (will?) be different
2 - If there are multiple matching sites in a molecule, only those which match the environment similarity threshold will be reacted
3 - If there are multiple matching sites, each site will be reacted in turn, with products only resulting from a single transformation returned
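Notes 2 and 3 can be illustrated with a deliberately simplified toy, in which a 'transform' is a plain substring replacement on a SMILES-like string. This is not how the Apply Transforms node works (it operates on molecules via RDKit, not on text), and the function `apply_single_site` and its `site_passes` predicate are hypothetical. The point is only to show one product per matching site, with no multiply-transformed products.

```python
def apply_single_site(molecule, old, new, site_passes=lambda position: True):
    """Replace one occurrence of `old` at a time, returning one product per site.

    `site_passes` stands in for the environment similarity check: only
    sites for which it returns True are reacted.
    """
    products = []
    start = molecule.find(old)
    while start != -1:
        if site_passes(start):
            # React this site only; all other sites are left untouched
            products.append(molecule[:start] + new + molecule[start + len(old):])
        start = molecule.find(old, start + 1)
    return products


# Two matching sites: each is reacted in turn, giving two singly-changed products
products = apply_single_site("Clc1ccc(Cl)cc1", "Cl", "F")
print(products)

# Only the second site (string position 8) passes the stand-in environment check
filtered = apply_single_site("Clc1ccc(Cl)cc1", "Cl", "F",
                             site_passes=lambda position: position == 8)
print(filtered)
```

The doubly-substituted product is never returned, matching note 3, and the filtered call reacts only the site that passes the check, matching note 2.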

Matched Molecular Pairs, MMP, MMPA, Matched Molecular Pair Analysis, CYP-Inhibition, Vernalis, ChEMBL
Workflow annotations:

The molecules are cleaned up (desalted) and the data aggregated per structure, then the molecules are fragmented. NOTE: the fragmentation might be time-consuming. The preferences under KNIME->Vernalis->Matched Molecular Pairs (MMPs) allow control of process usage, and you can view progress during execution with the 'Fragmentation Progress' view of the node.

Sort and filter out transforms

Fingerprint similarity filtering

This workflow aims to give a simple example of generating and applying MMPs using the Vernalis Community Contribution. Please note that it does not explain all settings or features, and uses only a single cut (i.e. substituent replacement) methodology. Read the workflow description for further information. It requires the Chembl_CYP3A4_Activity_bioactivity-17_9_14_32.txt if it is to be re-run.

Apply transform with and without filtering

Tanimoto similarity threshold required; check if it still makes sense

Node annotations: ChEMBL Bioactivity data; Right click -> View: Fragmentation Progress; Only fragment each structure once; Generate Pairs; Calculate mean and std.dev. of transform; Transform with at least 3 examples; mean(pCHEMBL) > 1 && stddev < 0.0; biggest changes first; Unfiltered; TEST MOLECULE (FICTITIOUS!); Needed to apply transform; filtered guessing default FP settings DANGEROUS!; Only key properties; Transform with at least 3 examples; mean(pCHEMBL) > 1 && stddev < 0.0; biggest changes first; Join back AP fingerprint props; filtered Tanimoto S > 0.25; filtered Tanimoto S > 0.4

Nodes: File Reader; MMP Molecule Fragment (RDKit); GroupBy; Speedy SMILES De-salt; Fragments to MMPs; GroupBy; Row Filter; Java Snippet Row Filter; Sorter; Apply Transforms (RDKit) (Experimental); MarvinSketch; Ungroup; Apply Transforms (RDKit) (Experimental); GroupBy; Row Filter; Java Snippet Row Filter; Sorter; Joiner; Apply Transforms (RDKit) (Experimental); Apply Transforms (RDKit) (Experimental)

Download

Get this workflow from the following link: Download

Nodes

03_Simple_MMP_Example consists of the following 20 node(s):

Plugins

03_Simple_MMP_Example contains nodes provided by the following 5 plugin(s):