This workflow provides a simple example of generating matched molecular pairs (MMPs) from a set of compounds and using them to predict molecules with improved properties: in this case, reduced CYP3A4 inhibition, using ChEMBL data.
The MMP Molecule Fragment node is configured to make 1 cut, using the original Hussain/Rea schema. As we have not pre-filtered the incoming molecule table, we limit complexity to 5000 cut combinations, and also filter the fragmentations by the ratio and the minimum number of unchanging atoms.
We do not calculate graph distance fingerprints in this example (a single cut will always return an empty fingerprint), but we do calculate attachment point fingerprints in case we want to restrict the MMP by its molecular context later. We have passed through all the data columns and, for illustrative purposes, elected to render the fragmentation so we can see what is happening. We are using the ChEMBL Parent ID as the ID. Note, therefore, that this column appears in the output as 'ID' and not under its incoming name, even though we select it in the pass-through table.
With the fragmentation performed, MMPs are generated. We define a number of ratios (R/L) and differences (R-L; for log properties) and a few pass-through properties, including the attachment point fingerprint. NB We also decided to restrict transforms by the change in heavy atom count. The first stage of the node execution is sorting the input table by the 'Key' column, after which pair generation is parallelised.
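The idea behind pair generation can be illustrated with a minimal Python sketch. This is not the node's implementation: the dictionary field names (`id`, `key`, `value`, `value_hac`, `pchembl`, `psa`), the example rows, and the `max_hac_change` parameter are all hypothetical, chosen only to show sorting by key, pairing fragmentations that share a key, and computing a difference (R-L) for a log property and a ratio (R/L) for a non-log property.

```python
from itertools import groupby, permutations
from operator import itemgetter


def generate_pairs(rows, max_hac_change=None):
    """Pair up fragmentations that share the same key (unchanging part).

    All field names are assumptions made for this sketch.
    """
    pairs = []
    # groupby only groups adjacent items, so the rows are sorted by key first.
    rows = sorted(rows, key=itemgetter("key"))
    for key, group in groupby(rows, key=itemgetter("key")):
        # Ordered pairs, so both L>>R and R>>L transforms are produced.
        for left, right in permutations(list(group), 2):
            if left["value"] == right["value"]:
                continue  # identical changing part: not a transform
            hac_change = right["value_hac"] - left["value_hac"]
            if max_hac_change is not None and abs(hac_change) > max_hac_change:
                continue  # restrict by change in heavy atom count
            pairs.append({
                "transform": f'{left["value"]}>>{right["value"]}',
                "key": key,
                "id_l": left["id"],
                "id_r": right["id"],
                # Difference (R-L) for a log-scale property.
                "pchembl_diff": right["pchembl"] - left["pchembl"],
                # Ratio (R/L) for a non-log property; None if L is zero.
                "psa_ratio": right["psa"] / left["psa"] if left["psa"] else None,
            })
    return pairs


# Made-up example rows, not taken from the ChEMBL data set.
rows = [
    {"id": "A", "key": "[*:1]c1ccccc1", "value": "[*:1]Cl",
     "value_hac": 1, "pchembl": 6.0, "psa": 20.0},
    {"id": "B", "key": "[*:1]c1ccccc1", "value": "[*:1]OC",
     "value_hac": 2, "pchembl": 5.0, "psa": 30.0},
    {"id": "C", "key": "[*:1]C1CCCCC1", "value": "[*:1]Cl",
     "value_hac": 1, "pchembl": 5.5, "psa": 10.0},
]
pairs = generate_pairs(rows)
```

Only A and B share a key here, so the sketch yields the two transforms between them; C has no partner and contributes nothing.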
Next, some filtering is performed. We could apply a simplistic filter, keeping any transform which has a negative value in the 'PCHEMBL_VALUE (R-L)' column (we want less active compounds against CYP3A4!), but that would give us transforms which sometimes improve matters but generally don't. Instead, we use GroupBy to give the mean and standard deviation for each transform. We only want transforms where there are at least 3 examples (the final column in the grouping table), and where the mean change in pCHEMBL value is at least 1 standard deviation below 0.0. Sorting the table gives the biggest changes first, and we can also look at the effect of the transform on ALOGP and PSA.
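The grouping filter described above can be sketched in plain Python. This is an illustration only, not what the KNIME nodes execute: the function name, the `(transform, delta)` input shape and the example values are assumptions, and the sample standard deviation (`statistics.stdev`) is assumed as the measure of spread.

```python
from collections import defaultdict
from statistics import mean, stdev


def select_transforms(pairs, min_count=3):
    """Keep transforms whose mean change is at least 1 std dev below 0.0.

    `pairs` is an iterable of (transform, delta) tuples, where delta is the
    'PCHEMBL_VALUE (R-L)' value of one matched pair (assumed input shape).
    """
    deltas_by_transform = defaultdict(list)
    for transform, delta in pairs:
        deltas_by_transform[transform].append(delta)

    selected = []
    for transform, deltas in deltas_by_transform.items():
        # stdev needs at least 2 values; we also require min_count examples.
        if len(deltas) < max(min_count, 2):
            continue
        m, s = mean(deltas), stdev(deltas)
        if m + s <= 0.0:  # mean is at least one std dev below zero
            selected.append((transform, m, s, len(deltas)))

    # Most negative mean (biggest reduction in activity) first.
    selected.sort(key=lambda row: row[1])
    return selected


# Made-up example values.
example = [
    ("Cl>>OMe", -1.0), ("Cl>>OMe", -1.2), ("Cl>>OMe", -0.8),
    ("H>>F", -0.5), ("H>>F", 0.6), ("H>>F", -0.4),
    ("Br>>I", -2.0),
]
selected = select_transforms(example)
```

In this example 'H>>F' is rejected because its changes are inconsistent, and 'Br>>I' because it has a single example, even though that one example shows the largest decrease.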
Finally, we apply the transforms with and without filtering. The Apply Transforms node pre-sorts the transforms table, and then applies each transform to the entire molecules table. Two node views show progress, either in a simple form or in a more informative format showing the currently 'active' transforms. If we try to use the Filter by attachment point fingerprint option, we will get a warning at this point, as the group/ungroup sequence has lost the Fingerprint column properties which tell the node how to generate the fingerprint.
The lower part of the workflow contains the workaround for fingerprint similarity filtering: use a Joiner to attach the properties and fingerprints back to the required set of transforms.
Note that in this case, a low Tanimoto similarity threshold is required to get any matching transforms.
Notes
1 - The transform will be applied if any of the rows containing it pass the similarity threshold - although the transform is the same, the environments from the molecules it was created from could (will?) be different
2 - If there are multiple matching sites in a molecule, only those which match the environment similarity threshold will be reacted
3 - If there are multiple matching sites, each site will be reacted in turn, with products only resulting from a single transformation returned
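The rule in note 1 can be illustrated with a small Python sketch. It assumes fingerprints are represented as sets of on-bit indices and uses the standard Tanimoto coefficient (shared bits divided by total distinct bits); the function names and example fingerprints are hypothetical, and how the node handles two empty fingerprints is not stated here, so returning 0.0 in that case is an assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Assumption: two empty fingerprints are treated as dissimilar.
    return len(a & b) / union if union else 0.0


def transform_allowed(site_fp, transform_row_fps, threshold):
    """A transform is applied at a site if ANY row containing that
    transform has an environment similar enough to the site's."""
    return any(tanimoto(site_fp, fp) >= threshold for fp in transform_row_fps)


# Made-up attachment point environments.
site = {1, 2, 3, 4}
rows_for_transform = [{1, 2, 9, 10}, {5, 6, 7}]

low = transform_allowed(site, rows_for_transform, threshold=0.3)
high = transform_allowed(site, rows_for_transform, threshold=0.5)
```

Here the best similarity is 2/6, so the transform passes with a threshold of 0.3 but not 0.5, which is the kind of situation where a low threshold is needed to get any matches.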