06_Find_Scaffolds_And_Sidechains

Finding scaffolds and sidechains

Demonstrates how to use RDKit functionality to identify the likely scaffold for a set of compounds from a paper and then determine the sidechains of the molecules that match that scaffold. The results are presented using RDKit's molecule highlighting functionality.

The dataset used in this example workflow was taken from ChEMBL (https://www.ebi.ac.uk/chembl/).

Requirements:
- RDKit Community Nodes

Read in the molecules from a document and find the MCS

Presents an approach to approximate the scaffold for a set of related compounds. The sample dataset consists of compounds extracted from a ChEMBL document. Since these documents are extracted from the medchem literature, they tend to contain sets of compounds from a chemical series along with a few reference compounds. Recognizing this, the scaffold-finding approach is to identify the maximum common substructure (MCS) that hits at least 80% of the compounds in the document. To allow differences in heteroatom positions in the core, the MCS is computed using generic atom types. After the MCS is identified, it is used to perform an R-group decomposition.

Extensions required: RDKit

Workflow steps:
- Read in the molecules from a document and find the MCS
- Filter out molecules that don't match the core (the MCS) and highlight the atoms matching the core on those that remain
- Do the R-group decomposition

Nodes used:
- File Reader
- RDKit MCS
- Table Row to Variable
- RDKit Substructure Filter
- RDKit Molecule Highlighting
- RDKit R Group Decomposition
- Column Filter
- Interactive Table
