
Standardizing Molecular Structures

This workflow snippet shows how to standardize chemical structures in SMILES format using the open-source RDKit nodes.

The steps of standardization and data cleaning comprise
1. the removal of hydrogens
2. the removal of solvents
3. the stripping of salts
4. structure normalization
5. canonicalization

Please note that while we read in the molecules as a KNIME-native table, the workflow is equally applicable to data in other formats read in with other reader nodes, e.g. SMILES, SDF or Mol. We remove explicit hydrogens in the first step for the sake of demonstration, although any RDKit node does this under the hood. The Salt Stripper node is used twice: once to remove any user-defined solvents, and once to remove pre-defined salts. Note that salts could also be removed with the Structure Normalizer node. Canonicalization is the last step in this workflow.
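The two stripping passes (solvents first, then salts) can be illustrated outside KNIME with a deliberately simplified, pure-Python sketch. It is not what the RDKit Salt Stripper node does internally: the node matches structures, whereas this sketch only compares dot-separated SMILES fragments as plain strings. The function names and the SOLVENTS and SALTS sets are hypothetical and chosen for illustration; SALTS is not the node's pre-defined salt table. Hydrogen removal, normalization and canonicalization are not covered, since they require a cheminformatics toolkit.

```python
# Simplified stand-in for the two Salt Stripper passes (assumption: exact
# string matching on dot-separated SMILES fragments, not structure matching).

# Illustrative user-defined solvents (water, methanol, ethanol, DMSO).
SOLVENTS = {"O", "CO", "CCO", "CS(C)=O"}

# Illustrative salt fragments; NOT the pre-defined table shipped with the node.
SALTS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br"}


def strip_fragments(smiles, unwanted):
    """Drop dot-separated fragments found in `unwanted`.

    If every fragment would be dropped, the input is returned unchanged so
    that no record ends up with an empty structure (a choice of this sketch).
    """
    fragments = smiles.split(".")
    kept = [frag for frag in fragments if frag not in unwanted]
    if not kept:
        return smiles
    return ".".join(kept)


def remove_solvents_then_salts(smiles):
    """Apply the solvent pass first, then the salt pass, as in the workflow."""
    without_solvents = strip_fragments(smiles, SOLVENTS)
    return strip_fragments(without_solvents, SALTS)


examples = ["CC(=O)[O-].[Na+]", "c1ccccc1N.Cl.O", "O"]
cleaned = [remove_solvents_then_salts(s) for s in examples]
for before, after in zip(examples, cleaned):
    print(before, "->", after)
```

Keeping the passes separate mirrors the workflow's choice to control the solvent list independently of the salt definitions. String matching misses equivalent spellings of the same fragment (for example "[OH2]" for water) and ignores the rare case of ring-closure digits spanning a dot, which is why a structure-aware tool is used in practice.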

The dataset represents a subset of 844 compounds evaluated for activity against CDPK1. More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).

Workflow annotations:

Remove hydrogens: Removing explicit hydrogens. Note that this step could be omitted, as RDKit does that automatically.

Remove solvents: Use the Salt Stripper node to remove any user-defined solvents.

Strip salts: Using the pre-defined salt table supplied by the node to remove any salts from the structures.

Structure normalization: See the configurations in the "Advanced" tab for the normalizations that this node does. Note that it could also do salt stripping ("Split off minor fragments"), but we decided to do that with the dedicated node to have more control over it.

Canonicalization: Canonicalizing the resulting molecules.

Nodes used: Table Reader (read in SMILES), Table Creator (define solvents), RDKit Remove Hs, RDKit Salt Stripper (used twice), RDKit Structure Normalizer, RDKit Canon SMILES.
