Standardizing Molecular Structures

This workflow snippet shows how to standardize chemical structures in SMILES format using the open-source RDKit nodes.

The workflow standardizes and cleans the data in five steps:
1. the removal of hydrogens
2. the removal of solvents
3. the stripping of salts
4. structure normalization
5. canonicalization

Although the molecules are read in here from a KNIME-native table, the same workflow applies to data in other formats loaded with other reader nodes, e.g. SMILES, SDF or Mol. Explicit hydrogens are removed in the first step for the sake of demonstration; any RDKit node does this under the hood anyway. The Salt Stripper node is used twice: once to remove user-defined solvents and once to remove the pre-defined salts. Salts could also be removed with the Structure Normalizer node. Canonicalization is the last step of the workflow.
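
For readers who want to reproduce the same sequence of steps outside KNIME, the following is a minimal sketch using the RDKit Python API. It is not what the KNIME nodes execute internally. The function name standardize, the solvent patterns in SOLVENT_SMARTS, and the use of rdMolStandardize.Normalize as a stand-in for the Structure Normalizer node are all assumptions made for illustration.

```python
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover
from rdkit.Chem.MolStandardize import rdMolStandardize

# Hypothetical user-given solvent list (one SMARTS per line): water, methanol.
SOLVENT_SMARTS = "[OH2]\nCO"

# First stripper uses the user-given patterns, second uses RDKit's default salt list.
solvent_stripper = SaltRemover(defnData=SOLVENT_SMARTS)
salt_stripper = SaltRemover()


def standardize(smiles):
    """Return a canonical SMILES for the cleaned structure, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # 1. Remove explicit hydrogens (the SMILES parser already does this by default).
    mol = Chem.RemoveHs(mol)
    # 2. Remove solvents; only whole fragments matching a pattern are deleted.
    mol = solvent_stripper.StripMol(mol, dontRemoveEverything=True)
    # 3. Strip salts with the default definitions.
    mol = salt_stripper.StripMol(mol, dontRemoveEverything=True)
    # 4. Normalize functional groups (assumed stand-in for the KNIME node).
    mol = rdMolStandardize.Normalize(mol)
    # 5. Canonicalize: MolToSmiles writes canonical SMILES by default.
    return Chem.MolToSmiles(mol)


if __name__ == "__main__":
    for smi in ["CN(C)C.Cl.O", "[H]OC([H])([H])C"]:
        print(smi, "->", standardize(smi))
```

Setting dontRemoveEverything=True keeps the last remaining fragment when every fragment of an input matches a solvent or salt pattern, so the function does not return an empty molecule.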

The dataset represents a subset of 844 compounds evaluated for activity against CDPK1. More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
