RDKit Structure Normalizer

RDKit Nodes for Knime version 3.8.0.v201906261723 by NIBR

Checks structures and tries to normalize them if necessary. Structures that are already normalized appear unchanged in the first output table. Structures that need normalization are corrected and also placed in the first output table. Information about the normalization is made available as a bit mask (flags) and as warning messages. Structures that cannot be normalized, or whose normalization raised a warning flag that the user has chosen to treat as an error, are placed in the second table ("Failed Molecules").

The following flags and messages are currently used:

  • 1 - BAD_MOLECULE, Unable to recognize a molecule (ERROR)
  • 2 - ALIAS_CONVERSION_FAILED, The atom alias conversion failed (ERROR)
  • 4 - TRANSFORMED, Structure has been transformed
  • 8 - FRAGMENTS_FOUND, Multiple fragments have been found
  • 16 - EITHER_WARNING, A wiggly bond has been removed
  • 32 - STEREO_ERROR, Stereo chemistry is ambiguously defined (ERROR)
  • 64 - DUBIOUS_STEREO_REMOVED, A stereo bond has been removed
  • 128 - ATOM_CLASH, Two atoms or bonds are too close to each other (ERROR)
  • 256 - ATOM_CHECK_FAILED, The atom environment is not correct (ERROR)
  • 512 - SIZE_CHECK_FAILED, The molecule is too big (ERROR)
  • 1024 - RECHARGED, Structure has been recharged
  • 2048 - STEREO_FORCED_BAD, Structure has failed: Bad stereo chemistry (ERROR)
  • 4096 - STEREO_TRANSFORMED, Stereo chemistry has been modified
  • 8192 - TEMPLATE_TRANSFORMED, Structure has been modified using a template
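Because the flags column holds a single integer that combines these values, it can help to decode it back into names. The following is a minimal sketch, not part of the node itself: the enum values and the (ERROR) markers are taken from the list above, while the names `NormalizerFlag`, `decode_flags` and `has_error` are hypothetical helpers introduced here for illustration.

```python
from __future__ import annotations

from enum import IntFlag


class NormalizerFlag(IntFlag):
    """Flag values as listed in the node description."""
    BAD_MOLECULE = 1
    ALIAS_CONVERSION_FAILED = 2
    TRANSFORMED = 4
    FRAGMENTS_FOUND = 8
    EITHER_WARNING = 16
    STEREO_ERROR = 32
    DUBIOUS_STEREO_REMOVED = 64
    ATOM_CLASH = 128
    ATOM_CHECK_FAILED = 256
    SIZE_CHECK_FAILED = 512
    RECHARGED = 1024
    STEREO_FORCED_BAD = 2048
    STEREO_TRANSFORMED = 4096
    TEMPLATE_TRANSFORMED = 8192


# Flags marked "(ERROR)" in the list above.
ERROR_FLAGS = (
    NormalizerFlag.BAD_MOLECULE
    | NormalizerFlag.ALIAS_CONVERSION_FAILED
    | NormalizerFlag.STEREO_ERROR
    | NormalizerFlag.ATOM_CLASH
    | NormalizerFlag.ATOM_CHECK_FAILED
    | NormalizerFlag.SIZE_CHECK_FAILED
    | NormalizerFlag.STEREO_FORCED_BAD
)


def decode_flags(value: int) -> list[str]:
    """Return the names of all flags set in the bit mask, lowest bit first."""
    return [flag.name for flag in NormalizerFlag if value & flag]


def has_error(value: int) -> bool:
    """True if at least one of the flags marked (ERROR) is set."""
    return bool(value & ERROR_FLAGS)
```

For example, a flags value of 1028 (4 + 1024) decodes to TRANSFORMED and RECHARGED, neither of which is an error flag.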

Options

Input - SDF, SMILES or RDKit Mol column
The input column with SDF, SMILES or RDKit Molecules. RDKit Molecules are treated as SDF values. SMILES input is converted internally into mol blocks before normalization.
Passed Output - Corrected structure column name
The name of the column that will contain the structure: the corrected one if any normalization has been applied, otherwise the original.
Passed Output - Flags column name
The name of the column that will contain the warning flags. This is a bit mask where each bit has a certain meaning as described above.
Passed Output - Warning messages column name
The name of the column that will contain the warning messages associated with the flags. The "Passed Molecules" table contains only warnings, which are usually associated with a normalization of the input structure.
Failed Output - Flags column name
The name of the column that will contain the error flags. This is a bit mask where each bit has a certain meaning as described above.
Failed Output - Error messages column name
The name of the column that will contain the error messages associated with the flags. The "Failed Molecules" table contains only errors that prevented the normalization of an input structure. It may additionally contain warnings that the user has explicitly chosen to treat as errors (see "Special Failures" under Handling Failures).
Logfile Output (Optional) - Selected File
A logfile can be specified here to record additional output whenever structures are normalized. It is intended for informational purposes only. The logfile contains no mapping that would reveal which line belongs to which structure, so it is only useful when few structures are processed.
Logfile Output (Optional) - Overwrite if file exists
Set this flag to allow overwriting of an existing logfile.

Handling Failures

Special Failures (Optional)
Define here which warning flags should be treated as errors. Structures that raise a flag defined as an error appear in the second table ("Failed Molecules" table).
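The routing rule described here can be sketched as a simple mask test. This is an illustration of the documented behaviour under stated assumptions, not the node's actual implementation: the function `route` and its parameter `treat_as_error` are hypothetical names, and the built-in error mask is assembled from the flags marked (ERROR) in the flag list.

```python
# Sum of the flags marked "(ERROR)" in the flag list:
# 1 + 2 + 32 + 128 + 256 + 512 + 2048
BUILTIN_ERRORS = 1 | 2 | 32 | 128 | 256 | 512 | 2048


def route(flags: int, treat_as_error: int = 0) -> str:
    """Decide which output table a structure would land in.

    flags          -- bit mask reported for the structure
    treat_as_error -- bit mask of warning flags the user wants
                      handled as errors (hypothetical parameter)
    """
    if flags & (BUILTIN_ERRORS | treat_as_error):
        return "failed"
    return "passed"
```

With this sketch, a structure flagged FRAGMENTS_FOUND (8) is routed to "passed" by default, but to "failed" once 8 is included in `treat_as_error`.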

Advanced

Transformation Configuration File (.trn) (Optional)
Lets the user define a customized transformation configuration file. The information button shows the configuration that will be used.
Augmented Atoms Configuration File (.chk) (Optional)
Lets the user define an augmented atoms configuration file. The information button shows the configuration that will be used.
Advanced Settings (Optional)
Configure here certain switches that influence how the Structure Normalizer performs its work:
  • cc - Check for collisions (of atoms with other atoms or bonds) (DEFAULT SETTING)
  • cs - Check stereo conventions (DEFAULT SETTING)
  • da - Convert atom text strings to properties
  • dg - Convert ISIS groups to S-Groups
  • ds - Convert CPSS STEXT to data fields
  • dw - Squeeze whitespace out of identifiers
  • dz - Strip most of the trailing zeros
  • tm - Split off minor fragments (and keep only largest one) (DEFAULT SETTING)
Additional options (for advanced users only)
Normally, there is no need to change these settings. However, if you are familiar with the underlying StruChk tool, you may define here manually options that are passed directly to the tool in addition to the specified switches from above. All options must start with a minus, some of them need a subsequent parameter like a file name. File names should be surrounded by quotes. Multiple options must be separated with new line characters.
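The formatting rules for this field (one option per line, each starting with a minus, file names in quotes) can be checked before the text is pasted into the dialog. The sketch below is only an illustration of those rules: `parse_options` is a hypothetical helper, and the option names used in the usage note are placeholders, not confirmed StruChk options.

```python
from __future__ import annotations

import shlex


def parse_options(text: str) -> list[list[str]]:
    """Split a newline-separated option string into token lists.

    Each non-empty line must start with a minus. Quoted file names
    are kept together as a single token, with the quotes removed.
    """
    options = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if not line.startswith("-"):
            raise ValueError(f"line {number}: option must start with '-': {line!r}")
        options.append(shlex.split(line))
    return options
```

For instance, the two-line input `-cc` followed by `-xx "my file.txt"` yields `[['-cc'], ['-xx', 'my file.txt']]`. Note that `shlex.split` treats backslashes as escape characters, so Windows-style paths would need different handling.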

Input Ports

Input table with SDF, SMILES or RDKit Molecules

Output Ports

Passed molecules and corrected structures
Failed molecules and error information

Installation

To use this node in KNIME, install RDKit KNIME JUnit Test from the following update site:

KNIME 4.0