RDKit Structure Normalizer

Checks structures and normalizes them where necessary. Structures that are already normalized are passed through to the first output table. Structures that need normalization are corrected and also placed in the first output table. Information about the normalization is made available as a bit mask (flags) and as warning messages. Structures that cannot be normalized, or whose normalization raised a warning flag that the user has chosen to treat as an error, are placed in the second table ("Failed Molecules").

The following flags and messages are currently used:

  • 1 - BAD_MOLECULE, Unable to recognize a molecule (ERROR)
  • 2 - ALIAS_CONVERSION_FAILED, The atom alias conversion failed (ERROR)
  • 4 - TRANSFORMED, Structure has been transformed
  • 8 - FRAGMENTS_FOUND, Multiple fragments have been found
  • 16 - EITHER_WARNING, A wiggly bond has been removed
  • 32 - STEREO_ERROR, Stereo chemistry is ambiguously defined (ERROR)
  • 64 - DUBIOUS_STEREO_REMOVED, A stereo bond has been removed
  • 128 - ATOM_CLASH, Two atoms or bonds are too close to each other (ERROR)
  • 256 - ATOM_CHECK_FAILED, The atom environment is not correct (ERROR)
  • 512 - SIZE_CHECK_FAILED, The molecule is too big (ERROR)
  • 1024 - RECHARGED, Structure has been recharged
  • 2048 - STEREO_FORCED_BAD, Structure has failed: Bad stereo chemistry (ERROR)
  • 4096 - STEREO_TRANSFORMED, Stereo chemistry has been modified
  • 8192 - TEMPLATE_TRANSFORMED, Structure has been modified using a template
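
Because the flags column holds a single integer, it can help to decode it back into the names listed above. The following is a minimal Python sketch of that decoding; the class name `NormalizerFlag` and the function `decode_flags` are illustrative assumptions and are not part of the node.

```python
from enum import IntFlag


class NormalizerFlag(IntFlag):
    """Flag values copied from the list above (the class name is hypothetical)."""
    BAD_MOLECULE = 1
    ALIAS_CONVERSION_FAILED = 2
    TRANSFORMED = 4
    FRAGMENTS_FOUND = 8
    EITHER_WARNING = 16
    STEREO_ERROR = 32
    DUBIOUS_STEREO_REMOVED = 64
    ATOM_CLASH = 128
    ATOM_CHECK_FAILED = 256
    SIZE_CHECK_FAILED = 512
    RECHARGED = 1024
    STEREO_FORCED_BAD = 2048
    STEREO_TRANSFORMED = 4096
    TEMPLATE_TRANSFORMED = 8192


def decode_flags(value: int) -> list[str]:
    """Return the names of all flags set in the bit mask, lowest bit first."""
    return [flag.name for flag in NormalizerFlag if value & flag]
```

For example, a flags value of 12 decodes to TRANSFORMED and FRAGMENTS_FOUND, since 12 = 4 + 8.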

Options

Input - SDF, SMILES or RDKit Mol column
The input column with SDF, SMILES or RDKit Molecules. The latter ones are treated as SDF values. SMILES input will be converted internally into mol blocks before the normalization is done.
Passed Output - Corrected structure column name
The name of the column that will contain the corrected structure, or the original structure if no normalization was applied.
Passed Output - Flags column name
The name of the column that will contain the warning flags. This is a bit mask where each bit has a certain meaning as described above.
Passed Output - Warning messages column name
The name of the column that will contain the warning messages associated with the flags. The "Passed Molecules" table contains only warnings, which are usually associated with a normalization of the input structure.
Failed Output - Flags column name
The name of the column that will contain the error flags. This is a bit mask where each bit has a certain meaning as described above.
Failed Output - Error messages column name
The name of the column that will contain the error messages associated with the flags. The "Failed Molecules" table contains only errors that prevented the normalization of an input structure. It may also contain warnings that the user has chosen to treat as errors (see "Special Failures" under "Handling Failures").
Logfile Output (Optional) - Selected File
A logfile can be specified here to record additional output whenever structures are normalized. It is intended for informational purposes only. The logfile contains no mapping that would reveal which line belongs to which structure, so it is only useful when a small number of structures is processed.

The "Specify File..." button lets you define your log file in a dialog with the following options:

Write to
Select a file system in which you want to store the file. There are four default file system options to choose from:
  • Local File System: Allows you to select a location in your local system.
  • Mountpoint: Allows you to write to a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog until you select a valid (i.e. known) mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow or the current workflow's data area. When selected, a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows you to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and write timeout in milliseconds. If it takes longer than that to connect to the host or write the file, the node fails to execute. Browsing is disabled for this option.
It is possible to use other file systems with this node. To do so, enable the file system connection input port of this node by clicking the ... in the bottom left corner of the node's icon and choosing Add File System Connection port.
Afterwards, you can simply connect the desired connector node to this node. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.

File/URL
Enter a URL when writing to Custom/KNIME URL, otherwise enter a path to a file. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/MacOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow has been opened. (Re)execute the connector node to enable browsing.
The location can be exposed as or automatically set via a path flow variable.

Create missing folders
Select if the folders of the selected output location should be created if they do not already exist. If this option is unchecked, the node will fail if a folder does not exist.

If exists
Specify the behavior of the node in case the output file already exists.
  • Overwrite: Will replace any existing file.
  • Fail: Will issue an error during the node's execution (to prevent unintentional overwrite).
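
The interplay of "Create missing folders" and "If exists" can be sketched in a few lines of Python. This is only an illustration of the behavior described above for a local file system, not the node's implementation; the function name `write_log` and its parameters are hypothetical.

```python
from pathlib import Path


def write_log(path, text, create_missing_folders=False, if_exists="fail"):
    """Write text to path, mimicking the two dialog options (hypothetical helper)."""
    if if_exists not in ("overwrite", "fail"):
        raise ValueError("if_exists must be 'overwrite' or 'fail'")
    target = Path(path)
    if not target.parent.exists():
        if not create_missing_folders:
            # Option unchecked: a missing folder is an error.
            raise FileNotFoundError(f"folder does not exist: {target.parent}")
        target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() and if_exists == "fail":
        # "Fail" guards against unintentional overwrites.
        raise FileExistsError(f"file already exists: {target}")
    target.write_text(text, encoding="utf-8")
    return target
```

With both defaults left in place the helper refuses to create folders or replace files, which corresponds to the most cautious dialog configuration.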

Handling Failures

Special Failures (Optional)
Defines which warning flags should be treated as errors. Structures that raise a warning flag selected here will appear in the second table ("Failed Molecules").
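
One way to picture this option is as an extra bit mask that is combined with the flags marked (ERROR) in the list of flags. The Python sketch below rests on that assumption; the names `DEFAULT_ERROR_MASK` and `is_failed` are hypothetical, and the node's actual routing logic may differ.

```python
# Flags marked (ERROR) in the flag list: 1, 2, 32, 128, 256, 512 and 2048.
DEFAULT_ERROR_MASK = 1 | 2 | 32 | 128 | 256 | 512 | 2048


def is_failed(flags: int, extra_error_mask: int = 0) -> bool:
    """Return True if any error flag, or any warning promoted to an error, is set.

    extra_error_mask stands in for the warning flags selected under
    "Special Failures" (an assumption about how the selection is applied).
    """
    return bool(flags & (DEFAULT_ERROR_MASK | extra_error_mask))
```

Under this reading, a structure flagged only with FRAGMENTS_FOUND (8) passes by default, but is routed to "Failed Molecules" once 8 is included in the extra mask.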

Advanced

Transformation Configuration File (.trn) (Optional)
Lets the user define a customized transformation configuration file. The information button shows the configuration that will be used.

The "Load Custom..." button lets you define your own configuration in a dialog with the following options:

Read from:
Select the file system that stores the configuration file you want to read. There are four default file system options to choose from:
  • Local File System: Allows you to select a file from your local system.
  • Mountpoint: Allows you to read from a mountpoint. When selected, a new drop-down menu appears to choose the mountpoint. Unconnected mountpoints are greyed out but can still be selected (note that browsing is disabled in this case). Go to the KNIME Explorer and connect to the mountpoint to enable browsing. A mountpoint is displayed in red if it was previously selected but is no longer available. You won't be able to save the dialog until you select a valid (i.e. known) mountpoint.
  • Relative to: Allows you to choose whether to resolve the path relative to the current mountpoint, current workflow or the current workflow's data area. When selected, a new drop-down menu appears to choose which of the three options to use.
  • Custom/KNIME URL: Allows you to specify a URL (e.g. file://, http:// or knime:// protocol). When selected, a spinner appears that allows you to specify the desired connection and read timeout in milliseconds. If it takes longer than that to connect to the host or read the file, the node fails to execute. Browsing is disabled for this option.
It is possible to use other file systems with this node. To do so, enable the file system connection input port of this node by clicking the ... in the bottom left corner of the node's icon and choosing Add File System Connection port.
Afterwards, you can simply connect the desired connector node to this node. The file system connection will then be shown in the drop-down menu. It is greyed out if the file system is not connected in which case you have to (re)execute the connector node first. Note: The default file systems listed above can't be selected if a file system is provided via the input port.

File/URL:
Enter a URL when reading from Custom/KNIME URL, otherwise enter a path to a file. The required syntax of a path depends on the chosen file system, such as "C:\path\to\file" (Local File System on Windows) or "/path/to/file" (Local File System on Linux/MacOS and Mountpoint). For file systems connected via input port, the node description of the respective connector node describes the required path format. You can also choose a previously selected file from the drop-down list, or select a location from the "Browse..." dialog. Note that browsing is disabled in some cases:
  • Custom/KNIME URL: Browsing is always disabled.
  • Mountpoint: Browsing is disabled if the selected mountpoint isn't connected. Go to the KNIME Explorer and connect to the mountpoint to enable browsing.
  • File systems provided via input port: Browsing is disabled if the connector node hasn't been executed since the workflow has been opened. (Re)execute the connector node to enable browsing.
Warning: Although the location can technically be set via a path flow variable, the filter will then always deliver empty results. The configuration file must be present at configuration time so that its active entries can be selected for the filter. A flow variable is only resolved when the node executes, so this step would be skipped.
Augmented Atoms Configuration File (.chk) (Optional)
Lets the user define an augmented atoms configuration file. The information button shows the configuration that will be used.

The "Load Custom..." button lets you define your own configuration in a dialog. The dialog options are the same as described in "Transformation Configuration File" section.
Advanced Settings (Optional)
Configure here certain switches that influence how the Structure Normalizer performs its work:
  • cc - Check for collisions (of atoms with other atoms or bonds) (DEFAULT SETTING)
  • cs - Check stereo conventions (DEFAULT SETTING)
  • da - Convert atom text strings to properties
  • dg - Convert ISIS groups to S-Groups
  • ds - Convert CPSS STEXT to data fields
  • dw - Squeeze whitespace out of identifiers
  • dz - Strip most of the trailing zeros
  • tm - Split off minor fragments (and keep only largest one) (DEFAULT SETTING)
Additional options (for advanced users only)
Normally, there is no need to change these settings. However, if you are familiar with the underlying StruChk tool, you may define here manually options that are passed directly to the tool in addition to the specified switches from above. All options must start with a minus, some of them need a subsequent parameter like a file name. File names should be surrounded by quotes. Multiple options must be separated with new line characters.
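
The formatting rules for this field (one option per line, a leading minus, quoted file names) can be checked before pasting text into the dialog. The Python sketch below is an assumption-laden illustration, not the node's parser: the function name `parse_additional_options` is hypothetical, `-cc` and `-cs` assume that the switches listed above are written with a leading minus, and `-xx` is a made-up placeholder for an option that takes a file name.

```python
import shlex


def parse_additional_options(text: str) -> list[list[str]]:
    """Split the field into one token list per option line (hypothetical helper)."""
    options = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines between options
        if not line.startswith("-"):
            raise ValueError(
                f"line {line_number}: option must start with '-': {line!r}"
            )
        # shlex.split keeps a quoted file name together as one token.
        options.append(shlex.split(line))
    return options


# '-xx' is a placeholder, not a documented StruChk option.
example = '-cc\n-cs\n-xx "/path/to/my file.txt"'
parsed = parse_additional_options(example)
```

Quoting matters mainly for file names containing spaces: without the quotes, the last line of the example would split into three tokens instead of two.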

Input Ports

Input table with SDF, SMILES or RDKit Molecules
The file system connection.

Output Ports

Passed molecules and corrected structures
Failed molecules and error information

Views

This node has no views
