Forge Build Activity Atlas

Cresset KNIME Nodes version 2.1.0.20539 by www.cresset-group.com

Generates an Activity Atlas model for activity from a training set of aligned molecules.

Activity Atlas is a probabilistic, qualitative method of analysing the SAR of a set of aligned compounds as a function of their electrostatic and shape properties. The method uses a Bayesian approach to take a global view of the data in a qualitative manner. Activity Atlas carries out three different analyses of the data:

  • Average of Actives: shows you what the average active molecule looks like, by analysing what the active molecules in the data set have in common.
  • Activity Cliff Summary: shows you the critical regions of the SAR, based on activity cliffs.
  • Regions explored analysis: makes an assessment of what regions of the aligned molecules have been fully explored, and calculates a novelty score for each molecule in the data set. The Regions explored analysis can be used within the "Forge Score Activity Atlas" node to compute a 'Novelty' score for new molecules.

Please refer to the Forge manual for a detailed description of the science behind Activity Atlas models in Forge and the corresponding model building options.

The "Regions explored" analysis allows the "Forge Score Activity Atlas" node to compute a 'Novelty' score for new molecules - if we made a new molecule, what would it tell us? The molecules with the highest 'Novelty' are those with small, controlled changes. Ideas with a low 'Novelty' don't expand our understanding of the SAR, while those with too high a value are potentially taking too bold a leap into the unknown. Designing compounds into the middle ground allows the SAR to be efficiently explored, giving the maximum understanding with the least synthetic effort.

The input molecules must be pre-aligned - the falign program or Forge Align node is ideal for this.

This node wraps the Forge Build executable 'fbuild', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the "Cresset Home" preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the "fbuild Path" preference or the CRESSET_FORGEBUILD_EXE environment variable to point directly at the executable itself.
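
As a rough illustration of the settings described above, the following Python sketch reports which of the two documented environment variables is set. The function name `locate_fbuild_setting` is hypothetical, and the precedence shown (the direct executable path is checked before the install directory) is an assumption; the node's actual lookup order is not stated here.

```python
import os


def locate_fbuild_setting(env=None):
    """Return (variable_name, value) for the first documented variable set.

    Assumption: CRESSET_FORGEBUILD_EXE is checked before CRESSET_HOME.
    The real node may resolve these settings in a different order.
    """
    env = os.environ if env is None else env

    # Direct path to the fbuild executable itself.
    exe = env.get("CRESSET_FORGEBUILD_EXE")
    if exe:
        return ("CRESSET_FORGEBUILD_EXE", exe)

    # Base Cresset software install directory.
    home = env.get("CRESSET_HOME")
    if home:
        return ("CRESSET_HOME", home)

    # Neither variable is set; the node falls back to its own defaults.
    return (None, None)


print(locate_fbuild_setting())
```

Running this in the environment that launches KNIME shows whether either variable is visible to the process.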

The Forge Build Activity Atlas node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced by using Cresset's Engine Broker. To use this facility, either set the "Cresset Engine Broker" preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column that contains the aligned molecules to be used as the training set.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values log-transformations
Specify whether the input activity values require log-transforming, and give their units.
Assign formal charges to moving molecules
If set, protonation states for the molecules are set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.

Activity Atlas

Grid Spacing
The grid size to use for the analysis. A smaller grid gives finer details, but at the expense of longer calculation time.
Automatically calculate the disparity range
Automatically calculate the disparity range based on the input molecules.
Minimum disparity/Maximum disparity
Molecule pairs whose disparity is less than the minimum value will be excluded from the analysis. Pairs whose disparity is greater than the maximum will be treated as though they had the maximum value.
For example, if the range is 5.0-20.0, then pairs with a disparity less than 5.0 are ignored, and those with a disparity greater than 20.0 are treated as though the disparity value was 20.0.
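
The disparity range rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Forge's implementation; the function name `apply_disparity_range` is hypothetical and the defaults are the 5.0-20.0 values from the example.

```python
def apply_disparity_range(disparity, minimum=5.0, maximum=20.0):
    """Apply the minimum/maximum disparity rule to one molecule pair.

    Returns None when the pair is excluded (disparity below the minimum),
    otherwise the disparity capped at the maximum.
    """
    if disparity < minimum:
        return None  # pair is ignored by the analysis
    return min(disparity, maximum)  # values above the maximum are capped


for value in (3.0, 12.5, 30.0):
    print(value, "->", apply_disparity_range(value))
```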
Automatically calculate the activity range
Automatically calculate the activity range based on the input molecules.
Inactive if activity below/Fully active if activity above
Molecules whose activity is less than the minimum value will be treated as 'inactive'. Those whose activity is greater than the maximum value will be treated as 'fully active'.
For example, if the range is 6.0-8.0, then any molecule with an activity of less than 6.0 is inactive, any molecule with an activity of more than 8.0 is fully active, and any molecule in between is partially active.
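
The three activity classes in this example can be expressed as a small Python function. It is a sketch of the thresholds as described, with a hypothetical function name `classify_activity`; how Forge weights partially active molecules internally is not covered here.

```python
def classify_activity(activity, inactive_below=6.0, fully_active_above=8.0):
    """Label a molecule using the activity range described in the text."""
    if activity < inactive_below:
        return "inactive"
    if activity > fully_active_above:
        return "fully active"
    # Anything inside the range, including the boundary values themselves.
    return "partially active"


for value in (5.2, 7.0, 8.6):
    print(value, "->", classify_activity(value))
```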
Automatically calculate the similarity range
Automatically calculate the similarity range based on the input molecules.
Alignments not trusted if similarity below/Alignments fully trusted if similarity above
Set the similarity thresholds to use when deciding whether a molecule is correctly aligned. Alignments whose similarity value is less than the lower threshold are not trusted at all and are excluded from the calculation. Those with similarity values above the upper threshold are completely trusted and are assumed to be correct. The ones in between are partially trusted.
For example, if the range is 0.6-0.8, then any alignment with a similarity score less than 0.6 will be excluded. Alignments with a similarity of 0.8 or higher are assumed to be correct. An alignment with a similarity score of 0.7 (half way between the thresholds) is assumed to have a 50% chance of being correct.
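
The example above is consistent with a linear ramp between the two thresholds, which the following Python sketch illustrates. Linear interpolation is an assumption inferred from the single 0.7 data point; the function name `alignment_trust` is hypothetical, and an excluded alignment is represented here simply as a trust of 0.0.

```python
def alignment_trust(similarity, lower=0.6, upper=0.8):
    """Return an assumed probability (0.0-1.0) that an alignment is correct.

    Assumption: trust rises linearly between the lower and upper thresholds.
    """
    if similarity < lower:
        return 0.0  # not trusted at all; excluded from the calculation
    if similarity >= upper:
        return 1.0  # fully trusted
    return (similarity - lower) / (upper - lower)


for value in (0.5, 0.7, 0.9):
    print(value, "->", round(alignment_trust(value), 3))
```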
Molecules required to fully explore a region
Defines the number of molecules whose fields need to be seen in a region of 3D space before that region is considered to be fully explored.
Shape weight
The relative weight assigned to shape (as opposed to field) similarity. Values must be between 0.0 (all field) and 1.0 (all shape).
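
One common way to read such a weight is as a linear blend of the two similarity values, sketched below in Python. The blend formula is an assumption that matches the stated end points (0.0 is all field, 1.0 is all shape); the function name `combined_similarity` is hypothetical, and Forge's actual scoring may differ.

```python
def combined_similarity(field_similarity, shape_similarity, shape_weight):
    """Blend field and shape similarity using an assumed linear weighting."""
    if not 0.0 <= shape_weight <= 1.0:
        raise ValueError("shape_weight must be between 0.0 and 1.0")
    return (shape_weight * shape_similarity
            + (1.0 - shape_weight) * field_similarity)


# With a weight of 0.0 only the field similarity counts; with 1.0 only shape.
print(combined_similarity(0.4, 0.8, 0.0))
print(combined_similarity(0.4, 0.8, 1.0))
```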
Optimize pairwise alignments
If checked, then each pair of conformers will be individually optimized from the starting position to maximise its score. Otherwise, the similarity value is just computed from the fixed input orientations. Turning this option on reduces alignment noise but slows the calculation by a factor of 10 or so.

Output

Forge project format
Specifies the output format of the Forge project.
  • Model only - Creates a Forge project which only contains the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Forge project which includes all the molecules and the model.

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned.

Output Ports

The input molecules with the novelty column added.
The Forge project containing the generated model. The type of Forge project depends on the Forge project format option. The "Forge Project Viewer" node may be used to view the model. The "Forge Model Info" node may be used to extract data from the model.

Update Site

To use this node in KNIME, install Cresset KNIME Nodes from the following update site: