Forge Build Activity Atlas

Generates an Activity Atlas™ model for activity from a training set of aligned molecules.

Activity Atlas is a probabilistic, qualitative method of analysing the SAR of a set of aligned compounds as a function of their electrostatic, shape and hydrophobic properties. The method uses a Bayesian approach to take a global view of the data in a qualitative manner. Activity Atlas carries out three different analyses of the data:

  • Average of Actives: shows you what the average active molecule looks like, by analysing what the active molecules in the data set have in common.
  • Activity Cliff Summary: shows you the critical regions of the SAR, based on activity cliffs.
  • Regions Explored analysis: makes an assessment of what regions of the aligned molecules have been fully explored and calculates a novelty score for each molecule in the data set. The Regions Explored analysis can be used within the 'Forge Score Activity Atlas' node to compute a 'Novelty' score for new molecules.

The most interesting molecules to make are those with small, controlled changes. Ideas with a low 'Novelty' don't expand our understanding of the SAR, while those with too high a value are potentially taking too bold a leap into the unknown. Designing compounds into the middle ground allows the SAR to be efficiently explored, giving the maximum understanding with the least synthetic effort.

Please refer to the Forge manual for a detailed description of the science behind Activity Atlas models in Forge and the corresponding model building options.

The input molecules must be pre-aligned; the falign program or the Forge Align node is ideal for this.

This node wraps the Forge Build executable 'fbuild', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'fbuild Path' preference or the CRESSET_FORGEBUILD_EXE environment variable to point directly at the executable itself.
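The lookup described above can be sketched in Python as follows. This is only an illustration, not the node's actual logic: the environment variable names come from this page, but the precedence (direct executable path first, then install directory) and the relative location of the executable below the install directory (`bin/fbuild` here) are assumptions that should be checked against your own installation.

```python
import os
from typing import Mapping, Optional

# Hypothetical location of fbuild below the install root; check your own install.
ASSUMED_RELATIVE_EXE = os.path.join("bin", "fbuild")


def resolve_fbuild(env: Mapping[str, str],
                   relative_exe: str = ASSUMED_RELATIVE_EXE) -> Optional[str]:
    """Return a candidate path to fbuild from the environment, or None."""
    # Assumption: a direct pointer to the executable wins over the install root.
    direct = env.get("CRESSET_FORGEBUILD_EXE")
    if direct:
        return direct
    # Otherwise fall back to the base Cresset install directory.
    home = env.get("CRESSET_HOME")
    if home:
        return os.path.join(home, relative_exe)
    # Nothing set: the caller would rely on the default install location.
    return None
```

Calling `resolve_fbuild(os.environ)` returns a candidate path, or `None` when neither variable is set.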

The Forge Build Activity Atlas node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced when using the Cresset Engine Broker. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column that contains the aligned molecules to be used as the training set.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values log-transformations
Specify whether the input activity values require log-transforming and give their units.
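As an illustration of what such a log-transformation usually means, the sketch below converts a concentration into the conventional "p" scale (negative base-10 logarithm of the molar value). The unit names in the table and the function name are hypothetical; this page does not list the units the node accepts or state the exact transform it applies.

```python
import math

# Illustrative conversion factors from concentration units to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_p_activity(value: float, unit: str) -> float:
    """Convert a concentration to -log10(molar concentration)."""
    if value <= 0:
        raise ValueError("activity must be positive to take a logarithm")
    return -math.log10(value * UNIT_TO_MOLAR[unit])
```

Under these assumptions an activity of 10 nM maps to a value of about 8, and 1 uM to about 6, which is the scale used in the activity range example further down.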
Similarity column
The name of the column which specifies the similarity data to use when building the model. The values in this column denote how trustworthy each molecule alignment is: 1.0 means the alignment is fully trustworthy and 0.0 means it is untrustworthy. If left blank, the contents of the 'Similarity_cresset' column will be used if available. Otherwise all molecule alignments will be treated as equally trustworthy.
Assign formal charges to input molecules
If set, the protonation states for the input molecules will be set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.

Activity Atlas

Algorithm
The Activity Atlas algorithm to use. Choose between the Weighted Sum algorithm, which is less susceptible to outliers, and the Sum algorithm, which is the original implementation of the Activity Atlas algorithm.
Grid Spacing
The grid size to use for the analysis. A smaller grid gives finer details, but at the expense of longer calculation time.
Automatically calculate the disparity range
Automatically calculate the disparity range based on the input molecules.
Minimum disparity/Maximum disparity
Molecule pairs whose disparity is less than the minimum value will be excluded from the analysis. Pairs whose disparity is greater than the maximum will be treated as though they had the maximum value.
For example, if the range is 5.0-20.0, then pairs with a disparity less than 5.0 are ignored, and those with a disparity of greater than 20.0 treated as though the disparity value was 20.0.
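The rule in this example can be written as a small Python function. It is a sketch of the behaviour described here, not Forge code; the function name is hypothetical, and `None` is used to mark an excluded pair.

```python
from typing import Optional


def effective_disparity(disparity: float,
                        minimum: float,
                        maximum: float) -> Optional[float]:
    """Apply the disparity range to one molecule pair."""
    # Pairs below the minimum are excluded from the analysis.
    if disparity < minimum:
        return None
    # Pairs above the maximum are capped at the maximum value.
    return min(disparity, maximum)
```

With the range 5.0-20.0, a pair with disparity 3.0 is dropped, 12.5 is used unchanged, and 35.0 is treated as 20.0.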
Automatically calculate the activity range
Automatically calculate the activity range based on the input molecules.
Inactive if activity below/Fully active if activity above
Molecules whose activity is less than the minimum value will be treated as 'inactive'. Those whose activity is greater than the maximum value will be treated as 'fully active'.
For example, if the range is 6.0-8.0, then any molecule with an activity of less than 6.0 is inactive, any molecule with an activity of more than 8.0 is fully active, and any molecule in between is partially active.
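A minimal Python sketch of this three-way classification is shown below. The function name is hypothetical, and treating values exactly equal to a threshold as partially active is an assumption drawn from the wording "less than" and "more than" above.

```python
def activity_class(activity: float,
                   inactive_below: float,
                   fully_active_above: float) -> str:
    """Label a molecule according to the activity range."""
    if activity < inactive_below:
        return "inactive"
    if activity > fully_active_above:
        return "fully active"
    # Everything between the two thresholds (inclusive, by assumption).
    return "partially active"
```

With the range 6.0-8.0, activities of 5.5, 7.0 and 8.5 are labelled inactive, partially active and fully active respectively.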
Automatically calculate the similarity range
Automatically calculate the similarity range based on the input molecules.
Alignments not trusted if similarity below/Alignments fully trusted if similarity above
Set the similarity thresholds to use when deciding whether a molecule is correctly aligned. Alignments whose similarity value is less than the lower threshold are not trusted at all and are excluded from the calculation. Those with similarity values above the upper threshold are completely trusted and are assumed to be correct. The ones in between are partially trusted.
For example, if the range is 0.6-0.8, then any alignment with a similarity score less than 0.6 will be excluded. Alignments with a similarity of 0.8 or higher are assumed to be correct. An alignment with a similarity score of 0.7 (half way between the thresholds) is assumed to have a 50% chance of being correct.
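The thresholds can be read as a ramp from "not trusted" to "fully trusted". The Python sketch below uses a linear ramp, which reproduces the 50% figure in the example; whether Forge interpolates linearly everywhere between the thresholds is an assumption, and the function name is hypothetical.

```python
def alignment_trust(similarity: float, lower: float, upper: float) -> float:
    """Return the assumed probability that an alignment is correct."""
    # Below the lower threshold the alignment is excluded (trust 0).
    if similarity < lower:
        return 0.0
    # At or above the upper threshold the alignment is assumed correct.
    if similarity >= upper:
        return 1.0
    # In between: linear interpolation (an assumption, see lead-in).
    return (similarity - lower) / (upper - lower)
```

With the range 0.6-0.8, a similarity of 0.5 gives a trust of 0, 0.7 gives roughly 0.5, and 0.8 or more gives 1.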
Molecules required to fully explore a region
Defines the number of molecules whose fields must be seen in a 3D region of space before that region is considered fully explored.
Shape weight
The relative weight assigned to shape (as opposed to field) similarity. Values must be between 0.0 (all field) and 1.0 (all shape).
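One common way to read such a weight is as a linear blend of the two similarity scores. The Python sketch below shows that interpretation; the blending formula and the function name are assumptions, since this page only states the meaning of the two end points (0.0 and 1.0).

```python
def combined_similarity(field_sim: float,
                        shape_sim: float,
                        shape_weight: float) -> float:
    """Blend field and shape similarity using the shape weight."""
    if not 0.0 <= shape_weight <= 1.0:
        raise ValueError("shape weight must be between 0.0 and 1.0")
    # Assumed linear blend: 0.0 gives all field, 1.0 gives all shape.
    return shape_weight * shape_sim + (1.0 - shape_weight) * field_sim
```

Under this reading, a shape weight of 0.5 with a field similarity of 0.6 and a shape similarity of 0.8 gives a combined score of about 0.7.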
Optimize pairwise alignments
If checked, the relative orientation of each pair of conformers is optimized by a simplex optimizer, which rigidly rotates and translates one conformer with respect to the other to maximize the similarity score. Otherwise, the similarity value is computed from the fixed input orientations. Turning this option on reduces alignment noise at the expense of longer calculation time.

Output

Forge project format
Specifies the output format of the Forge project.
  • Model only - Creates a Forge project which only contains the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Forge project which includes all the molecules and the model.
Write surfaces to Directory
Check this to write the surfaces to a directory.
Output format
Choose the output format for the surfaces:
  • Cube
  • CCP4
  • Insight
  • Moe
Selected Directory
Select the output directory for the surfaces.
Overwrite Directory
Check this to overwrite the directory if already present.

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned.

Output Ports

The input molecules with the novelty column added.
The Forge project containing the generated model. The type of Forge project depends on the Forge project format option. The 'Forge Project Viewer' node may be used to view the model. The 'Forge Model Info' node may be used to extract data from the model.

Views

This node has no views.
