Forge Build k Nearest Neighbor (kNN)

This Node Is Deprecated: it is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

Generates a k Nearest Neighbor (kNN) regression model using similarity and activity data of aligned molecules.
This node has been deprecated and replaced by the Forge Build Machine Learning node.

The kNN methodology is a well-known and robust distance-based learning approach in which the activity of each new compound is predicted as the weighted average activity of its k nearest neighbors in the training set.
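The weighted-average prediction described above can be illustrated with a short Python sketch. This is not the fbuild implementation: the function name knn_predict is hypothetical, and using the raw similarity value as the weight is an assumption (the node's "Weighting method" option may offer other schemes).

```python
from typing import Sequence


def knn_predict(similarities: Sequence[float],
                activities: Sequence[float],
                k: int) -> float:
    """Similarity-weighted mean activity of the k most similar training compounds.

    similarities[i] is the similarity of the new compound to training
    compound i; activities[i] is that training compound's activity.
    """
    if len(similarities) != len(activities):
        raise ValueError("similarities and activities must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Rank training compounds from most to least similar and keep the top k.
    ranked = sorted(zip(similarities, activities),
                    key=lambda pair: pair[0], reverse=True)[:k]

    total_weight = sum(sim for sim, _ in ranked)
    if total_weight == 0:
        # Assumption: with no informative neighbors, use the unweighted mean.
        return sum(act for _, act in ranked) / len(ranked)

    return sum(sim * act for sim, act in ranked) / total_weight
```

With similarities [0.9, 0.1, 0.6, 0.3], activities [7.0, 5.0, 6.0, 4.0] and k = 2, the two nearest neighbors have activities 7.0 and 6.0, and the prediction is (0.9 × 7.0 + 0.6 × 6.0) / 1.5 = 6.6.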

The similarity between the molecules is calculated using Cresset's field/shape similarity method or by using the 2D circular fingerprint methods ECFP4, ECFP6, FCFP4, or FCFP6. This model can be used within the "Forge Score k Nearest Neighbor (kNN)" node with fscore to predict an activity value for newly designed molecules.
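For the 2D fingerprint methods, the source does not say which similarity coefficient is applied. As an illustration only, the sketch below uses the Tanimoto coefficient, a common choice for circular fingerprints. It assumes each fingerprint is represented as a set of integer feature identifiers, and the function name tanimoto is hypothetical.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints stored as feature-ID sets."""
    if not fp_a and not fp_b:
        # Assumption: two empty fingerprints are treated as having no similarity.
        return 0.0
    shared = len(fp_a & fp_b)
    # Shared features divided by the number of distinct features in either set.
    return shared / (len(fp_a) + len(fp_b) - shared)
```

For example, the fingerprints {1, 2, 3} and {2, 3, 4} share two of four distinct features, giving a similarity of 0.5.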

Please refer to the Forge manual for a detailed description of the science behind kNN models in Forge and the corresponding model building options.

The molecules must be pre-aligned if Cresset's field/shape similarity method is used; the falign program or the Forge Align node is ideal for this.

This node wraps the Forge Build executable 'fbuild', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'fbuild Path' preference or the CRESSET_FORGEBUILD_EXE environment variable to point directly at the executable itself.
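Before running a workflow, it can help to check which of these environment variables is set. The Python sketch below is a hypothetical helper, not part of the node. It assumes that CRESSET_FORGEBUILD_EXE, which points directly at the executable, takes precedence over CRESSET_HOME; the source does not state the precedence, and the KNIME preferences are not inspected here.

```python
import os
from typing import Mapping, Optional, Tuple


def fbuild_setting(env: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (variable name, value) for the first fbuild-related variable that is set.

    Returns None when neither variable is set, in which case the node relies
    on its preferences or on the default install location.
    """
    # Assumed precedence: the direct executable path is checked first.
    for name in ("CRESSET_FORGEBUILD_EXE", "CRESSET_HOME"):
        value = env.get(name)
        if value:
            return name, value
    return None


if __name__ == "__main__":
    found = fbuild_setting()
    if found is None:
        print("Neither CRESSET_FORGEBUILD_EXE nor CRESSET_HOME is set.")
    else:
        print(f"{found[0]} = {found[1]}")
```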

The Forge Build k Nearest Neighbor (kNN) node can be configured to use additional resources to perform calculations. Using the Cresset Engine Broker can substantially reduce the node's run time. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column that contains the aligned molecule to be used as the training set.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values log-transformations
Specify whether the input activity values require log-transforming and give their units.
Assign formal charges to input molecules
If set, the protonation states for the input molecules will be set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.
Maximum number of neighbors (k)
The maximum number of neighbors to consider (i.e. the largest value of k).
Similarity matrix method
The method used to calculate the similarity between the molecules.
  • field - Cresset's field/shape similarity, molecules must be pre-aligned
  • ECFP4 - 2D similarity based on Extended-Connectivity Fingerprints with a radius of 2
  • ECFP6 - 2D similarity based on Extended-Connectivity Fingerprints with a radius of 3
  • FCFP4 - 2D similarity based on Circular Pharmacophore Fingerprints with a radius of 2
  • FCFP6 - 2D similarity based on Circular Pharmacophore Fingerprints with a radius of 3
Shape weight
The relative weight assigned to shape (as opposed to field) similarity. Values must be between 0.0 (all field) and 1.0 (all shape).
Optimize pairwise alignments
If checked, then each pair of conformers is individually optimized from the starting position to maximise its score. Otherwise, the similarity value is just computed from the fixed input orientations. Turning this option on reduces alignment noise but slows the calculation by a factor of 10 or so.
Weighting method
Select the weighting method to use.
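The source does not specify which log-transformation the "Units for the input activity values" option applies. As an illustration of why the units matter, the Python sketch below uses the common convention of converting a concentration to a p-value (the negative base-10 logarithm of the molar concentration). The unit table and the function name to_p_activity are hypothetical.

```python
import math

# Hypothetical conversion factors from each unit to molar concentration.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_p_activity(value: float, unit: str) -> float:
    """Convert a concentration (e.g. Ki or IC50) to -log10 of its molar value."""
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unknown unit: {unit!r}")
    if value <= 0:
        raise ValueError("activity value must be positive to take a logarithm")
    return -math.log10(value * UNIT_TO_MOLAR[unit])
```

Under this convention, 10 nM corresponds to a value of about 8 and 1 uM to about 6, so giving the wrong units shifts every activity by a constant offset.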

Output

Forge project format
Specifies the output format of the Forge project.
  • Model only - Creates a Forge project which only contains the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Forge project which includes all the molecules and the model.

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned if you are using Cresset field/shape similarity.

Output Ports

The input molecules, with tags added for predicted activity.
The Forge project containing the generated model. The type of Forge project depends on the Forge project format option. The 'Forge Project Viewer' node may be used to view the model. The 'Forge Model Info' node may be used to extract data from the model.

Views

This node has no views
