Flare Build kNN

Generates a Flare™ kNN regression or classification model for activity from a set of molecules (which must be aligned when 3D field/shape similarity is used).

The molecules must be pre-aligned unless kNN is used with 2D fingerprints (see below). The alignment can be performed with the ‘pyflare’ executable and the ‘align.py’ Python script, or with the 'Flare Align' node.

The following types of models can be generated.

k Nearest Neighbor (kNN) regression or classification

The kNN methodology is a well-known and robust machine learning approach where the activity for each compound is predicted as the weighted average activity of its k nearest neighbors (most similar compounds) in the training set.
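The prediction step can be illustrated with a minimal plain-Python sketch (not Flare's implementation). It assumes that each neighbor is weighted by its raw similarity to the query compound and that classification uses a similarity-weighted vote over integer class labels; the function names are hypothetical, and the weighting schemes Flare actually offers are those listed under 'Weighting method'.

```python
from collections import defaultdict
from typing import List, Sequence


def _top_k(similarities: Sequence[float], k: int) -> List[int]:
    """Indices of the k training compounds most similar to the query."""
    if not 1 <= k <= len(similarities):
        raise ValueError("k must be between 1 and the training-set size")
    ranked = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return ranked[:k]


def knn_predict(similarities: Sequence[float], activities: Sequence[float], k: int) -> float:
    """Regression: similarity-weighted mean activity of the k nearest neighbors."""
    neighbors = _top_k(similarities, k)
    total = sum(similarities[i] for i in neighbors)
    if total == 0:
        # All neighbor similarities are zero: fall back to an unweighted mean.
        return sum(activities[i] for i in neighbors) / k
    return sum(similarities[i] * activities[i] for i in neighbors) / total


def knn_classify(similarities: Sequence[float], labels: Sequence[int], k: int) -> int:
    """Classification: similarity-weighted vote over the k nearest neighbors' labels."""
    votes = defaultdict(float)
    for i in _top_k(similarities, k):
        votes[labels[i]] += similarities[i]
    # On a tie, max() returns the label that was added to `votes` first.
    return max(votes, key=votes.get)
```

For example, with similarities [0.9, 0.5, 0.8, 0.1] and activities [7.0, 5.0, 6.0, 4.0], `knn_predict(..., k=2)` averages the two most similar compounds: (0.9 × 7.0 + 0.8 × 6.0) / 1.7 ≈ 6.53.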

The similarity between the molecules is calculated using either Cresset's 3D field/shape similarity or by using 2D circular fingerprints (ECFP4, ECFP6, FCFP4, or FCFP6).
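For the 2D options, a common way to compare circular fingerprints is the Tanimoto coefficient: the number of shared on-bits divided by the number of bits set in either fingerprint. The sketch below assumes Tanimoto similarity and represents each fingerprint as a set of hashed feature IDs; whether Flare uses exactly this metric is an assumption, and `tanimoto` and `similarity_matrix` are hypothetical helpers.

```python
from __future__ import annotations

from typing import List, Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit IDs."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_matrix(fps: Sequence[Set[int]]) -> List[List[float]]:
    """Symmetric all-against-all similarity matrix for a training set."""
    n = len(fps)
    return [[tanimoto(fps[i], fps[j]) for j in range(n)] for i in range(n)]
```

For example, {1, 2, 3} and {2, 3, 4} share two of the four distinct bits, giving a similarity of 0.5.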

These kNN models can be used within the 'Flare Score kNN' node (wrapping the 'pyflare' executable) to predict an activity value for newly designed molecules.

Please refer to the Flare manual for a detailed description of the science behind each of these model types in Flare and the corresponding model building options.

This node wraps the 'pyflare' executable, which must be installed with a valid license for this node to work. If it is installed in the default location on Windows, it should be found automatically. Otherwise, you must set either the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'pyflare Path' preference or the CRESSET_PYFLARE_EXE environment variable to point directly at the executable itself.

The Flare Build kNN node can be configured to use additional resources to perform calculations. The time taken for the node to run can be substantially reduced by using the Cresset Engine Broker™. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column containing the aligned molecules to be used as the training set. Alignment is not required when a 2D fingerprint similarity method is selected.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values
Specify whether the input activity values require log-transforming and give their units, or whether the activity values are categorical. For categorical data, the activity column should contain only integer values.
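For concentration-type measurements such as IC50 or Ki, a common log transform is p-activity = -log10(concentration in mol/L), so 100 nM (1e-7 M) becomes 7. The sketch below assumes that convention and a hypothetical set of unit labels; `to_p_activity` is a hypothetical helper, and the Flare manual describes the exact transform and unit choices this option applies.

```python
import math

# Hypothetical unit labels and their multipliers to molar (mol/L).
_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_activity(value: float, unit: str) -> float:
    """Convert a concentration-type activity (e.g. IC50) to -log10(molar)."""
    if value <= 0:
        raise ValueError("activity must be positive to be log-transformed")
    try:
        factor = _TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit}") from None
    return -math.log10(value * factor)
```

Higher p-values then correspond to more potent compounds, which is usually the preferred direction for regression.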
Flare project format
Specifies the output format of the Flare project.
  • Model only - Creates a Flare project which only contains the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Flare project which includes all the molecules and the model.

Model Settings

Maximum number of neighbors (k)
The maximum number of neighbors to consider (i.e. the largest value of k).
Similarity matrix method
The method used by kNN to calculate the similarity between the molecules.
  • field - Cresset's field/shape similarity; molecules must be pre-aligned
  • ECFP4 - 2D similarity based on Extended-Connectivity Fingerprints with a radius of 2
  • ECFP6 - 2D similarity based on Extended-Connectivity Fingerprints with a radius of 3
  • FCFP4 - 2D similarity based on Functional-Class (pharmacophoric) circular fingerprints with a radius of 2
  • FCFP6 - 2D similarity based on Functional-Class (pharmacophoric) circular fingerprints with a radius of 3
Shape weight
The relative weight assigned to shape (as opposed to field) similarity. Values must be between 0.0 (all field) and 1.0 (all shape). The default, 0.5, gives equal weight to shape and field similarity.
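One way to read this option is as a linear blend of the two similarity scores. The sketch below assumes that reading (similarity = w × shape + (1 - w) × field); Flare's exact scoring is described in the Flare manual, and `combined_similarity` is a hypothetical helper.

```python
def combined_similarity(field_sim: float, shape_sim: float, shape_weight: float = 0.5) -> float:
    """Blend field and shape similarity; shape_weight=0.0 is all field, 1.0 is all shape."""
    if not 0.0 <= shape_weight <= 1.0:
        raise ValueError("shape_weight must be between 0.0 and 1.0")
    return shape_weight * shape_sim + (1.0 - shape_weight) * field_sim
```

With the default weight of 0.5, a field similarity of 0.6 and a shape similarity of 0.8 combine to 0.7.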
Optimize pairwise alignments
If checked, the relative orientation of each pair of conformers is optimized by means of a simplex optimizer, which rigidly rotates and translates one conformer with respect to the other to maximize the similarity score. Otherwise, the similarity value is computed from the fixed input orientations. Turning this option on reduces alignment noise at the expense of increased computation time.
Weighting method
Select the weighting method to use when averaging the activities of the closest neighbors. In 'Automatic' mode, all the weighting options are tried and the one that provides the best q2 value is chosen.
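The 'Automatic' selection can be illustrated with leave-one-out (LOO) cross-validation, where q2 = 1 - PRESS / SS_tot. This is a minimal sketch, assuming LOO q2 is the criterion and that k is searched from 1 up to the 'Maximum number of neighbors (k)' value; the schemes in `WEIGHTINGS` are hypothetical stand-ins, and Flare's actual weighting options and validation protocol may differ.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical weighting schemes: map a neighbor's similarity to its weight.
WEIGHTINGS: Dict[str, Callable[[float], float]] = {
    "uniform": lambda s: 1.0,
    "similarity": lambda s: s,
    "similarity_squared": lambda s: s * s,
}


def loo_predict(sim: Sequence[Sequence[float]], y: Sequence[float], i: int,
                k: int, weight: Callable[[float], float]) -> float:
    """Predict compound i from its k most similar *other* compounds."""
    others = sorted((j for j in range(len(y)) if j != i),
                    key=lambda j: sim[i][j], reverse=True)[:k]
    w = [weight(sim[i][j]) for j in others]
    total = sum(w)
    if total == 0:
        return sum(y[j] for j in others) / len(others)
    return sum(wj * y[j] for wj, j in zip(w, others)) / total


def q2(sim: Sequence[Sequence[float]], y: Sequence[float], k: int,
       weight: Callable[[float], float]) -> float:
    """Leave-one-out q2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("q2 is undefined when all activities are identical")
    press = sum((y[i] - loo_predict(sim, y, i, k, weight)) ** 2 for i in range(len(y)))
    return 1.0 - press / ss_tot


def select_model(sim: Sequence[Sequence[float]], y: Sequence[float],
                 max_k: int) -> Tuple[int, str, float]:
    """Try every weighting and every k up to max_k; keep the best LOO q2."""
    best: Optional[Tuple[int, str, float]] = None
    # With one compound left out, at most len(y) - 1 neighbors remain.
    for name, weight in WEIGHTINGS.items():
        for k in range(1, min(max_k, len(y) - 1) + 1):
            score = q2(sim, y, k, weight)
            if best is None or score > best[2]:
                best = (k, name, score)
    if best is None:
        raise ValueError("need at least two compounds and max_k >= 1")
    return best
```

For two tight clusters of compounds with similar activities within each cluster, this search settles on k = 1, because adding a neighbor from the other cluster pulls predictions toward the wrong activity.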

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned unless you are generating a 2D kNN model.

Output Ports

The input molecules, with tags added for predicted activity. For kNN models the 'distance to model' and 'activity error' information is also added to the molecules.

The Flare project containing the generated model. The type of Flare project depends on the ‘Flare project format’ option. The 'Flare Project Viewer' node may be used to view the model. The 'Flare Model Info' node may be used to extract data from the model.


Views

This node has no views
