Flare Build Field QSAR

Generates a Flare™ 3D QSAR model for activity from a set of aligned molecules. Training set molecules are used to derive a set of 'sample positions' around the molecules based on their field points, which can be used to probe any molecule for the electrostatic potential or for the volume taken up at those positions. The data matrix derived from sample values is then processed by partial least squares (PLS) to derive an equation that describes activity.
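
As a rough illustration of the last step, the sketch below fits a one-component PLS model to a small data matrix in plain Python. It is not Flare's implementation: the function names `fit_pls1_one_component` and `predict` are hypothetical, the rows stand in for per-molecule sample values, and a real model would normally use several components.

```python
def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model.

    X is a list of rows (one row of sample values per molecule),
    y is the list of activities. Returns (x_mean, y_mean, coefficients).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the descriptors and the activities.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in descriptor space that has the
    # largest covariance with activity.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        raise ValueError("descriptors have no covariance with activity")
    w = [v / norm for v in w]
    # Scores: projection of each molecule onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centred activity on the scores.
    b = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, [b * wj for wj in w]


def predict(model, row):
    """Predict the activity of one molecule from its sample values."""
    x_mean, y_mean, coef = model
    return y_mean + sum(c * (x - m) for c, x, m in zip(coef, row, x_mean))
```

For example, `predict(fit_pls1_one_component(X, y), new_row)` returns the predicted activity for a molecule whose sample values are `new_row`.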

Please refer to the Flare manual for a detailed description of the science behind Field QSAR models in Flare and the corresponding model building options.

The molecules must be pre-aligned. You can use the 'pyflare' executable with the 'align.py' Python script, or the 'Flare Align' node, to perform the alignment.

This node wraps the 'pyflare' executable, which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'pyflare Path' preference or the CRESSET_PYFLARE_EXE environment variable to point directly at the executable itself.
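
To check outside KNIME which executable the environment variables point to, a small script along these lines may help. It is only a sketch: the function name `resolve_pyflare` is hypothetical, the precedence (executable path first, then install directory) is an assumption, and because the layout below the install directory is not described here the script simply searches it for a file whose name starts with `pyflare`.

```python
import os
from pathlib import Path


def resolve_pyflare(env=None, exe_name="pyflare"):
    """Return a Path to the pyflare executable, or None if nothing is set.

    Assumed precedence: CRESSET_PYFLARE_EXE, then CRESSET_HOME.
    """
    env = os.environ if env is None else env

    # A direct path to the executable is taken as-is.
    direct = env.get("CRESSET_PYFLARE_EXE")
    if direct:
        return Path(direct)

    # Otherwise search the install directory; the sub-directory layout
    # is not assumed, so look at every file below it.
    home = env.get("CRESSET_HOME")
    if home and Path(home).is_dir():
        for candidate in sorted(Path(home).rglob(exe_name + "*")):
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    print(resolve_pyflare())
```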

The Flare Build Field QSAR node can be configured to use additional resources to perform calculations. Using the Cresset Engine Broker™ can substantially reduce the time the node takes to run. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column that contains the aligned molecules to be used as the training set.
Test Set Structure column
The column that contains the aligned molecules to be used as the test set.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values log-transformations
Specify whether the input activity values require log-transforming and give their units.
Times to repeat the QSAR process with the activities scrambled
Repeats the QSAR model building process with the activities scrambled. More scramble sets provide stronger confirmation of statistical significance but take longer to calculate. The default is to do 50 scramble runs. The results of the scramble can be viewed in the output Flare project using the 'Flare Model Info' node.
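
The idea behind the scramble test can be sketched in a few lines of Python. This is an illustration only: a squared Pearson correlation against a single descriptor stands in for the full Field QSAR model, and the names `r_squared` and `scramble_test` are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy) if sxx and syy else 0.0


def scramble_test(descriptor, activities, runs=50, seed=0):
    """Compare the real fit with fits to randomly shuffled activities."""
    rng = random.Random(seed)
    real = r_squared(descriptor, activities)
    scrambled = []
    for _ in range(runs):
        shuffled = list(activities)
        rng.shuffle(shuffled)  # break the descriptor/activity pairing
        scrambled.append(r_squared(descriptor, shuffled))
    return real, scrambled
```

If the fit to the real activities is clearly better than almost all of the scrambled fits, the model is unlikely to be a chance correlation.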
Partition Type
Moves a percentage of the molecules from the training set into the test set.
  • None - No molecules are moved to the test set
  • Activity Stratified - Partitions the training set so that a specified percentage of the molecules (e.g. 25%) are moved to the test set. The partitioning is activity-stratified.
  • Random - Partitions the training set so that a specified percentage of the molecules, randomly selected, are moved to the test set.
Note that any molecules read into the test set port are added to those partitioned into the test set by this procedure.
Percentage of input molecules to move to the test set
How much of the training set should be moved to the test set.
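
The two partitioning modes can be pictured with the following sketch. The exact procedure used by Flare is not described here, so the stratification below (rank by activity, split into equal bins, pick one molecule per bin) is an assumption, and both function names are hypothetical.

```python
import random


def partition_random(n_molecules, percent, seed=0):
    """Return the indices of molecules moved to the test set at random."""
    rng = random.Random(seed)
    n_test = round(n_molecules * percent / 100)
    return set(rng.sample(range(n_molecules), n_test))


def partition_activity_stratified(activities, percent, seed=0):
    """Return test-set indices spread evenly across the activity range."""
    rng = random.Random(seed)
    n = len(activities)
    n_test = round(n * percent / 100)
    if n_test == 0:
        return set()
    # Rank molecules by activity, then cut the ranking into n_test bins.
    order = sorted(range(n), key=lambda i: activities[i])
    test = set()
    for k in range(n_test):
        start = k * n // n_test
        end = (k + 1) * n // n_test
        # One molecule from each bin goes to the test set.
        test.add(rng.choice(order[start:end]))
    return test
```

Stratifying by activity keeps the test set representative of the full activity range, which a purely random split of a small data set does not guarantee.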

Output

Add columns for QSAR descriptors
If checked, then additional columns are added to the output to include the QSAR descriptors capturing the field sample values for each molecule. Note that when this option is turned on, successor nodes cannot be configured until this node is executed.
Flare project format
Specifies the output format of the Flare project.
  • Model only - Creates a Flare project which contains only the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Flare project which includes all the molecules and the model.

Advanced

Weight molecules' contribution by similarity
Weights the molecules according to their 'Similarity_cresset' tag value, such that similarity values less than the minimum get a weight of 0, and similarity values greater than the maximum get a weight of 1. The 'Similarity_cresset' tag value can be set by the 'Flare Align' node.
Weight Mode
Controls how the weight values ramp between 0 and 1. In linear mode, the weight values increase linearly, while in quadratic mode the weight values used are squared.
Minimum similarity
Molecules whose 'Similarity_cresset' tag value is less than this value get a weight of 0.
Maximum similarity
Molecules whose 'Similarity_cresset' tag value is greater than this value get a weight of 1.
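
A minimal sketch of the weighting described by these options is shown below. The function name `similarity_weight` is hypothetical, and reading "quadratic" as the square of the linear ramp is an assumption based on the Weight Mode description.

```python
def similarity_weight(similarity, minimum, maximum, mode="linear"):
    """Map a similarity value to a weight between 0 and 1."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    if similarity <= minimum:
        return 0.0
    if similarity >= maximum:
        return 1.0
    # Linear ramp from 0 at the minimum to 1 at the maximum.
    ramp = (similarity - minimum) / (maximum - minimum)
    if mode == "linear":
        return ramp
    if mode == "quadratic":
        # Assumed meaning of quadratic mode: the linear value squared.
        return ramp * ramp
    raise ValueError("mode must be 'linear' or 'quadratic'")
```

With a minimum of 0.0 and a maximum of 1.0, a similarity of 0.5 gives a weight of 0.5 in linear mode and 0.25 in quadratic mode, so quadratic mode down-weights poorly aligned molecules more strongly.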
Fields
Specifies which fields to use for the QSAR calculation. At least one field must be selected.
Cross-validation type
Specifies whether the leave-one-out or leave-many-out validation method should be used.
  • Leave-one-out: the model is built with a single molecule left out of the process. This is then repeated, leaving out each training set molecule in turn. The predicted activity for each molecule is the value obtained when it was left out of the model building process.
  • Leave-many-out: the model is built multiple times (specified by the 'Repeats' option) leaving out a proportion of the data (specified by the 'Training set to use as validation data' option). The predicted activity for each molecule is the average of the predicted activities obtained for each model for which the molecule was left out.
Training set to use as validation data
The percentage of the molecules from the training set to leave out in each iteration during the leave-many-out process.
Repeats
The number of times to run the leave-many-out operation.
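
The two cross-validation schemes can be sketched as follows. This is an illustration under stated assumptions: a trivial model that predicts the mean training activity (`fit_mean`, hypothetical) stands in for the Field QSAR model, and the function names `leave_one_out` and `leave_many_out` are also hypothetical.

```python
import random


def fit_mean(activities, train_idx):
    """Stand-in model: always predicts the mean training activity."""
    mean = sum(activities[i] for i in train_idx) / len(train_idx)
    return lambda i: mean


def leave_one_out(activities, fit=fit_mean):
    """Predict each molecule from a model built without it."""
    n = len(activities)
    predictions = []
    for left_out in range(n):
        train = [i for i in range(n) if i != left_out]
        predictions.append(fit(activities, train)(left_out))
    return predictions


def leave_many_out(activities, percent, repeats, fit=fit_mean, seed=0):
    """Average each molecule's predictions over the runs that left it out."""
    rng = random.Random(seed)
    n = len(activities)
    n_out = max(1, round(n * percent / 100))
    if n_out >= n:
        raise ValueError("percent leaves no molecules to train on")
    collected = [[] for _ in range(n)]
    for _ in range(repeats):
        held_out = set(rng.sample(range(n), n_out))
        train = [i for i in range(n) if i not in held_out]
        predictor = fit(activities, train)
        for i in held_out:
            collected[i].append(predictor(i))
    # A molecule that was never left out has no cross-validated prediction.
    return [sum(p) / len(p) if p else None for p in collected]
```

Increasing the number of repeats makes it more likely that every molecule is left out at least once and so receives a cross-validated prediction.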

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned.
Optionally, a test set of molecules can be provided. The molecules must be pre-aligned.

Output Ports

The input molecules, with tags added for predicted activity and role (training/test set).
The Flare project containing the generated model. The type of Flare project depends on the ‘Flare project format’ option. The 'Flare Project Viewer' node may be used to view the model. The 'Flare Model Info' node may be used to extract data from the model.
The sample co-ordinates which were used to generate the model.

Views

This node has no views
