Forge Build Field QSAR

Cresset KNIME Nodes version 2.5.0.36566 by www.cresset-group.com

Generates a Forge™ 3D QSAR model for activity from a set of aligned molecules. Training set molecules are used to derive a set of 'sample positions' around the molecules based on their field points, which can be used to probe any molecule for the electrostatic potential or for the volume taken up at those positions. The data matrix derived from sample values is then processed by partial least squares (PLS) to derive an equation that describes activity.

Please refer to the Forge manual for a detailed description of the science behind Field QSAR models in Forge and the corresponding model building options.

The molecules must be pre-aligned; the falign program or the 'Forge Align' node is ideal for this.

This node wraps the Forge Build executable 'fbuild', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'fbuild Path' preference or the CRESSET_FORGEBUILD_EXE environment variable to point directly at the executable itself.

The Forge Build Field QSAR node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced by using Cresset's Engine Broker. To use this facility, either set the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.
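As a quick check before starting KNIME, the environment variables named above can be inspected from a script. The sketch below only reports which of the three variables are set. It does not reproduce how the node itself searches for 'fbuild', and the function name and example path are hypothetical.

```python
import os

# Variable names are taken from the node description above.
CRESSET_VARIABLES = ("CRESSET_HOME", "CRESSET_FORGEBUILD_EXE", "CRESSET_BROKER")


def report_cresset_environment(env=None):
    """Return a dict mapping each Cresset variable to its value, or None if unset.

    Hypothetical helper: it inspects the environment only and makes no
    claim about the order in which the node consults these settings.
    """
    env = os.environ if env is None else env
    return {name: env.get(name) or None for name in CRESSET_VARIABLES}


if __name__ == "__main__":
    for name, value in report_cresset_environment().items():
        print(f"{name}: {value if value is not None else '(not set)'}")
```

Passing a plain dictionary instead of the real environment, for example {"CRESSET_HOME": "/opt/cresset"}, makes the helper easy to try without changing any settings.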

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column
The column that contains the aligned molecules to be used as the training set.
Test Set Structure column
The column that contains the aligned molecules to be used as the test set.
Activity column
The name of the column which specifies the activity data to use when building the model.
Units for the input activity values log-transformations
Specify whether the input activity values require log-transforming and give their units.
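To illustrate what a log-transformation of activity values usually means, the sketch below converts a concentration-type activity (such as an IC50) into its negative base-10 logarithm on a molar scale. The unit names and the function are assumptions made for this example; the units actually offered by the node may differ.

```python
import math

# Conversion factors to molar. The set of supported units is an assumption
# for this sketch, not a list taken from the node.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def to_p_activity(value, unit):
    """Return -log10 of an activity value after converting it to molar."""
    if value <= 0:
        raise ValueError("activity must be positive to take a logarithm")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


if __name__ == "__main__":
    # 10 nM is 1e-8 M, so the transformed value is approximately 8.
    print(round(to_p_activity(10, "nM"), 3))
```

Values that are already on a logarithmic scale (for example pIC50) would not need this step.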
Assign formal charges to input molecules
If set, the protonation states for the input molecules will be set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.
Times to repeat the QSAR process with the activities scrambled
Repeats the QSAR model building process with the activities scrambled. More scramble sets provide stronger confirmation of statistical significance but take longer to calculate. The default is to do 50 scramble runs. The results of the scramble can be viewed in the output Forge project using the 'Forge Model Info' node.
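The idea behind scrambling is that a model built on real activities should perform clearly better than models built on the same molecules with the activities shuffled among them. The sketch below shows only the shuffling step, using a hypothetical helper and a fixed seed; how Forge builds and compares the scrambled models is not shown.

```python
import random


def scrambled_activity_sets(activities, runs=50, seed=0):
    """Return `runs` shuffled copies of the activity list.

    Each copy holds the same values as the input, assigned to the
    molecules in a random order. The default of 50 mirrors the node's
    default number of scramble runs.
    """
    rng = random.Random(seed)
    scrambled = []
    for _ in range(runs):
        shuffled = list(activities)
        rng.shuffle(shuffled)
        scrambled.append(shuffled)
    return scrambled


if __name__ == "__main__":
    sets = scrambled_activity_sets([5.1, 6.3, 7.2, 8.0], runs=3)
    for activity_set in sets:
        print(activity_set)
```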
Partition Type
Moves a percentage of the molecules from the training set into the test set.
  • None - No molecules are moved to the test set
  • Activity Stratified - Partitions the training set so that a specified percentage of the molecules (e.g. 25%) are moved to the test set. The partitioning is activity-stratified.
  • Random - Partitions the training set so that a specified percentage of the molecules, randomly selected, are moved to the test set.
Note that any molecules read into the test set port are added to those partitioned into the test set by this procedure.
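The three partition types can be sketched as follows. The stratified branch orders the molecules by activity and picks evenly spaced ones, which is one common way to stratify and is an assumption here; the node's own selection procedure may differ. All names are hypothetical.

```python
import random


def partition(activities, percent, mode="random", seed=0):
    """Split molecule indices into (training, test) lists.

    mode is "none", "random" or "stratified". percent is the share of
    molecules to move to the test set.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    n = len(activities)
    n_test = round(n * percent / 100)
    if mode == "none" or n_test == 0:
        return list(range(n)), []
    if mode == "random":
        test = set(random.Random(seed).sample(range(n), n_test))
    elif mode == "stratified":
        # Order by activity, then take one molecule from the middle of
        # each of n_test equally sized activity bands.
        order = sorted(range(n), key=lambda i: activities[i])
        step = n / n_test
        test = {order[int(k * step + step / 2)] for k in range(n_test)}
    else:
        raise ValueError(f"unknown mode: {mode}")
    training = [i for i in range(n) if i not in test]
    return training, sorted(test)


if __name__ == "__main__":
    values = [5.0, 9.0, 6.0, 8.0, 7.0, 5.5, 6.5, 7.5]
    print(partition(values, 25, mode="stratified"))
```

Molecules supplied on the test set port would then be appended to the test list returned here.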
Percentage of input molecules to move to the test set
The percentage of the training set molecules to move to the test set.

Output

Add columns for QSAR descriptors
If checked, then additional columns are added to the output to include the QSAR descriptors capturing the field sample values for each molecule. Note that when this option is turned on, successor nodes cannot be configured until this node is executed.
Forge project format
Specifies the output format of the Forge project.
  • Model only - Creates a Forge project which contains only the model. This option creates a smaller project.
  • Molecules and model - Creates a complete Forge project which includes all the molecules and the model.

Advanced

Weight molecules' contribution by similarity
Weights the molecules according to their 'Similarity_cresset' tag value, such that similarity values less than the minimum get a weight of 0, and similarity values greater than the maximum get a weight of 1. The 'Similarity_cresset' tag value can be set by the 'Forge Align' node.
Weight Mode
Controls how the weight values ramp between 0 and 1. In linear mode the weight increases linearly with similarity, while in quadratic mode the linear weight values are squared.
Minimum similarity
The lower similarity threshold. Molecules whose 'Similarity_cresset' tag value is less than this value get a weight of 0.
Maximum similarity
The upper similarity threshold. Molecules whose 'Similarity_cresset' tag value is greater than this value get a weight of 1.
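The weighting described by these options can be sketched as a clamped ramp. The interpolation between the two thresholds is an assumption based on the descriptions above (0 below the minimum, 1 above the maximum, squared in quadratic mode); the function name is hypothetical.

```python
def similarity_weight(similarity, minimum, maximum, mode="linear"):
    """Return a weight in [0, 1] for a molecule's similarity value."""
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")
    # Linear ramp between the thresholds, clamped to the range 0..1.
    ramp = (similarity - minimum) / (maximum - minimum)
    ramp = min(1.0, max(0.0, ramp))
    if mode == "linear":
        return ramp
    if mode == "quadratic":
        return ramp * ramp
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for s in (0.4, 0.6, 0.8):
        print(s, similarity_weight(s, 0.5, 0.7), similarity_weight(s, 0.5, 0.7, "quadratic"))
```

In quadratic mode, molecules close to the minimum threshold contribute less than they would in linear mode, while the weights at the two thresholds stay the same.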
Fields
Specifies which fields to use for the QSAR calculation. At least one field must be selected.
Cross-validation type
Specifies if the leave-one-out or leave-many-out validation method should be used.
  • Leave-one-out: the model is built with a single molecule left out of the process, and this is repeated leaving out each training set molecule in turn. The predicted activity for each molecule is the value obtained when it was left out of the model building process.
  • Leave-many-out: the model is built multiple times (specified by the 'Repeats' option) leaving out a proportion of the data (specified by the 'Training set to use as validation data' option). The predicted activity for each molecule is the average of the predicted activities obtained for each model for which the molecule was left out.
Training set to use as validation data
The percentage of the molecules from the training set to leave out in each iteration during the leave-many-out process.
Repeats
The number of times to run the leave-many-out operation.
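The two cross-validation schemes can be sketched as below. To keep the example self-contained, the "model" is simply the mean activity of the molecules it was built on; this is a stand-in for the PLS model and not how Forge predicts activity. Function names and the fixed seed are hypothetical.

```python
import random
from statistics import mean


def fit_mean(train_values):
    """Stand-in model: predicts the mean activity of its training data."""
    return mean(train_values)


def leave_one_out(values):
    """Predict each molecule from a model built without that molecule."""
    return [fit_mean(values[:i] + values[i + 1:]) for i in range(len(values))]


def leave_many_out(values, percent, repeats, seed=0):
    """Average each molecule's predictions over the models that left it out.

    Returns None for a molecule that was never left out in any repeat.
    """
    rng = random.Random(seed)
    n = len(values)
    n_out = max(1, round(n * percent / 100))
    if n_out >= n:
        raise ValueError("at least one molecule must remain for model building")
    predictions = [[] for _ in values]
    for _ in range(repeats):
        left_out = set(rng.sample(range(n), n_out))
        model = fit_mean([v for i, v in enumerate(values) if i not in left_out])
        for i in left_out:
            predictions[i].append(model)
    return [mean(p) if p else None for p in predictions]


if __name__ == "__main__":
    activities = [1.0, 2.0, 3.0, 4.0]
    print(leave_one_out(activities))
    print(leave_many_out(activities, percent=50, repeats=20))
```

With few repeats or a small validation percentage, some molecules may never be left out, which is why the sketch allows a missing prediction.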

Input Ports

  • The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned.
  • Optionally, a test set of molecules can be provided. The molecules must be pre-aligned.

Output Ports

  • The input molecules, with tags added for predicted activity and role (training/test set).
  • The Forge project containing the generated model. The type of Forge project depends on the Forge project format option. The 'Forge Project Viewer' node may be used to view the model. The 'Forge Model Info' node may be used to extract data from the model.
  • The sample co-ordinates which were used to generate the model.

Installation

To use this node in KNIME, install Cresset KNIME Nodes from the following update site:

KNIME 4.2
