Forge Build Field QSAR

Generates a Forge™ 3D QSAR model of activity from a set of aligned molecules. The field points of the training set molecules are used to derive a set of 'sample positions' around the molecules. Any molecule can then be probed at those positions for its electrostatic potential or for the volume it occupies. The resulting data matrix of sample values is processed by partial least squares (PLS) to derive an equation that describes activity.
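The PLS step can be illustrated with a minimal single-response PLS (NIPALS) sketch. It assumes the data matrix has one row per molecule and one column per sample position, with one activity value per molecule. This is a generic textbook PLS1, not Forge's implementation, and the function names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    X is a list of rows (one per molecule) of sample values; y holds the activities.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Centre the descriptors and the activities
    Xa = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    ya = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction of covariance between X and y, unit length
        w = [sum(Xa[i][j] * ya[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xa]  # scores
        tt = _dot(t, t)
        p = [sum(Xa[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = _dot(ya, t) / tt  # y loading
        # Deflate X and y before extracting the next component
        Xa = [[Xa[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        ya = [ya[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the activity of one molecule from its sample values."""
    x_mean, y_mean, components = model
    xa = [v - mu for v, mu in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(xa, w)
        y_hat += q * t
        xa = [v - t * pj for v, pj in zip(xa, p)]
    return y_hat
```

For example, `fit_pls1([[1, 0], [0, 1], [1, 1], [2, 1]], [1, 2, 3, 4], 2)` recovers the exact relationship y = x1 + 2·x2, so predicting `[3, 2]` gives 7. Real QSAR models use far fewer components than sample positions, which is what makes PLS suitable for wide, highly correlated field data.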

Please refer to the Forge manual for a detailed description of the science behind Field QSAR models in Forge and the corresponding model building options.

The molecules must be pre-aligned; the falign program or the 'Forge Align' node is ideal for this.

This node wraps the Forge Build executable 'fbuild', which must be installed with a valid license for this node to work. If fbuild is installed in the default location on Windows, it should be found automatically. Otherwise, you must set either the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. Alternatively, you may set the 'fbuild Path' preference or the CRESSET_FORGEBUILD_EXE environment variable to point directly at the executable itself.

The Forge Build Field QSAR node can be configured to use additional resources to perform calculations. Running the calculations through Cresset's Engine Broker can drastically reduce the time the node takes to run. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.
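The environment-variable options above can be checked with a small sketch, for example in a setup script. It covers only the environment variables, not the KNIME preferences. The lookup order is an assumption: CRESSET_FORGEBUILD_EXE first, then a recursive search under CRESSET_HOME. The node's real search logic and the directory layout under CRESSET_HOME are not documented here, and the function names are hypothetical.

```python
import os
from pathlib import Path


def find_fbuild(env=None):
    """Return a candidate path to the fbuild executable, or None if not configured.

    Assumed lookup order (not confirmed by the node documentation):
    1. CRESSET_FORGEBUILD_EXE, taken as the executable itself;
    2. CRESSET_HOME, searched recursively for a file named fbuild or fbuild.exe.
    """
    env = os.environ if env is None else env
    direct = env.get("CRESSET_FORGEBUILD_EXE")
    if direct:
        return Path(direct)
    home = env.get("CRESSET_HOME")
    if home:
        for root, _dirs, files in os.walk(home):
            for name in ("fbuild", "fbuild.exe"):
                if name in files:
                    return Path(root) / name
    return None


def broker_configured(env=None):
    """True if CRESSET_BROKER is set; the expected value format is not assumed here."""
    env = os.environ if env is None else env
    return bool(env.get("CRESSET_BROKER"))
```

Passing an explicit `env` dictionary makes the lookup easy to test without touching the real process environment.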

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Training Set Structure column

The column that contains the aligned molecules to be used as the training set.

Test Set Structure column

The column that contains the aligned molecules to be used as the test set.

Activity column

The name of the column which specifies the activity data to use when building the model.

Units for the input activity values log-transformations

Specify whether the input activity values require log-transforming and give their units.
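Concentration-type activities such as IC50 values are conventionally converted to a p-scale, −log10 of the molar concentration, before modelling. The sketch below shows that conventional transform under the assumption that the node does something equivalent. The unit keys and function name are hypothetical and may not match the option's actual choices.

```python
import math

# Hypothetical unit labels mapped to their factor for conversion to molar
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_activity(value, unit):
    """Convert a concentration-type activity (e.g. IC50) to -log10(molar)."""
    if value <= 0:
        raise ValueError("activity must be positive to log-transform")
    return -math.log10(value * UNIT_TO_MOLAR[unit])
```

For instance, an IC50 of 10 nM becomes a pIC50 of 8, and 1 µM becomes 6. Higher p-values therefore mean more potent compounds.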

Assign formal charges to input molecules

If set, the protonation states for the input molecules will be set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.

Times to repeat the QSAR process with the activities scrambled

Repeats the QSAR model building process with the activities scrambled. More scramble sets provide stronger confirmation of statistical significance but take longer to calculate. The default is 50 scramble runs. The results of the scrambling can be viewed in the output Forge project using the 'Forge Model Info' node.
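The idea behind activity scrambling (y-scrambling) can be sketched as follows. The activities are shuffled, the model is refitted, and the scrambled fits are compared with the real one. As a stand-in for the Field QSAR model, this sketch uses a one-descriptor least-squares fit and its r². That is an assumption made for brevity, not what Forge computes, and the function names are hypothetical.

```python
import random


def r_squared(x, y):
    """r^2 of a one-variable least-squares fit (stand-in for the real QSAR model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def scramble_test(x, y, n_scrambles=50, seed=0):
    """Refit with shuffled activities and report how often chance matches the real fit."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    scrambled = []
    for _ in range(n_scrambles):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break the structure-activity link
        scrambled.append(r_squared(x, y_perm))
    # Fraction of scrambled models that do at least as well as the real model
    frac = sum(r2 >= true_r2 for r2 in scrambled) / n_scrambles
    return true_r2, scrambled, frac
```

A meaningful model should clearly outperform almost all scrambled runs. A fraction close to zero is the signal the scramble test is looking for.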

Partition Type

Moves a percentage of the molecules from the training set into the test set.

None - No molecules are moved to the test set
Activity Stratified - Partitions the training set so that a specified percentage of the molecules (e.g. 25%) are moved to the test set. The partitioning is activity-stratified.
Random - Partitions the training set so that a specified percentage of the molecules, randomly selected, are moved to the test set.

Note that any molecules read into the test set port are added to those partitioned into the test set by this procedure.

Percentage of input molecules to move to the test set

The percentage of training set molecules to move to the test set when a Partition Type other than None is selected.
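The two partition types can be illustrated with a sketch. The random split samples the requested percentage of molecules. The stratified split sorts molecules by activity and picks evenly spaced ones, so the test set spans the activity range. The stratification scheme is an assumption; Forge's exact algorithm is not described here, and the function names are hypothetical.

```python
import random


def random_partition(ids, percent, seed=0):
    """Move a random `percent` % of the molecules to the test set."""
    rng = random.Random(seed)
    n_test = round(len(ids) * percent / 100)
    test_set = set(rng.sample(ids, n_test))
    train = [i for i in ids if i not in test_set]
    test = [i for i in ids if i in test_set]
    return train, test


def stratified_partition(activities, percent):
    """activities maps molecule id -> activity; pick test molecules evenly across the range."""
    ordered = sorted(activities, key=activities.get)
    n = len(ordered)
    n_test = round(n * percent / 100)
    # Evenly spaced positions in the activity-sorted list (distinct because n_test <= n)
    picks = {ordered[int((k + 0.5) * n / n_test)] for k in range(n_test)}
    train = [i for i in ordered if i not in picks]
    test = [i for i in ordered if i in picks]
    return train, test
```

With 20 molecules and 25%, the stratified version moves every fourth molecule in activity order to the test set. The random version may, by chance, leave one end of the activity range unrepresented.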

Output

Add columns for QSAR descriptors

If checked, then additional columns are added to the output to include the QSAR descriptors capturing the field sample values for each molecule. Note that when this option is turned on, successor nodes cannot be configured until this node is executed.

Forge project format

Specifies the output format of the Forge project.

Model only - Creates a Forge project which contains only the model. This option creates a smaller project.
Molecules and model - Creates a complete Forge project which includes all the molecules and the model.

Advanced

Weight molecules' contribution by similarity

Weights the molecules according to their 'Similarity_cresset' tag value, such that similarity values less than the minimum get a weight of 0 and similarity values greater than the maximum get a weight of 1. The 'Similarity_cresset' tag value can be set by the 'Forge Align' node.

Weight Mode

Controls how the weight values ramp between 0 and 1. In linear mode, the weight values increase linearly, while in quadratic mode the weight values used are squared.

Minimum similarity

Weights the molecules according to their 'Similarity_cresset' tag value, such that similarity values less than this value get a weight of 0.

Maximum similarity

Weights the molecules according to their 'Similarity_cresset' tag value, such that similarity values greater than this value get a weight of 1.
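The similarity weighting options can be summarised in a small function. It assumes the ramp between the minimum and maximum similarity is a straight-line interpolation, which the quadratic mode then squares. The exact interpolation Forge uses is not stated here, and the function name is hypothetical.

```python
def similarity_weight(similarity, s_min, s_max, mode="linear"):
    """Weight a molecule by its Similarity_cresset value (assumed linear ramp)."""
    if similarity <= s_min:
        return 0.0
    if similarity >= s_max:
        return 1.0
    w = (similarity - s_min) / (s_max - s_min)  # fraction of the way from min to max
    return w * w if mode == "quadratic" else w
```

With a minimum of 0.6 and a maximum of 0.8, a molecule with similarity 0.7 gets a weight of 0.5 in linear mode and 0.25 in quadratic mode. Quadratic mode therefore down-weights poorly aligned molecules more strongly.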

Fields

Specifies which fields to use for the QSAR calculation. At least one field must be selected.

Cross-validation type

Specifies whether the leave-one-out or the leave-many-out validation method is used.

Leave-one-out: the model is built with a single molecule left out of the process; this is repeated, leaving out each training set molecule in turn. The predicted activity for each molecule is the value obtained when it was left out of the model building process.
Leave-many-out: the model is built multiple times (specified by the 'Repeats' option) leaving out a proportion of the data (specified by the 'Training set to use as validation data' option). The predicted activity for each molecule is the average of the predicted activities obtained for each model for which the molecule was left out.

Training set to use as validation data

The percentage of the molecules from the training set to leave out in each iteration during the leave-many-out process.

Repeats

The number of times to run the leave-many-out operation.
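The two cross-validation schemes can be sketched as generators of (training, held-out) index splits, plus a helper that averages each molecule's predictions over the models it was left out of. The `fit` and `predict` arguments are hypothetical placeholders for any model, such as a PLS model. The random selection of held-out molecules in leave-many-out is an assumption about how Forge chooses them.

```python
import random


def leave_one_out(n):
    """Yield (training indices, held-out indices) with one molecule left out each time."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def leave_many_out(n, percent, repeats, seed=0):
    """Yield `repeats` random splits, each holding out `percent` % of the n molecules."""
    rng = random.Random(seed)
    n_out = max(1, round(n * percent / 100))
    for _ in range(repeats):
        held_out = set(rng.sample(range(n), n_out))
        yield [j for j in range(n) if j not in held_out], sorted(held_out)


def cross_validated_predictions(splits, fit, predict, X, y):
    """Average each molecule's predictions over the models it was left out of.

    Molecules that are never held out (possible with leave-many-out) get no entry.
    """
    sums, counts = {}, {}
    for train_idx, test_idx in splits:
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        for j in test_idx:
            sums[j] = sums.get(j, 0.0) + predict(model, X[j])
            counts[j] = counts.get(j, 0) + 1
    return {j: sums[j] / counts[j] for j in sums}
```

With leave-one-out, each molecule is predicted exactly once, so the "average" is just that single prediction. With leave-many-out, more repeats give each molecule more held-out predictions to average.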

Input Ports

The molecules in the training set which will be used to build the model. All molecules must have activity data and must be pre-aligned.
Optionally, a test set of molecules can be provided. The molecules must be pre-aligned.

Output Ports

The input molecules, with tags added for predicted activity and role (training/test set).
The Forge project containing the generated model. The type of Forge project depends on the Forge project format option. The 'Forge Project Viewer' node may be used to view the model. The 'Forge Model Info' node may be used to extract data from the model.
The sample co-ordinates which were used to generate the model.

Views

This node has no views

Workflows

Create a Field QSAR model using Forge Build (Cresset)

Installation

To use this node in KNIME, install the extension Cresset KNIME Nodes from the update site below, following our NodePit Product and Node Installation Guide:

v5.4

Plugin provider: www.cresset-group.com

Plugin version: 3.0.0.250226

On NodePit since: 2024-12-06

Last update: 2025-05-31

KNIME versions: Since v3.6