Forge Align

Cresset KNIME Nodes version 1.1.0.18449 by www.cresset-group.com

Forge™ Align is a tool for aligning a set of molecules to one or more reference molecules in a pre-defined conformation.

The alignment is based on molecular fields (electrostatics, van der Waals and hydrophobic): each alignment is then scored using a mixture of field and shape similarity towards the reference molecule(s).

Forge Align molecular comparisons utilise Cresset's 'field point' technology. Field points are a way of representing molecules in terms of their surface and electrostatic properties: positive and negative electrostatic fields, van der Waals and hydrophobic effects on and near the surface of a molecule. Two molecules which both bind to a common active site tend to make similar interactions with the protein and hence have highly similar field point patterns. Forge Align thus aligns molecules such that both their field point patterns and the underlying fields are as similar as possible: this has been shown to correlate extremely well with the bioactive alignments.

Constraints can be set to bias the alignment algorithm and penalize results which do not satisfy the constraint. Three types of constraints can be used with Forge Align:

  1. Field constraints: specify that a particular type of field must be present in the aligned molecule. This could be a hydrophobic point which forces the alignments to fill a particular pocket, or an electrostatic point to enforce an interaction.
  2. Pharmacophore constraints: force aligned molecules to have the chosen feature (for example, H-bond acceptor) at a specific position.
  3. A receptor (protein) molecule can be used as an excluded volume. The protein is not used in a pharmacophoric sense: the alignment is done on the ligand properties only. However, alignments that clash with the protein structure will be penalised.

This node wraps the Forge Align executable 'falign', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference setting or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'falign Path' preference setting or the CRESSET_FORGEALIGN_EXE environment variable to point directly at the executable itself.
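
The lookup described above can be approximated outside KNIME with a short Python sketch. It is illustrative only: the order in which the two environment variables are checked, the executable file names, and the recursive search under CRESSET_HOME are assumptions, since the install directory layout is not described here, and the KNIME preference settings are not modelled.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed executable file names (Windows and non-Windows).
EXE_NAMES = ("falign.exe", "falign")


def find_falign(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return a path to the falign executable, or None if it is not found."""
    # A direct pointer to the executable is checked first (assumed precedence).
    direct = env.get("CRESSET_FORGEALIGN_EXE")
    if direct and Path(direct).is_file():
        return Path(direct)

    # Otherwise search below the base install directory. The layout under
    # CRESSET_HOME is not documented here, so this walks the whole tree.
    home = env.get("CRESSET_HOME")
    if home and Path(home).is_dir():
        for candidate in sorted(Path(home).rglob("falign*")):
            if candidate.is_file() and candidate.name in EXE_NAMES:
                return candidate
    return None
```

Calling find_falign() with no arguments reads the current process environment; a result of None suggests that neither variable points at a usable installation.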

The 'Forge Align' node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced when using the Cresset Engine Broker. To use this facility, either set the "Cresset's Engine Broker" preference setting or set the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Column containing reference molecule structures
The column in the first input datatable containing the reference molecules.
The reference molecule is the molecule to which other molecules are fitted in alignment experiments. You can have multiple (up to 9) reference molecules. However, if there is more than one reference, the references must be correctly aligned to each other, and the other molecules will be aligned to all of the references simultaneously.
Column containing the molecules to be aligned
The column in the second input datatable containing the molecules to be aligned (the moving molecules) to the reference molecules.
Column containing (optional) protein structure
The column in the third input datatable containing an optional protein structure or an alternative source of excluded volume to guide the alignments.
Speed
Speed of operation of Forge Align. Choose from (in order of decreasing speed, but increasing thoroughness): Quick, Normal or Exhaustive. Note that changing this option will alter the values of several other options.
Assign formal charges to the molecules to be aligned
If checked, the protonation states for the molecules to be aligned are set using Cresset's charging rules. Acids will be deprotonated, primary amines protonated, etc.
Align using maximum common substructure
Uses a special conformation hunt where the common substructure with the reference molecule is held in the same conformation as the reference molecule, and groups that are not part of the common substructure are conformation hunted. The different MCS matching rules available allow you to specify how the common substructure routine handles hybridization and element differences.
Only score molecules
Does not generate conformations or allow molecules to move; just scores the molecules in their input positions. The input molecules must be 3D.

Conformer Hunt

Skip conformation hunt
If checked, each record is assumed to be a separate molecule in a fixed conformation. The molecule will be aligned but conformations will not be generated.
Generate at most this number of conformations
The maximum number of conformations to generate for each molecule. Larger values take longer but give better alignments. Values of 50-200 are recommended and a maximum of 2000 can be set.
No. of high-T dynamics runs for flexible rings
Most small rings are handled using a ring conformation library. Conformations for rings that are not found in the library are sampled using high-temperature (~600K) dynamics with energy initially distributed into torsional degrees of freedom. The number of dynamics runs (and hence the degree of ring conformation sampling) is set by this value. Values of 2-10 are recommended. Values above 5 make little difference to flexible rings of fewer than 8 atoms.
Gradient cut-off for conformer minimization
All conformers found are minimized using the XED force field. This option sets the gradient cut-off at which the minimization is terminated. Values that are too small lead to insufficient sampling of conformational space and long run times. Values that are too large can lead to unrealistic structures being generated. Values of 0.1 kcal/mol/Å to 1.0 kcal/mol/Å are recommended, with values at the smaller end of the range being preferred if the 'Include coulombics' option is not checked.
Energy window
Conformations that have a minimized energy that is outside the energy window are discarded. The window is calculated from the lowest energy conformation that has been found. The ideal value for this option depends on the 'Gradient cut-off for conformer minimization' and 'Include coulombics' options. The best results when the 'Include coulombics' option is not checked are obtained by minimizing to a low gradient (0.1 or better) and applying a smaller energy window (3 kcal/mol) but this significantly increases the time for the calculation. Checking the 'Include coulombics' option requires a significantly larger energy window for large molecules (12 kcal/mol) as these can form very low energy collapsed and internally H-bonded structures.
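
As a rough illustration of the energy window, the following Python sketch keeps only the conformers whose minimized energy lies within the window of the lowest energy found. The function name and the inclusive boundary are assumptions made for the example; this is not Forge Align's implementation.

```python
def apply_energy_window(energies, window):
    """Return the indices of conformers within `window` kcal/mol of the minimum.

    `energies` is a list of minimized conformer energies in kcal/mol.
    Treating the boundary as inclusive is an assumption of this sketch.
    """
    if not energies:
        return []
    lowest = min(energies)
    return [i for i, energy in enumerate(energies) if energy - lowest <= window]


# With a 3 kcal/mol window, only the conformers at 10.0 and 9.0 survive.
kept = apply_energy_window([10.0, 12.5, 14.0, 9.0], window=3.0)
```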
RMS filter for conformation generation
Sets the similarity threshold below which two conformers are deemed identical. This effectively controls the coarseness of the sampling of conformational space. A low value leads to conformations that are only marginally different, while using a large value means that a conformation near the 'correct' one may not be generated. Values of 0.5 Å to 1.0 Å are recommended: values at the higher end of the range are more appropriate for larger, more flexible molecules.
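
The effect of the RMS filter can be sketched in Python as a greedy duplicate removal. This is a deliberately simplified illustration: the RMSD below is computed on coordinates as given, with no superposition or symmetry handling, and the function names and the first-come-first-kept order are assumptions rather than the method Forge Align uses.

```python
import math


def rmsd(conf_a, conf_b):
    """RMSD between two conformers given as equal-length lists of (x, y, z).

    No superposition is performed; this is a simplification for illustration.
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(conf_a))


def filter_duplicates(conformers, threshold):
    """Keep a conformer only if it differs from every kept one by >= threshold."""
    kept = []
    for conformer in conformers:
        if all(rmsd(conformer, other) >= threshold for other in kept):
            kept.append(conformer)
    return kept
```

A larger threshold discards more conformers as duplicates, which is why a conformation close to the 'correct' one may be lost when the value is set too high.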
Acyclic secondary amides handling
Specify how the conformation hunter is to handle amides. Note that this option has no effect on ureas, urethanes, and thioamides as the N-C bonds in these are always treated as rotatable.
  • Force amides trans - forces all secondary amides to adopt the trans geometry.
  • Use input amide geometry - leaves secondary amides in the geometry that they were in the input file and sets them as non-rotatable. As a result, if the input molecule was drawn with a cis amide then only conformations with cis amides will be generated.
  • Allow amides to spin - allows the amide bond to spin, so a mixture of cis and trans amides can be generated.
Include coulombics
If checked, then the conformer generation process uses the full force field, including long-range electrostatics. Better conformer populations are usually generated with this option turned off (unchecked).
Remove Boats
Filter out any conformations that contain 6-membered rings in a boat or twist-boat conformation. By default, they will be filtered out only if sufficiently high in energy that they are removed by the Energy Window setting.

Alignment

Number of alignments to generate
The number of alignments to generate for each molecule. The default is 1.
Score method for multiple references
If there is more than one reference, the default behaviour ("Weighted Average") is to calculate the score of each alignment as the average of the scores to each reference. Set this option to "Maximum" if you want the score for each alignment to be the single highest score to any of the reference molecules. If there is only one reference molecule, scoring by "Weighted Average" or "Maximum" gives the same result.
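
The difference between the two methods can be shown with a small Python sketch. How Forge Align chooses the weights for "Weighted Average" is not stated here, so the sketch assumes equal weights unless others are supplied; the function name is hypothetical.

```python
def combine_scores(scores, method="Weighted Average", weights=None):
    """Combine per-reference similarity scores into one alignment score.

    Equal weights are assumed when `weights` is None; the weights that
    Forge Align actually applies are not described in this documentation.
    """
    if not scores:
        raise ValueError("at least one reference score is required")
    if method == "Maximum":
        return max(scores)
    if method == "Weighted Average":
        if weights is None:
            weights = [1.0] * len(scores)
        if len(weights) != len(scores):
            raise ValueError("one weight per score is required")
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown method: {method!r}")
```

With a single reference both methods return the same value, matching the behaviour described above.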
Shape weight
The relative weight assigned to shape (as opposed to field) similarity. Values must be between 0.0 (all field) and 1.0 (all shape). On most datasets the default of 0.5 shape similarity gives good results.
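
One way to read the shape weight is as a linear blend of the two similarities. The Python sketch below assumes exactly that; the precise combination used by Forge Align is not given here, so treat the formula and the function name as illustrative.

```python
def combined_similarity(field_sim, shape_sim, shape_weight=0.5):
    """Blend field and shape similarity (a linear blend is assumed here).

    shape_weight = 0.0 returns the field similarity only,
    shape_weight = 1.0 returns the shape similarity only.
    """
    if not 0.0 <= shape_weight <= 1.0:
        raise ValueError("shape_weight must be between 0.0 and 1.0")
    return shape_weight * shape_sim + (1.0 - shape_weight) * field_sim
```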
Conformers of achiral molecules can be inverted
Allows conformers of achiral molecules to be inverted if that gives a better alignment. Chiral molecules are never inverted. This must be checked if the imported conformations were exported from Forge, as the conformer population filters out mirror image conformations.
Field constraints
Consists of a set of numbers in the form index,size,reference, e.g. 16,2.5,1 means that the field point with index 16 on the first reference should have a constraint of 2.5 applied to it. You may have more than one field constraint specified, separated by newlines. The reference index is optional and will default to the first reference molecule provided. Please refer to the Forge manual for a detailed explanation of field constraints. For this option to work correctly, the input reference molecule must contain a "_cresset_fieldpoint" tag with the field point data in it. Note that the field points are appended to the atom list, so if the molecule has 80 atoms, the first field point will have index 81.
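
To check a field constraint string before pasting it into the node, a small parser such as the following Python sketch may help. It follows the index,size,reference format described above; the class and function names are hypothetical, and the node's own validation may be stricter or more lenient.

```python
from typing import List, NamedTuple


class FieldConstraint(NamedTuple):
    index: int       # field point index (atoms are counted first)
    size: float      # constraint size
    reference: int   # 1-based reference molecule index


def parse_field_constraints(text: str) -> List[FieldConstraint]:
    """Parse newline-separated 'index,size[,reference]' entries."""
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = [part.strip() for part in line.split(",")]
        if len(parts) not in (2, 3):
            raise ValueError(f"expected index,size[,reference]: {line!r}")
        # The reference index is optional and defaults to the first reference.
        reference = int(parts[2]) if len(parts) == 3 else 1
        constraints.append(FieldConstraint(int(parts[0]), float(parts[1]), reference))
    return constraints


def field_point_index(atom_count: int, field_point_number: int) -> int:
    """Index of the n-th field point (1-based) in a molecule with atom_count atoms."""
    return atom_count + field_point_number
```

For a molecule with 80 atoms, field_point_index(80, 1) gives 81, in line with the note above.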
Pharmacophore constraints
Consists of a set of values in the form index,type,strength,reference, e.g. 16,d,3.2,1 means that the atom with index 16 on the first reference should have a donor pharmacophore constraint of strength 3.2 applied to it. You may have more than one pharmacophore constraint specified, separated by newlines. The characters for each type are 'd'=Donor, 'a'=Acceptor, '+'=Cation, '-'=Anion, 'm'=Metal binder and 'v'=Covalent. The reference index is optional and will default to the first reference molecule provided. Please refer to the Forge manual for a detailed explanation of pharmacophore constraints.
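
The pharmacophore constraint format can be checked in the same spirit. The Python sketch below maps the type characters listed above to their names and applies the default reference index; the function name and the tuple layout are assumptions made for the example.

```python
# Type characters as listed in the option description.
PHARMACOPHORE_TYPES = {
    "d": "Donor",
    "a": "Acceptor",
    "+": "Cation",
    "-": "Anion",
    "m": "Metal binder",
    "v": "Covalent",
}


def parse_pharmacophore_constraints(text):
    """Parse newline-separated 'index,type,strength[,reference]' entries.

    Returns a list of (atom_index, type_name, strength, reference) tuples.
    """
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        parts = [part.strip() for part in line.split(",")]
        if len(parts) not in (3, 4):
            raise ValueError(f"expected index,type,strength[,reference]: {line!r}")
        if parts[1] not in PHARMACOPHORE_TYPES:
            raise ValueError(f"unknown pharmacophore type: {parts[1]!r}")
        # The reference index is optional and defaults to the first reference.
        reference = int(parts[3]) if len(parts) == 4 else 1
        constraints.append(
            (int(parts[0]), PHARMACOPHORE_TYPES[parts[1]], float(parts[2]), reference)
        )
    return constraints
```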
Protein hardness
  • Soft - A small penalty is applied for each atom of the ligand that overlaps with a protein atom and each protein atom is treated as relatively "squashy". This option works well where you are prepared to accept results that may have some overlap, but you want to remove gross clashes with the protein.
  • Medium - A medium penalty is applied.
  • Hard - A large penalty is applied, and each protein atom is treated as relatively firm. Use this option where you want to remove all results that impinge on the protein structure.
Only has an effect if a protein is specified.
Scoring metric
  • Dice: default similarity metric in the current and previous versions of Forge.
  • Tanimoto: monotonic with Dice, so will not change the rank ordering of results, although the similarity values will change.
  • Tversky: use this metric to set up a more 'substructure-like' or 'superstructure-like' alignment. For a substructure-like alignment (i.e. aligning molecules which are substructures of the query), use Tversky with Alpha Value=0.05. For a superstructure-like alignment (i.e. aligning molecules which are larger than but include the query), use Tversky with Alpha Value=0.95.
Alpha Value
Insert a value between 0.0 and 1.0. Only available if the Tversky scoring metric is selected.
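
The relationship between the three metrics can be illustrated with their textbook overlap-based definitions. In the Python sketch below, aa is the self-overlap of the query (reference), bb the self-overlap of the molecule being aligned, and ab their cross-overlap. These definitions, and the convention that alpha weights the query term, are assumptions chosen to match the behaviour described above; Forge Align's internal field similarity calculation is not documented here.

```python
def dice(ab, aa, bb):
    """Dice similarity from cross-overlap ab and self-overlaps aa, bb."""
    return 2.0 * ab / (aa + bb)


def tanimoto(ab, aa, bb):
    """Tanimoto similarity; equals d / (2 - d) where d is the Dice value."""
    return ab / (aa + bb - ab)


def tversky(ab, aa, bb, alpha):
    """Tversky similarity with weights alpha (query) and 1 - alpha (molecule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return ab / (alpha * aa + (1.0 - alpha) * bb)


# A small molecule (bb = 4) fully contained in a larger query (aa = 10):
sub_like = tversky(4.0, 10.0, 4.0, alpha=0.05)    # high: substructure is rewarded
super_like = tversky(4.0, 10.0, 4.0, alpha=0.95)  # low: unmatched query part is penalised
```

Under these definitions alpha = 0.5 reproduces Dice, and because Tanimoto is a monotonic function of Dice, switching between those two changes the values but not the ranking.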
SMARTS Pattern
Request that atoms matching this pattern have a higher weight when present in the MCS.
Do not allow MCS alignment to move
If checked, and if the MCS alignment method was specified, the molecules to be aligned are held fixed with the common substructures overlaid and are not allowed to move, even if moving would improve the similarity score.

Advanced

Add column containing log
If checked, the log of the conformer generation and alignment process for each molecule is added as a column 'ForgeAlign_Log'.

Input Ports

Reference molecules. The molecules must be in a defined 3D conformation. These molecules are kept fixed in their input conformations and the other molecules are aligned to them. A maximum of 9 molecules may be used.
The molecules to be aligned to the reference molecules.
Optional protein structure or other source of excluded volume.

Output Ports

The aligned molecules. The reference and protein molecules are not included in the output.

Installation

To use this node in KNIME, install Cresset KNIME Nodes from the following update site:

KNIME 4.0