Spark Database Search

Spark™ is a bioisostere replacement tool which suggests biologically relevant replacements (bioisosteres) for key fragments of known active molecules. You load an active molecule into Spark, preferably in the bioactive conformation, select the part of the active that you wish to replace and specify which databases to search. Spark will present a list of biologically relevant replacements ranked using Cresset’s unique molecular field technology (https://www.cresset-group.com/science/), or using Lead Finder™’s docking score (https://www.cresset-group.com/software/lead-finder/). As well as replacing central parts of a molecule, Spark can suggest replacements for terminal groups. It can be used to grow ligands and fragments into unoccupied pockets of the target protein, to carry out ligand joining and macrocyclization experiments, and to find a fragment which displaces a crystallographic water molecule near your ligand. Spark comes with a set of databases of fragments generated from whole molecules (e.g. commercially available or literature reported compounds) or from synthetic reagents.

Spark's molecular comparisons using ligand similarity are based on their molecular fields, not on their structure. The interaction between a ligand and a protein involves electrostatic fields and surface properties (e.g. hydrogen bonding, hydrophobic surfaces and so on). Two molecules which both bind to a common active site tend to make similar interactions with the protein and hence have highly similar field properties. Accordingly, using these properties to describe molecules is a powerful tool for the medicinal chemist as it concentrates on the aspects of the molecules that are important for biological activity. Using the fields gives a 'protein's view' of how the molecules would line up in the active site, generating ideas on how molecules with different structures could interact with the same protein. Docking may also be used to score the final result molecules, which can be particularly helpful for guiding ligand growth into unoccupied pockets of the target protein and to find novel results making interactions with the active site of the protein not mapped by an existing starter or reference molecule.

The major advance in Spark compared to previous bioisostere replacement tools is that Spark scores each potential replacement in context. Each candidate fragment is merged into the starting molecule and energy minimized before scoring. With ligand similarity scoring, the full field pattern for that molecule is calculated, and this is then compared to the starting structure. Alternatively, a docking score is calculated.

The filter options allow you to specify constraints on the type and properties of the fragments to try. Each option has three settings. The 'Yes' option specifies that the specified functionality must be present, the 'No' option specifies that it must not be present and the 'Optional' option specifies that it may or may not be present. For example, setting 'Contains an aromatic ring' to 'Yes' means that all suggested replacement fragments must contain an aromatic ring. Setting 'Contains a non-ring atom or bond' to 'No' will specify that only ring fragments with no exocyclic components may be used. The non-obvious flags are explained below under 'Filters'.

Constraints can be set to bias the Spark search and penalize results which do not satisfy the constraint. The following types of constraints are available:

Field constraints: specify that a particular type of field must be present in the result molecule. This could be a hydrophobic point which forces the Spark result molecule to fill a particular pocket, or an electrostatic point to enforce an interaction.
Pharmacophore constraints: force result molecules to have the chosen feature (for example, H-bond acceptor) at a specific position.
A receptor (protein) molecule can be used as an excluded volume when using ligand similarity scoring. The protein is not used in a pharmacophoric sense: however, result molecules that clash with the protein structure will be penalised.
Docking constraints: discard docked molecules that do not match the interaction with the specified protein atom (H-bond donor/acceptor or metal).

The advanced options allow you to further refine the Spark search.

This node wraps the executable 'sparkcl', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'sparkcl Path' preference or the CRESSET_SPARKCL_EXE environment variable to point directly at the executable itself.
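The lookup described above can be sketched in Python for use outside KNIME, for example to check an installation before running a workflow. This is a minimal sketch, not the node's actual logic: the function name `find_sparkcl` is hypothetical, checking CRESSET_SPARKCL_EXE before CRESSET_HOME is an assumed precedence, and because the directory layout below CRESSET_HOME is not described here, the sketch simply searches for a file named 'sparkcl' (with or without an extension).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_sparkcl(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return a path to the sparkcl executable, or None if it cannot be found."""
    env = os.environ if env is None else env

    # A direct pointer to the executable is checked first (assumed precedence).
    exe = env.get("CRESSET_SPARKCL_EXE")
    if exe and Path(exe).is_file():
        return Path(exe)

    # Otherwise search below the base install directory by file name,
    # since the install layout is not documented in this description.
    home = env.get("CRESSET_HOME")
    if home and Path(home).is_dir():
        for candidate in sorted(Path(home).rglob("sparkcl*")):
            if candidate.is_file() and candidate.stem == "sparkcl":
                return candidate
    return None
```

Calling `find_sparkcl()` with no arguments reads the current process environment; passing a dictionary makes the function easy to test.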

The Spark Database Search node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced if you use the Cresset Engine Broker™. To use this facility either set the "Cresset Engine Broker" preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker then contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Column containing molecule structure: A column containing the starter molecule and an optional selection to replace. Only the first molecule will be used as a starter molecule: optionally up to 8 reference molecules may be included to guide the calculation.
Protein to use as an excluded volume or for docking: The first molecule in the specified column will either be used as an excluded volume when scoring fragments using ligand similarity or it will be used to dock the ligands. If you choose the docking calculation method, the protein should have been prepared previously to add missing hydrogen atoms, optimize the internal hydrogen bond network, remove atomic clashes and assign optimal protonation states. Only one protein may be provided.
Speed: Speed of operation of Spark. Choose from (in order of decreasing speed but increasing thoroughness): Normal or Exhaustive. Note that changing this option will alter the values of several other options.
Write calculation log to molecules: Write the calculation log as one of the SDF tags for each result.
Weighting: Set which column in the input datatable contains the relative weights of the reference molecules. The weight is used to control the scoring of each reference molecule, placing more or less emphasis on any individual molecule. Note that a weight of zero is permitted for any molecule including the starter molecule. However, this sometimes gives unusual effects such as large movements of the new molecule relative to the starter molecule. These effects can be mollified by setting a weight for the starter molecule of 10 or 20%. We recommend that you do not use a weight of less than 10% for the starter molecule.
Database(s) to search: Select the databases to search. More than one database can be selected by using the "Ctrl" key. Databases are searched for in the locations specified by the "SPARK_CRESSET_DB" and "SPARK_DB" environment variables, and also in the "database" directory in the Spark install location. An additional database can also be searched by specifying its full path using the "Selected Files" file picker by clicking "Browse..." and picking a Spark database file (fsd).

Starter molecule

Fragment selection input method

Spark requires you to specify a portion of the starter molecule to replace. This can be done in three alternative ways:

Spark Fragment Selector - The portion to replace is specified using the Spark Fragment Selector node. The Spark Fragment Selector node "out" port must be linked to the Spark Database Search "in" port.
Specify bonds to break - One or more bonds in the starter molecule will be broken. Bonds are specified as pairs of atoms identified by their index (starting at 1), with the first atom in the pair being retained, and the second atom being part of the removed section. The indices of the bonds to break must be typed in the text area, one bond per line, in the format atom1,atom2[,flags]. For example, to break the bond between atoms 2 and 7 (removing the portion of the molecule connected to atom 7), type '2,7'. Replacement of a central portion of a molecule can be accomplished by specifying all the bonds connecting that central portion to the rest of the molecule. For example, given C-C-C-C-O-C, numbered 1-6 left to right, the replacement of the terminal methoxy group could be requested with '4,5' (i.e. keep atom 4, and discard atom 5 and everything connected to it). Replacement of the two central carbons could be requested with '2,3' and '5,4' on separate lines (i.e. keep atoms 2 and 5 and delete atoms 3 and 4 and anything in between).
Specify the atoms to replace - One or more atoms will be replaced in the starter molecule. The list of atoms should form a single connected fragment, should consist of heavy atoms only (no hydrogens), and the bonds connecting this fragment to the rest of the molecule must be single. The atom IDs for the atoms to be replaced must be typed in the text area as a comma separated list in the format atom1,atom2,atom3,.... For example, given C-C-C-C-O-C, numbered 1-6 left to right, replacement of the terminal methoxy group could be requested with '5,6' (i.e. discard atoms 5 and 6). Replacement of the two central carbons could be requested with '3,4' (i.e. delete atoms 3 and 4).
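To see how the two manual input methods describe the same selection, the C-C-C-C-O-C example can be worked through in a few lines of Python. This is an illustrative sketch only: the function `atoms_removed` and the bond list are hypothetical and hydrogens are ignored. It treats each entry as a (kept, removed) pair, cuts those bonds, and collects every atom still connected to a removed atom.

```python
from collections import deque

# C-C-C-C-O-C numbered 1-6 left to right, as in the examples above.
CHAIN_BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]


def atoms_removed(bonds, breaks):
    """Return the set of atoms removed by a list of (kept, removed) bond breaks."""
    broken = {frozenset(pair) for pair in breaks}
    neighbours = {}
    for a, b in bonds:
        if frozenset((a, b)) in broken:
            continue  # a broken bond no longer connects its two atoms
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first walk starting from the second atom of every broken bond.
    removed = set()
    queue = deque(second for _, second in breaks)
    while queue:
        atom = queue.popleft()
        if atom in removed:
            continue
        removed.add(atom)
        queue.extend(neighbours.get(atom, ()))
    return removed
```

Here `atoms_removed(CHAIN_BONDS, [(4, 5)])` gives atoms 5 and 6, matching the atom list '5,6', and `atoms_removed(CHAIN_BONDS, [(2, 3), (5, 4)])` gives atoms 3 and 4, matching '3,4'.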

Bond/Atom list

If 'Fragment selection input method' is set to 'Specify bonds to break'

The bonds in the starter molecule that are to be broken. Each line should only list one bond to break in the format:

atom1,atom2[,flags].

The available attachment point flags are: Br, C, Car, Cl, Csp, Csp2, Csp3, F, Hal, I, N, Nsp2, Nsp3, O, Osp3, P, PS, S.

For example, the following will break two bonds: the bond between atoms 10 and 9, and the bond between atoms 24 and 14. The region of the molecule containing atoms 9 and 14 will be replaced. Atom 9 will only be replaced with an Nsp3 or Nsp2 atom, while atom 14 can be replaced with any type of atom.

10,9,Nsp3,Nsp2

24,14
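When the bond list is generated by a script rather than typed by hand, it can help to validate it first. The following Python sketch parses the atom1,atom2[,flags] format and checks the flags against the list above. The function names are hypothetical, and treating the flags as case-sensitive and skipping blank lines are assumptions, not documented behaviour of the node.

```python
ATTACHMENT_FLAGS = {
    "Br", "C", "Car", "Cl", "Csp", "Csp2", "Csp3", "F", "Hal",
    "I", "N", "Nsp2", "Nsp3", "O", "Osp3", "P", "PS", "S",
}


def parse_bond_line(line):
    """Parse one 'atom1,atom2[,flags]' line into (kept, removed, flags)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) < 2:
        raise ValueError(f"expected atom1,atom2[,flags], got {line!r}")
    kept, removed = int(fields[0]), int(fields[1])
    flags = fields[2:]
    unknown = [flag for flag in flags if flag not in ATTACHMENT_FLAGS]
    if unknown:
        raise ValueError(f"unknown attachment point flags: {unknown}")
    return kept, removed, flags


def parse_bond_list(text):
    """Parse a multi-line bond list, ignoring blank lines."""
    return [parse_bond_line(line) for line in text.splitlines() if line.strip()]
```

For the example above, `parse_bond_list("10,9,Nsp3,Nsp2\n24,14")` yields one entry restricted to Nsp3 or Nsp2 and one entry with no restriction.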

If 'Fragment selection input method' is set to 'Specify the atoms to replace'

The atom ids for the atoms to be replaced. The format should be a comma separated list of atom ids in the format atom1,atom2,atom3,...

Filters

Contains an aromatic ring: The definition of aromaticity is that a ring must obey the Hückel 4N+2 rule and may not contain an exocyclic double bond. Pyridones are thus non-aromatic.
Contains a non-ring atom or a non-ring bond: This option refers to any atom or bond not in a ring in the fragment. Whether or not the atom will be in a ring once joined into the final result molecule is immaterial. Selecting 'No' on this option is a good way to limit the search to pure ring systems.
Contains a H-bond donor: The definition of H-bond donor is quite restricted: a hydrogen atom attached to N or O.
Contains a H-bond acceptor: A fragment has a H-bond acceptor if it contains any of the following: =N-, -OH, =O, -C#N.
Contains toxophores etc: The toxophores list is fairly conservative and only includes reactive functional groups such as acid chlorides, sulphur halides, Cl-, Br-, or I-containing alkyl halides, azides, and peroxides. Nitro groups are not currently considered toxophores. However, phosphorus (in any form) is included, largely because it is not completely parameterised in the Cresset XED force field. The standard databases supplied by Cresset are already filtered to remove all fragments with this flag.

Advanced

Maximum number of results to keep

The maximum number of results to keep. The default is 500 for Normal, and 1000 for Exhaustive.

Score Method

Ligand Similarity: score result molecules by field and shape similarity (the higher, the better) to the starter and reference ligands.
Docking: score result molecules by docking score (the more negative, the better).

Fraction of score from shape similarity

Set the weight of the shape component of the scoring function. The default of 0.5 means 50% field and 50% shape.
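One natural reading of this setting is a linear blend of the two similarity values. The Python sketch below shows that reading only; the function name is hypothetical and the linear form is an assumption, as the exact scoring function used by Spark is not given here.

```python
def combined_similarity(field_sim, shape_sim, shape_fraction=0.5):
    """Blend field and shape similarity linearly (assumed form, for illustration)."""
    if not 0.0 <= shape_fraction <= 1.0:
        raise ValueError("shape_fraction must lie between 0.0 and 1.0")
    return shape_fraction * shape_sim + (1.0 - shape_fraction) * field_sim
```

Under this assumption, the default of 0.5 gives the plain average of the two values, and a fraction of 1.0 scores on shape alone.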

Gradient Cutoff

This cutoff is used when minimizing the new fragment into the retained portion of the starter molecule. A smaller value usually gives a more accurate conformation but takes longer to converge and exit. Values above 0.3 are recommended except when using significant computing resources.

Protein hardness

Soft - A small penalty is applied for each atom of the ligand that overlaps with a protein atom and each protein atom is treated as relatively "squashy". This option works well where you are prepared to accept results that may have some overlap, but you want to remove gross clashes with the protein.
Medium - A medium penalty is applied.
Hard - A large penalty is applied, and each protein atom is treated as relatively firm. Use this option where you want to remove all results that impinge on the protein structure.

Only has an effect if a protein is specified.

Scoring metric

Dice: default similarity metric in the current and previous versions of Spark.
Tanimoto: monotonic with Dice, so will not change the rank ordering of results, although the similarity values will change.
Tversky: use this metric to set up a more 'substructure-like' or 'superstructure-like' alignment. For a substructure-like alignment (i.e. aligning molecules which are substructures of the query), use Tversky with Alpha Value=0.05. For a superstructure-like alignment (i.e. aligning molecules which are larger than but include the query), use Tversky with Alpha Value=0.95.
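The relationship between the three metrics can be illustrated with the generic textbook definitions, written in terms of an overlap value `ab` and the two self-overlap values `aa` and `bb`. This is a sketch under stated assumptions: Spark's field-based overlap terms are not defined here, the one-parameter Tversky form (with the second weight fixed at 1 - alpha) is an assumption, and which molecule plays the role of `aa` is not taken from the Spark documentation.

```python
def dice(ab, aa, bb):
    """Dice similarity: twice the overlap over the sum of self-overlaps."""
    return 2.0 * ab / (aa + bb)


def tanimoto(ab, aa, bb):
    """Tanimoto similarity: overlap over the union-like term."""
    return ab / (aa + bb - ab)


def tversky(ab, aa, bb, alpha):
    """One-parameter Tversky similarity (assumed form, weights alpha and 1 - alpha)."""
    return ab / (alpha * aa + (1.0 - alpha) * bb)
```

With these definitions Tanimoto equals Dice / (2 - Dice), which is why switching between the two changes the similarity values but not the rank order, and Tversky with alpha = 0.5 reduces to Dice. Moving alpha towards 0 or 1 makes the score depend mostly on one of the two molecules, which is what produces the asymmetric alignments described above.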

Alpha Value

Enter a value between 0.0 and 1.0. Only available if the Tversky scoring metric is selected.

Docking Region Buffer Size

The number of Angstroms by which to increase the size of the bounding box enclosing the reference molecules in all directions. This bounding box is used to construct the docking grid from the protein.
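The effect of the buffer can be pictured with a short Python sketch that computes an axis-aligned bounding box around a set of atom coordinates and grows it by the buffer on every side. The function name and the axis-aligned box are assumptions made for illustration; how Lead Finder builds its grid from the box is not covered here.

```python
def docking_box(coords, buffer_angstrom):
    """Return (lower, upper) corners of the bounding box grown by the buffer."""
    xs, ys, zs = zip(*coords)
    lower = (min(xs) - buffer_angstrom, min(ys) - buffer_angstrom, min(zs) - buffer_angstrom)
    upper = (max(xs) + buffer_angstrom, max(ys) + buffer_angstrom, max(zs) + buffer_angstrom)
    return lower, upper
```

A larger buffer gives result molecules more room to extend beyond the reference molecules, at the cost of a larger docking grid.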

Field constraints

Consists of a set of numbers in the form index,size[,reference]. For example, 16,2.5 means that the field point with index 16 on the starter molecule should have a constraint of 2.5 applied to it, and 87,7,2 means that the field point with index 87 on the first reference molecule should have a constraint of 7 applied to it. You may specify more than one field constraint, separated by newlines. Please refer to the Spark manual for a detailed explanation of field constraints. For this option to work correctly, the input starter molecule must contain a "_cresset_fieldpoint" tag with the field point data in it. Note that the field points are appended to the atom list, so if the molecule has 80 atoms, the first field point will have index 81.
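The indexing rule and the line format can be checked with a small Python sketch. The helper names are hypothetical. An omitted reference is returned as None (meaning the starter molecule, as in the first example above) rather than guessing a numeric default.

```python
def field_point_index(n_atoms, k):
    """Index of the k-th field point (k starts at 1) on a molecule with n_atoms atoms."""
    return n_atoms + k


def parse_field_constraints(text):
    """Parse lines of 'index,size' or 'index,size,reference' into tuples."""
    constraints = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        if len(fields) not in (2, 3):
            raise ValueError(f"expected index,size or index,size,reference, got {line!r}")
        index = int(fields[0])
        size = float(fields[1])
        reference = int(fields[2]) if len(fields) == 3 else None
        constraints.append((index, size, reference))
    return constraints
```

For a molecule with 80 atoms, `field_point_index(80, 1)` returns 81, in line with the note above.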

Pharmacophore constraints

Consists of a set of values in the form index,type,strength,reference. For example, 16,d,3.2,2 means that the atom with index 16 on the first reference molecule should have a donor pharmacophore constraint with a strength of 3.2 applied to it. You may specify more than one pharmacophore constraint, separated by newlines. The characters for each type are 'd'=Donor, 'a'=Acceptor, '+'=Cation, '-'=Anion, 'm'=Metal binder and 'v'=Covalent. Please refer to the Spark manual for a detailed explanation of pharmacophore constraints.

Docking constraints

chain,resname,resnum,atomname,type[,strength]

Specifies a docking constraint on an atom with the specified atom name from the residue with the specified chain, name, number and insertion code in the specified protein molecule. The type can be one of 'h'/'hbond' (hydrogen bond), 'm'/'metal' (metal), 'p'/'pistack' (Pi stacking), '+'/'pication' (Pi-cation), or 's'/'saltbridge' (Salt-bridge). The strength of the constraint must be one of 'w' (weak), 'n' (normal), or 's' (strong), and defaults to 'n' if not specified. You may provide more than one docking constraint, and these should be separated by newlines.

Example: A,GLN,85A,O,hbond,w
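A docking constraint line can be validated before it is passed to the node. The Python sketch below follows the format and the type and strength codes listed above. The function name and the returned dictionary layout are hypothetical, and the residue number is kept as text because it may carry an insertion code, as in 85A.

```python
TYPE_ALIASES = {
    "h": "hbond", "hbond": "hbond",
    "m": "metal", "metal": "metal",
    "p": "pistack", "pistack": "pistack",
    "+": "pication", "pication": "pication",
    "s": "saltbridge", "saltbridge": "saltbridge",
}
STRENGTHS = {"w": "weak", "n": "normal", "s": "strong"}


def parse_docking_constraint(line):
    """Parse 'chain,resname,resnum,atomname,type[,strength]' into a dict."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 comma-separated fields, got {line!r}")
    chain, resname, resnum, atomname, ctype = fields[:5]
    strength = fields[5] if len(fields) == 6 else "n"  # documented default
    if ctype not in TYPE_ALIASES:
        raise ValueError(f"unknown constraint type: {ctype!r}")
    if strength not in STRENGTHS:
        raise ValueError(f"unknown constraint strength: {strength!r}")
    return {
        "chain": chain,
        "resname": resname,
        "resnum": resnum,  # kept as text: may include an insertion code
        "atomname": atomname,
        "type": TYPE_ALIASES[ctype],
        "strength": STRENGTHS[strength],
    }
```

Applied to the example line, this gives a weak hydrogen bond constraint on atom O of GLN 85A in chain A.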

Maximum Docking Constraint Penalty

Set the maximum docking constraint penalty that will be tolerated. With the default value of 1.0, docking poses that do not match the provided docking constraints will be discarded. To let Lead Finder produce poses that violate the constraints, set this to a high value (e.g. 100).

Automatic constraint fragment size

If checked, the maximum size of the replacement fragment is determined by the size of the region selected for replacement, plus 5 heavy atoms and/or 75 Daltons.

Maximum fragment molecular weight

Fragments with a molecular weight higher than this setting will be excluded.

Maximum fragment heavy atom count

Fragments with more heavy atoms than this setting will be excluded.

Maximum number of rotatable bonds

Only searches fragments with a number of rotatable bonds lower than this setting.

Input Ports

Data table containing 1 to 9 molecules. The first molecule will be used as the "Starter Molecule" and is required. Optionally up to 8 reference molecules may be included to guide the calculation. See the Weighting option to configure how the references affect the process. The region of the "Starter Molecule" to replace can be specified in the "Starter Molecule" tab or by linking the output of the "Spark Fragment Selector" node to this node.
Optional protein molecule to use as an excluded volume or for docking.

Output Ports

List of new molecules containing replacement fragments and their scores.

Views

This node has no views

Installation

To use this node in KNIME, install the extension Cresset KNIME Nodes from the below update site following our NodePit Product and Node Installation Guide:

v5.4

A zipped version of the software site can be downloaded here.

Plugin provider: www.cresset-group.com

Plugin version: 2.8.0.230502

On NodePit since: 2024-12-06

Last update: 2025-04-01

KNIME versions: Since v3.6
