
Spark Database Search

Cresset KNIME Nodes version by www.cresset-group.com

Spark™ is a bioisostere replacement tool. You select a moiety to be replaced in a given 'starter molecule', and Spark generates a list of new molecules containing replacement fragments with similar electrostatic and steric properties. Spark comes with a set of databases of fragments generated from whole molecules (e.g. commercially available or literature reported compounds) or from synthetic reagents.

Spark compares molecules by their molecular fields, not by their structure. The interaction between a ligand and a protein involves electrostatic fields and surface properties (e.g. hydrogen bonding, hydrophobic surfaces and so on). Two molecules which both bind to a common active site tend to make similar interactions with the protein and hence have highly similar field properties. Describing molecules by these properties is therefore a powerful tool for the medicinal chemist, as it concentrates on the aspects of the molecules that are important for biological activity. Using the fields gives a 'protein's view' of how the molecules would line up in the active site, generating ideas on how molecules with different structures could interact with the same protein.

The major advance in Spark compared to previous bioisostere replacement tools is that Spark scores each potential replacement in context. Each candidate fragment is merged into the starting molecule, the full field pattern for that molecule is calculated, and this is then compared to the starting structure.

The filter options allow you to specify constraints on the type and properties of the fragments to try. Each option has three settings: 'Yes' means that the functionality must be present, 'No' means that it must not be present, and 'Optional' means that it may or may not be present. For example, setting 'Contains an aromatic ring' to 'Yes' means that all suggested replacement fragments must contain an aromatic ring. Setting 'Contains a non-ring atom or a non-ring bond' to 'No' means that only ring fragments with no exocyclic components may be used. The non-obvious flags are explained below under 'Filters'.
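The three-way logic can be sketched in a few lines of Python. This only illustrates the semantics described above and is not Spark's implementation; the function name passes_filter and the boolean has_feature are hypothetical.

```python
def passes_filter(setting: str, has_feature: bool) -> bool:
    """Return True if a fragment satisfies one filter option.

    setting     -- 'Yes', 'No' or 'Optional' (the three settings described above)
    has_feature -- whether the fragment has the functionality in question
    """
    if setting == "Yes":
        return has_feature
    if setting == "No":
        return not has_feature
    if setting == "Optional":
        return True
    raise ValueError(f"unknown filter setting: {setting!r}")


# 'Contains an aromatic ring' = 'Yes' keeps only fragments with an aromatic ring.
print(passes_filter("Yes", has_feature=True))
# The same fragment is rejected when the option is set to 'No'.
print(passes_filter("No", has_feature=True))
```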

Constraints can be set to bias the Spark search and penalize results which do not satisfy the constraint. Three types of constraints are available:

  1. Field constraints: specify that a particular type of field must be present in the result molecule. This could be a hydrophobic point which forces the Spark result molecule to fill a particular pocket, or an electrostatic point to enforce an interaction.
  2. Pharmacophore constraints: force result molecules to have the chosen feature (for example, H-bond acceptor) at a specific position.
  3. Excluded volume: a receptor (protein) molecule can be used as an excluded volume. The protein is not used in a pharmacophoric sense; however, result molecules that clash with the protein structure will be penalized.

The advanced options allow you to further refine the Spark search.

This node wraps the executable 'sparkcl', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'sparkcl Path' preference or the CRESSET_SPARKCL_EXE environment variable to point directly at the executable itself.

The Spark Database Search node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced if you use the Cresset Engine Broker. To use this facility, either set the "Cresset's Engine Broker" preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.
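When the node cannot find sparkcl or the Engine Broker, it can help to check which of the environment variables named above are actually visible to the process. The following is a minimal diagnostic sketch in Python; it only reads the variables and does not call any Cresset software, and the helper name report_cresset_env is hypothetical.

```python
import os

# Variable names are taken from the text above.
CRESSET_VARS = ("CRESSET_HOME", "CRESSET_SPARKCL_EXE", "CRESSET_BROKER")


def report_cresset_env(env=None):
    """Map each Cresset-related variable to its value, or None if unset or empty."""
    env = os.environ if env is None else env
    return {name: (env.get(name) or None) for name in CRESSET_VARS}


if __name__ == "__main__":
    for name, value in report_cresset_env().items():
        print(f"{name}: {value if value is not None else '(not set)'}")
```

Run it from the same environment that starts KNIME, since a variable set in a different shell session will not be seen by the node.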

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.



Column containing molecule structure
A column containing the starter molecule and an optional selection to replace. Only the first molecule will be used as a starter molecule. Optionally, up to 8 reference molecules may be included to guide the calculation.
Protein to use as an excluded volume
The first molecule in the specified column will be used as an excluded volume when scoring fragments.
Speed of operation of Spark. Choose from (in order of decreasing speed but increasing thoroughness): Normal or Exhaustive. Note that changing this option will alter the values of several other options.
Write calculation log to molecules
Write the calculation log as one of the SDF tags for each result.
Set which column in the input datatable contains the relative weights of the reference molecules. The weight is used to control the scoring of each reference molecule, placing more or less emphasis on any individual molecule. Note that a weight of zero is permitted for any molecule including the starter molecule. However, this sometimes gives unusual effects such as large movements of the new molecule relative to the starter molecule. These effects can be mollified by setting a weight for the starter molecule of 10 or 20%.
Database(s) to search
Select the databases to search. More than one database can be selected by using the "Ctrl" key. Databases are searched for in the locations specified by the "SPARK_CRESSET_DB" and "SPARK_DB" environment variables, and also in the "database" directory in the Spark install location. A database can also be searched by specifying its full path.

Starter molecule

Fragment selection input method
Spark requires you to specify a portion of the starter molecule to replace. This can be done in three alternative ways:
  • Spark Fragment Selector - The portion to replace is specified using the Spark Fragment Selector node. The Spark Fragment Selector node "out" port must be linked to the Spark Database Search "in" port.
  • Specify bonds to break - One or more bonds in the starter molecule will be broken. Bonds are specified as pairs of atoms identified by their index (starting at 1), with the first atom in the pair being retained, and the second atom being part of the removed section. The indices of the bonds to break must be typed in the text area. Each line should list only one of the bonds to break in the format atom1,atom2[,flags]. For example, to break the bond between atoms 2 and 7 (removing the portion of the molecule connected to atom 7), you must type "2,7". Replacement of a central portion of a molecule can be accomplished by specifying all the bonds connecting that central portion to the rest of the molecule. For example, given C-C-C-C-O-C, numbered 1-6 left to right, the replacement of the terminal methoxy group could be requested with '4,5' (i.e. keep atom 4, and discard atom 5 and everything connected to it). Replacement of the two central carbons could be requested with '2,3 5,4' (i.e. keep atoms 2 and 5 and delete atoms 3 and 4 and anything in between).
  • Specify the atoms to replace - One or more atoms will be replaced in the starter molecule. The list of atoms should form a single connected fragment, should consist of heavy atoms only (no hydrogens), and the bonds connecting this fragment to the rest of the molecule must be single. The IDs of the atoms to be replaced must be typed in the text area as a comma-separated list in the format atom1,atom2,atom3,.... For example, given C-C-C-C-O-C, numbered 1-6 left to right, replacement of the terminal methoxy group could be requested with '5,6' (i.e. discard atoms 5 and 6). Replacement of the two central carbons could be requested with '3,4' (i.e. delete atoms 3 and 4).
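The relationship between the two manual input methods can be checked with a small graph walk. The Python sketch below is only an illustration of the semantics described above (it is not how Spark works internally, and atoms_removed is a hypothetical helper): given the bonds of a molecule and the (kept, removed) pairs, it collects every atom reachable from a removed atom without crossing a broken bond.

```python
from collections import deque


def atoms_removed(bonds, breaks):
    """Return the set of atoms discarded when the given bonds are broken.

    bonds  -- list of (a, b) pairs describing the molecule, 1-based indices
    breaks -- list of (kept, removed) pairs, as in 'atom1,atom2'
    """
    broken = {frozenset(pair) for pair in breaks}
    neighbours = {}
    for a, b in bonds:
        if frozenset((a, b)) in broken:
            continue  # this bond is cut, so do not walk across it
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    removed = set()
    queue = deque(removed_atom for _, removed_atom in breaks)
    while queue:
        atom = queue.popleft()
        if atom in removed:
            continue
        removed.add(atom)
        queue.extend(neighbours.get(atom, ()))
    return removed


# C-C-C-C-O-C, numbered 1-6 left to right
chain = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(sorted(atoms_removed(chain, [(4, 5)])))          # [5, 6]: terminal methoxy
print(sorted(atoms_removed(chain, [(2, 3), (5, 4)])))  # [3, 4]: two central carbons
```

The results match the atom lists '5,6' and '3,4' given for the 'Specify the atoms to replace' method.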
Bond/Atom list

If 'Fragment selection input method' is set to 'Specify bonds to break'

The bonds in the starter molecule that are to be broken. Each line should list only one bond to break, in the format atom1,atom2[,flags].


The available attachment point flags are: Br, C, Car, Cl, Csp, Csp2, Csp3, F, Hal, I, N, Nsp2, Nsp3, O, Osp3, P, PS, S.

For example, breaking the two bonds between atoms 10 and 9 and between atoms 24 and 14 replaces the region of the molecule containing atoms 9 and 14. With the flags Nsp3 and Nsp2 on the first bond, atom 9 will only be replaced with an Nsp3 or Nsp2 atom, while atom 14, with no flags, can be replaced with any type of atom.
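Bond lines are easy to mistype, so a quick pre-check before running the node can help. The Python sketch below parses a line in the atom1,atom2[,flags] format and validates any flags against the list of attachment point flags given above. parse_bond_line is a hypothetical helper, and writing several flags as further comma-separated fields is an assumption; check the Spark manual for the exact syntax.

```python
# Attachment point flags as listed in the text above.
ATTACHMENT_FLAGS = {
    "Br", "C", "Car", "Cl", "Csp", "Csp2", "Csp3", "F", "Hal",
    "I", "N", "Nsp2", "Nsp3", "O", "Osp3", "P", "PS", "S",
}


def parse_bond_line(line):
    """Parse 'atom1,atom2[,flags]' into (kept_atom, removed_atom, flags).

    Assumption: several flags are written as extra comma-separated fields.
    """
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 2:
        raise ValueError(f"expected 'atom1,atom2[,flags]', got {line!r}")
    kept, removed = int(fields[0]), int(fields[1])
    if kept < 1 or removed < 1:
        raise ValueError("atom indices start at 1")
    flags = fields[2:]
    unknown = [f for f in flags if f not in ATTACHMENT_FLAGS]
    if unknown:
        raise ValueError(f"unknown attachment point flag(s): {unknown}")
    return kept, removed, flags


print(parse_bond_line("10,9,Nsp3,Nsp2"))  # (10, 9, ['Nsp3', 'Nsp2'])
print(parse_bond_line("24,14"))           # (24, 14, [])
```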



If 'Fragment selection input method' is set to 'Specify the atoms to replace'

The IDs of the atoms to be replaced, as a comma-separated list in the format atom1,atom2,atom3,...


Contains an aromatic ring
The definition of aromaticity is that a ring must obey the Hückel 4N+2 rule and may not contain an exocyclic double bond. Pyridones are thus non-aromatic.
Contains a non-ring atom or a non-ring bond
This option refers to any atom or bond not in a ring in the fragment. Whether or not the atom will be in a ring once joined into the final result molecule is immaterial. Selecting 'No' on this option is a good way to limit the search to pure ring systems.
Contains a H-bond donor
The definition of H-bond donor is quite restricted: a hydrogen atom attached to N or O.
Contains a H-bond acceptor
A fragment has a H-bond acceptor if it contains any of the following: =N-, -OH, =O, -C#N.
Contains toxophores etc
The toxophores list is fairly conservative and only includes reactive functional groups such as acid chlorides, sulphur halides, Cl-, Br-, or I-containing alkyl halides, azides, and peroxides. Nitro groups are not currently considered toxophores. However, phosphorus (in any form) is included, largely because it is not completely parameterised in Cresset's XED force field. The standard databases supplied by Cresset are already filtered to remove all fragments with this flag.


Maximum number of results to keep
The maximum number of results to keep. The default is 500 for Normal and 1000 for Exhaustive.
Fraction of score from shape similarity
Set the weight of the shape component of the scoring function. The default of 0.5 means 50% field and 50% shape.
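As a rough illustration of what this fraction means, the Python sketch below blends two similarity values linearly. The linear form and the name combined_score are assumptions made for illustration; the text above only states that 0.5 corresponds to 50% field and 50% shape.

```python
def combined_score(field_similarity, shape_similarity, shape_fraction=0.5):
    """Blend field and shape similarity; shape_fraction is the shape weight (0..1)."""
    if not 0.0 <= shape_fraction <= 1.0:
        raise ValueError("shape_fraction must be between 0 and 1")
    return (1.0 - shape_fraction) * field_similarity + shape_fraction * shape_similarity


# With the default of 0.5, both components contribute equally.
print(combined_score(0.8, 0.6))
# A fraction of 1.0 would score on shape alone.
print(combined_score(0.8, 0.6, shape_fraction=1.0))
```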
Gradient Cutoff
This cutoff is used when minimizing the new fragment into the retained portion of the starter molecule. A smaller value usually gives a more accurate conformation but takes longer to converge and exit. Values above 0.3 are recommended except when using significant computing resources.
Scoring metric
  • Dice: default similarity metric in the current and previous versions of Spark.
  • Tanimoto: monotonic with Dice, so will not change the rank ordering of results, although the similarity values will change.
  • Tversky: use this metric to set up a more 'substructure-like' or 'superstructure-like' alignment. For a substructure-like alignment (i.e. aligning molecules which are substructures of the query), use Tversky with Alpha Value=0.05. For a superstructure-like alignment (i.e. aligning molecules which are larger than but include the query), use Tversky with Alpha Value=0.95.
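The relationship between the three metrics can be seen from their textbook definitions. The Python sketch below uses the generic overlap forms (ab is the overlap between A and B, aa and bb are the self-overlaps); the numeric values are made up, and the Tversky form with weights alpha and 1 - alpha is an assumption, since the exact expression Spark uses is not given here.

```python
import math


def dice(ab, aa, bb):
    return 2.0 * ab / (aa + bb)


def tanimoto(ab, aa, bb):
    return ab / (aa + bb - ab)


def tversky(ab, aa, bb, alpha):
    # Assumption: the second weight is (1 - alpha); alpha = 0.5 then equals Dice.
    return ab / (alpha * aa + (1.0 - alpha) * bb)


# Hypothetical (overlap, self-overlap A, self-overlap B) values for three candidates.
candidates = [(3.0, 5.0, 4.0), (2.0, 5.0, 6.0), (4.5, 5.0, 5.5)]

by_dice = sorted(candidates, key=lambda c: dice(*c), reverse=True)
by_tanimoto = sorted(candidates, key=lambda c: tanimoto(*c), reverse=True)
print(by_dice == by_tanimoto)  # True: the ranking is the same

for c in candidates:
    d = dice(*c)
    # Tanimoto is a monotonic function of Dice: T = D / (2 - D)
    print(math.isclose(tanimoto(*c), d / (2.0 - d)))  # True

# Unlike Dice and Tanimoto, Tversky is asymmetric when alpha differs from 0.5.
print(tversky(3.0, 5.0, 4.0, 0.05), tversky(3.0, 5.0, 4.0, 0.95))
```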
Alpha Value
Enter a value between 0.0 and 1.0. Only available if the Tversky scoring metric is selected.
Field constraints
Consists of a set of numbers in the form index,size,reference. For example, 16,2.5 means that the field point with index 16 on the starter molecule should have a constraint of 2.5 applied to it. You may have more than one field constraint specified, separated by newlines. Please refer to the Spark manual for a detailed explanation of field constraints. For this option to work correctly, the input starter molecule must contain a "_cresset_fieldpoint" tag with the field point data in it. Note that the field points are appended to the atom list, so if the molecule has 80 atoms, the first field point will have index 81.
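The index arithmetic is simple but easy to get off by one. A minimal Python sketch, assuming field points are numbered from 1 in the order they appear in the tag (field_point_index is a hypothetical helper):

```python
def field_point_index(n_atoms, field_point_number):
    """Index of the n-th field point (1-based) when field points follow the atoms."""
    if n_atoms < 0 or field_point_number < 1:
        raise ValueError("need n_atoms >= 0 and field_point_number >= 1")
    return n_atoms + field_point_number


print(field_point_index(80, 1))  # 81, as in the example above
```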
Pharmacophore constraints
Consists of a set of values in the form index,type,strength. For example, 16,d,3.2 means that the atom with index 16 should have a donor pharmacophore constraint with a strength of 3.2 applied to it. You may have more than one pharmacophore constraint specified, separated by newlines. The characters for each type are 'd'=Donor, 'a'=Acceptor, '+'=Cation, '-'=Anion, 'm'=Metal binder and 'v'=Covalent. Please refer to the Spark manual for a detailed explanation of pharmacophore constraints.
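A pre-check of the constraint text can catch an invalid type character before the node runs. The Python sketch below parses newline-separated index,type,strength lines using the type characters listed above; parse_pharmacophore_constraints is a hypothetical helper and the second input line is made up for illustration.

```python
# Type characters as listed in the text above.
PHARMACOPHORE_TYPES = {
    "d": "Donor", "a": "Acceptor", "+": "Cation",
    "-": "Anion", "m": "Metal binder", "v": "Covalent",
}


def parse_pharmacophore_constraints(text):
    """Parse newline-separated 'index,type,strength' lines into tuples."""
    constraints = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Raises ValueError if the line does not have exactly three fields.
        index, type_char, strength = (field.strip() for field in line.split(","))
        if type_char not in PHARMACOPHORE_TYPES:
            raise ValueError(f"unknown pharmacophore type: {type_char!r}")
        constraints.append((int(index), PHARMACOPHORE_TYPES[type_char], float(strength)))
    return constraints


print(parse_pharmacophore_constraints("16,d,3.2\n4,a,1.0"))
# [(16, 'Donor', 3.2), (4, 'Acceptor', 1.0)]
```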
Automatic constraint fragment size
If checked, then the maximum size of the replacement fragment is determined by the size of the region selected for replacement, plus 5 heavy atoms and/or 75 Daltons.
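As a worked example of the automatic limits, the Python sketch below adds the two margins to the size of the selected region. It assumes that the heavy atom and molecular weight limits are computed independently, which is one reading of "and/or" above; automatic_fragment_limits is a hypothetical helper and the input values are made up.

```python
def automatic_fragment_limits(selected_heavy_atoms, selected_mol_weight):
    """Return (max_heavy_atoms, max_mol_weight) for replacement fragments."""
    return selected_heavy_atoms + 5, selected_mol_weight + 75.0


# A selection of 6 heavy atoms weighing about 77 Da allows fragments
# of up to 11 heavy atoms and about 152 Da under this reading.
print(automatic_fragment_limits(6, 77.1))
```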
Maximum fragment molecular weight
Fragments with a molecular weight higher than this setting will be excluded.
Maximum fragment heavy atom count
Fragments larger than this setting will be excluded.
Maximum number of rotatable bonds
Only searches fragments with a number of rotatable bonds lower than this setting.

Input Ports

Data table containing 1 to 9 molecules. The first molecule will be used as the "Starter Molecule" and is required. Optionally, up to 8 reference molecules may be included to guide the calculation. See the Weighting option to configure how the references affect the process. The region of the "Starter Molecule" to replace can be specified in the "Starter Molecule" tab or by linking the output of the "Spark Fragment Selector" node to this node.
Optional protein molecule to use as an excluded volume.

Output Ports

List of new molecules containing replacement fragments and their scores.
