Generate Spark Database

Generate Spark™ Database is a tool for generating or updating Spark databases. It reads a list of molecules, breaks them into fragments, and stores the fragments in a database file for use within Spark or with the 'Spark Database Search' node.

This node wraps the executable 'sparkdb', which must be installed with a valid license for this node to work. If this is installed in the default location on Windows, then it should be found automatically. Otherwise, you must either set the 'Cresset Home' preference or the CRESSET_HOME environment variable to the base Cresset software install directory. You may also set the 'sparkdb Path' preference or the CRESSET_SPARKDB_EXE environment variable to point directly at the executable itself.
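
Before running a workflow it can help to check which of these environment variables are actually set. The following is a minimal Python sketch, not the node's own lookup logic: the function name describe_sparkdb_settings is hypothetical, and because neither the lookup order nor the install layout below CRESSET_HOME is described here, the sketch only reports the values and does not try to locate the executable.

```python
import os


def describe_sparkdb_settings(environ=None):
    """Report which of the documented environment variables are set.

    Hypothetical helper for illustration only. Unset or empty
    variables are reported as None.
    """
    env = os.environ if environ is None else environ
    return {
        name: env.get(name) or None
        for name in ("CRESSET_HOME", "CRESSET_SPARKDB_EXE")
    }


if __name__ == "__main__":
    for name, value in describe_sparkdb_settings().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```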

The Generate Spark Database node can be configured to use additional resources to perform calculations. The time taken for the node to run will be drastically reduced if you use the Cresset Engine Broker™. To use this facility, set either the 'Cresset Engine Broker' preference or the CRESSET_BROKER environment variable to point to the location of your local Engine Broker. If you do not currently have the Cresset Engine Broker, contact Cresset (enquiries@cresset-group.com) for pricing on local and cloud-based brokers.

For more information visit www.cresset-group.com or contact us at support@cresset-group.com.

Options

Basic

Column containing input molecules structures: The column in the first input data table containing the molecules to fragment and add to the database.
Title column: The column in the first input data table containing the title of the molecule. If this is left blank, then the text in the molecule structure column will be used. The molecule title will appear in the 'Spark Database Search' node output in the 'Parent Title' column.
Extra meta data column 1: The column in the first input data table containing extra meta data to store in the database. The meta data will appear in the 'Spark Database Search' node output in the 'Parent Aux1' column.
Extra meta data column 2: The column in the first input data table containing extra meta data to store in the database. The meta data will appear in the 'Spark Database Search' node output in the 'Parent Aux2' column.
Database to create/update: The full path to the database to create or update. For this database to be accessible from the 'Spark Database Search' node, the database should be saved to one of the directories listed in the 'SPARK_CRESSET_DB' or 'SPARK_DB' environment variables.
Category: When selecting databases in Spark, this database will be listed under the category specified by this option. If this is not set, then the database will appear as 'Uncategorized'.
Sub-category: When selecting databases in Spark, this database will be listed under the subcategory specified by this option. If this is not set, then the database will appear directly under its category.
Description: A description of the database.
Speed: Speed of the operation. Choose from (in order of decreasing speed, but increasing thoroughness): Quick, Normal or Exhaustive. Note that changing this option will alter the values of several other options.
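
The 'Database to create/update' option asks for a path inside one of the directories listed in SPARK_CRESSET_DB or SPARK_DB. The Python sketch below shows one way to check a path before running the node. It rests on two assumptions that this page does not confirm: that multiple directories are separated by the platform's path-list separator, and that a database must sit directly inside a listed directory (not in a subdirectory). The function names and the example file name are hypothetical.

```python
import os
from pathlib import Path


def database_dirs(environ=None, sep=os.pathsep):
    """Collect the directories listed in SPARK_CRESSET_DB and SPARK_DB.

    Assumption: entries are separated by `sep` (the platform path-list
    separator by default); the real separator is not documented here.
    """
    env = os.environ if environ is None else environ
    dirs = []
    for name in ("SPARK_CRESSET_DB", "SPARK_DB"):
        for entry in env.get(name, "").split(sep):
            if entry.strip():
                dirs.append(Path(entry.strip()))
    return dirs


def is_visible_to_search(db_path, environ=None, sep=os.pathsep):
    """True if db_path sits directly inside one of the listed directories."""
    parent = Path(db_path).parent
    return any(parent == d for d in database_dirs(environ, sep))


if __name__ == "__main__":
    # Hypothetical file name, checked against the current environment.
    print(is_visible_to_search("my_fragments.db"))
```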

Fragmentation Method

Fragmentation mode

Specify how the input molecules are to be handled.

Molecules, need fragmentation - the input molecules will be broken into pieces using Cresset's fragmentation rules, and the pieces stored in the database.
Pre-labelled fragments - the input molecules are assumed to be pre-existing fragments which have been labelled with a particular element marking the attachment points (see the Attachment point labels option). The attachment point labels will be removed, and the molecules will be imported into the database without further fragmentation.
Reagent importer - the input molecules are reagents to be processed according to one or more of Cresset's reagent-handling rules (see the Reagent type option). This mode converts a file of usable reagents into the R-groups that are used in the final molecule.

Attachment point labels

This option sets the atomic number of the element which labels each fragment's attachment points (e.g., 52 for tellurium). Any molecule without such a label is ignored.
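
The labelling convention can be illustrated with a small Python sketch. It is a deliberate simplification and not how sparkdb processes structures: a molecule is represented only as a list of atomic numbers, connectivity is ignored, and the function name strip_attachment_labels is hypothetical.

```python
def strip_attachment_labels(atomic_numbers, label=52):
    """Return (fragment_atoms, n_attachment_points), or None if unlabelled.

    atomic_numbers: atomic numbers of the atoms in one input molecule.
    label: atomic number marking attachment points (52 is tellurium).
    """
    n_points = sum(1 for z in atomic_numbers if z == label)
    if n_points == 0:
        return None  # a molecule without a label is ignored
    fragment = [z for z in atomic_numbers if z != label]
    return fragment, n_points


if __name__ == "__main__":
    # Two carbons and an oxygen with one tellurium label atom.
    print(strip_attachment_labels([6, 6, 8, 52]))
```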

Create all reagent databases

If checked, each of the available reagent-handling rules will be applied to the input molecules, creating in turn the appropriate databases. The databases will be named as specified by the 'Database to create/update' option, with the reagent name added to it. Any input molecule that does not contain a matching pattern will be ignored.

Reagent type

If importing reagents, you need to specify what the reactive group is and how it is changed during the reaction. For example, a set of boronic acids for use in a Suzuki coupling needs to have the boronic acid removed and the atom that it was attached to labelled as the fragment attachment point. The default reagents are:

Acids, delete the -COOH - Acids/acid chlorides where we keep only the group attached to the acid carbonyl, e.g. R-COOH -> R-*
Acids, keep the CO - Acids/acid chlorides where we attach through the carbonyl group (as in acylations), e.g. R-COOH -> R-C(=O)-*
Aliphatic alcohols, delete the O - Aliphatic alcohols used as alkylating agents where the O is deleted on addition, e.g. R-OH -> R-*
Alcohols, keep the O - Alcohols and phenols where the attachment is through the oxygen, e.g. R-OH -> R-O-*
Aliphatic halide - Primary/secondary/tertiary aliphatic halides (Cl, Br, I), e.g. R(1-3)C-Cl -> R(1-3)C-*
Alkynes, delete the -C#C - Alkynes, keep only the attached group, e.g. R-C#C -> R-*
Aromatic alcohols, keep the O - Aromatic alcohols (phenols) where the attachment is through the oxygen, e.g. Ar-OH -> Ar-O-*
Aromatic amines, keep the N - Primary and secondary aromatic amines (anilines) where the N is the attachment point such as in reductive aminations, e.g. Ar-NH-R -> Ar-N(-R)-*
Aromatic halide - Aromatic halides (Cl, Br, I), e.g. Ph-Cl -> Ph-*
Aromatic boronic acids, delete -B(OH)2 - Aromatic boronic acids for Suzuki couplings etc.: lose the boronic acid and attach the remainder, e.g. Ph-B(OH)2 -> Ph-*
Cyano groups, delete -CN - Cyano reagents, keeping only the attached group, e.g. R-CN -> R-*
Isocyanates, keep -NCO - Isocyanates, keeping all atoms and forming an amide, e.g. R-N=C=O -> R-NC(=O)-*
Olefins, delete the -C=C - Terminal olefins, keep only the attached group, e.g. R-C=C -> R-*
Primary aliphatic amines, delete N - Primary aliphatic amines as an alkylating agent where the N is deleted on addition, e.g. R-NH2 -> R-*
Primary aliphatic amines, keep N - Primary aliphatic amines where the N is the attachment point such as in reductive aminations, e.g. R-NH2 -> R-NH-*
Primary aliphatic halide - Primary aliphatic halides (Cl, Br, I), e.g. R-CH2-Cl -> R-CH2-*
Primary aromatic amines, delete N - Primary aromatic amines (anilines) where the N is removed, e.g. Ar-NH2 -> Ar-*
Aldehydes/ketones, delete the O and reduce C - Aldehydes/ketones where we attach through reductive amination, e.g. R1-CO-R2 -> R1-CH(R2)-*
Secondary aliphatic amines, keep N - Secondary aliphatic amines where the N is the attachment point such as in nucleophilic substitution, e.g. R1(R2)NH -> R1(R2)N-*
Sulfonic acids, delete the -SO2X - Sulfonic acids/acid chlorides where we keep only the group attached to the sulfur, e.g. R-SO3H -> R-*
Sulfonic acids, keep the -SO2 - Sulfonic acids/acid chlorides where we keep the -SO2 group, e.g. R-SO3H -> R-SO2-*
Aliphatic thiols, delete S - Thiols used as alkylating agents where the S is deleted on addition, e.g. R-SH -> R-*
Thiols, keep S - Thiols where the attachment is through the sulfur, e.g. R-SH -> R-S-*

Fragmentation Settings

Maximum attachment points per fragment: Specifies the maximum number of attachment points a fragment can have. The larger the value, the larger the database.
Only keep ring-containing fragments: If checked, only fragments containing one or more ring atoms will be kept.
Reprocess molecules that have already been seen: If this option is checked, molecules that have been fragmented previously will be processed again. This is useful if you want to change the fragmentation settings (e.g. Maximum attachment points per fragment). Note that the frequency of occurrence data will no longer be reliable when this option is turned on: for example, if you run the same file twice, all fragment frequencies will double. Also note that this option does not cause fragments that are already in the database to have their conformations recalculated. By default (unchecked), molecules that have already been seen are skipped completely.
Maximum fragment heavy atom count: Fragments with more than this number of heavy atoms will not be generated.
Maximum fragment molecular weight: Fragments which weigh more than this limit will not be generated.
Maximum number of rotatable bonds: Fragments which exceed this limit will not be generated. This is useful to prevent long alkyl chains and the like from appearing in the database.
One rotatable bond counts as this many heavy atoms: Each rotatable bond is counted as this many heavy atoms when checking the 'Maximum fragment heavy atom count'. This makes it possible to penalize fragments with large numbers of rotatable bonds by counting those bonds towards the heavy atom limit.
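
The way these size limits interact can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual sparkdb logic: it assumes the effective heavy atom count is the real heavy atom count plus the number of rotatable bonds multiplied by the per-bond value, and that each limit is inclusive. The class name, function name and the numbers in the usage example are hypothetical, not the node's defaults.

```python
from dataclasses import dataclass


@dataclass
class FragmentLimits:
    max_heavy_atoms: int
    max_mol_weight: float
    max_rotatable_bonds: int
    heavy_atoms_per_rotatable_bond: float = 0.0
    ring_only: bool = False


def passes_limits(heavy_atoms, mol_weight, rotatable_bonds, ring_atoms, limits):
    """Return True if a fragment with these properties would be kept."""
    # Assumed formula: each rotatable bond adds a fixed heavy-atom penalty.
    effective_heavy = (
        heavy_atoms + rotatable_bonds * limits.heavy_atoms_per_rotatable_bond
    )
    if effective_heavy > limits.max_heavy_atoms:
        return False
    if mol_weight > limits.max_mol_weight:
        return False
    if rotatable_bonds > limits.max_rotatable_bonds:
        return False
    if limits.ring_only and ring_atoms == 0:
        return False
    return True


if __name__ == "__main__":
    limits = FragmentLimits(12, 250.0, 3, heavy_atoms_per_rotatable_bond=1.0)
    print(passes_limits(10, 150.0, 2, 6, limits))  # effective count 12
    print(passes_limits(10, 150.0, 3, 6, limits))  # effective count 13
```

With a per-bond value of 0 the heavy atom check reduces to the plain heavy atom count, so the penalty is opt-in.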

Conformer Hunt

Filter duplicate conformers at RMS

Sets the similarity threshold below which two conformers are deemed identical. This effectively controls the coarseness of the sampling of conformational space. A low value leads to conformations that are only marginally different, while using a large value means that a conformation near the 'correct' one may not be generated. Values of 0.5 to 1.0 are recommended: values at the higher end of the range are more appropriate for larger, more flexible molecules. Note that this option applies equally to calculated conformations and to fragments that are imported with conformations already generated or in a pre-determined conformation.
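
The effect of the threshold can be illustrated with a greedy duplicate filter in Python. This is a sketch of the general technique only, not Cresset's algorithm: it assumes conformers share the same atom order and are already superimposed, so no alignment is performed, and the function names are hypothetical.

```python
import math


def rmsd(conf_a, conf_b):
    """RMS distance between two conformers given as lists of (x, y, z).

    Assumes identical atom ordering and pre-superimposed coordinates.
    """
    total = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(conf_a, conf_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(total / len(conf_a))


def filter_duplicates(conformers, threshold=0.5):
    """Keep a conformer only if it is at least `threshold` from all kept ones."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept


if __name__ == "__main__":
    c1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    c2 = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]  # 0.1 from c1: a duplicate
    c3 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # 1.0 from c1: distinct
    print(len(filter_duplicates([c1, c2, c3], threshold=0.5)))
```

Raising the threshold in this sketch discards more conformers, which corresponds to the coarser sampling described above.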

Maximum number of conformations

The maximum number of conformations to generate for any fragment. Values of 20-30 are recommended: this should usually suffice to cover the conformational space of most reasonable fragments. If you are generating particularly large or flexible fragments you may want to increase this to 50 (at the expense of longer generation time, larger database files and longer search times).
If set to 0, no conformations will be generated and the fragments will be imported in their input conformations. This is useful, for example, when building databases from PDB or CSD conformations.

No. of high-T dynamics runs for flexible rings

Most small rings are handled using a ring conformation library. Conformations for rings that are not found in the library are sampled using high-temperature (~600 K) dynamics with energy initially distributed into torsional degrees of freedom. The number of dynamics runs (and hence the degree of ring conformation sampling) is set by this value. Values of 2-10 are recommended. Values above 5 make little difference to flexible rings of fewer than 8 atoms.

Gradient cutoff for conformer minimization

All conformers found are minimized using the XED force field. This option sets the gradient cut-off at which the minimization is terminated. Values that are too small lead to insufficient sampling of conformational space and long run times. Values that are too large can lead to unrealistic structures being generated. Values of 0.1 kcal/mol/Å to 1.0 kcal/mol/Å are recommended, with values at the smaller end of the range being preferred if the 'Include coulombics' option is not checked.

Energy window

Conformations whose minimized energy lies outside the energy window are discarded. The window is measured from the lowest-energy conformation that has been found. The ideal value for this option depends on the 'Gradient cutoff for conformer minimization' and 'Include coulombics' options. When 'Include coulombics' is not checked, the best results are obtained by minimizing to a low gradient (0.1 or lower) and applying a smaller energy window (3 kcal/mol), but this significantly increases the calculation time. Checking 'Include coulombics' requires a significantly larger energy window for large molecules (12 kcal/mol), as these can form very low energy collapsed and internally H-bonded structures.
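
The energy window amounts to a simple filter relative to the lowest energy found. The following Python sketch shows the idea; the function name is hypothetical, energies are taken to be in kcal/mol, and treating the window boundary as inclusive is an assumption.

```python
def apply_energy_window(energies, window=3.0):
    """Return the indices of conformers within `window` of the lowest energy.

    energies: minimized conformer energies in kcal/mol.
    """
    if not energies:
        return []
    lowest = min(energies)
    return [i for i, e in enumerate(energies) if e - lowest <= window]


if __name__ == "__main__":
    energies = [10.0, 12.5, 14.0, 25.0]
    print(apply_energy_window(energies, window=3.0))
    print(apply_energy_window(energies, window=12.0))
```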

Acyclic secondary amides handling

Specify how the conformation hunter is to handle amides.

Force amides trans - forces all secondary amides to adopt the trans geometry.
Use input amide geometry - leaves secondary amides in the geometry that they were in the input file and sets them as non-rotatable. As a result, if the input molecule was drawn with a cis amide then only conformations with cis amides will be generated.
Allow amides to spin - allows the amide bond to spin, so a mixture of cis and trans amides can be generated.

Allow boats and twist-boats

By default, any conformation containing a boat or twist-boat conformation of a 6-membered ring will be filtered out (unless all conformations have boats). If this option is checked, this filtering will not take place and boat conformations will be allowed.
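
The fallback for the case where every conformation contains a boat can be made explicit with a short Python sketch. It illustrates the described behaviour only: the function name filter_boats and the has_boat predicate are hypothetical, and detecting a boat or twist-boat ring is left to the caller.

```python
def filter_boats(conformers, has_boat, allow_boats=False):
    """Drop boat-containing conformers unless that would leave none.

    has_boat: caller-supplied predicate returning True for a conformer
    with a boat or twist-boat 6-membered ring (hypothetical).
    """
    if allow_boats:
        return list(conformers)
    without_boats = [c for c in conformers if not has_boat(c)]
    # If every conformer has a boat, keep them all rather than return nothing.
    return without_boats if without_boats else list(conformers)


if __name__ == "__main__":
    # Conformers are stood in for by labels in this toy example.
    labels = ["chair", "boat", "twist-boat"]
    print(filter_boats(labels, lambda c: "boat" in c))
```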

Include coulombics

If checked, then the conformer generation process uses the full force field, including long-range electrostatics. Better conformer populations are usually generated with this option not checked.

Ignore Existing Fragments

Ignore Fragments in Databases: Skips fragments that are already present in the specified databases. More than one database can be selected by holding the Ctrl key.

Input Ports

The molecules to be fragmented and added to the database.

Output Ports

This node has no output ports

Views

This node has no views

Installation

To use this node in KNIME, install the Cresset KNIME Nodes extension.

Plugin provider: www.cresset-group.com

Plugin version: 3.0.0.250226

On NodePit since: 2024-12-06

Last update: 2025-06-02

KNIME versions: Since v3.6
