Similarity Matrix (from Molecules)

Generates a pairwise similarity or distance matrix between two sets of molecules, using binary or scaled fingerprints.

If the first set contains only one probe molecule, the result is a table containing the similarities between that molecule and each molecule of the library in the second set.
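The sketch below shows the shape of the output; it is not the Canvas implementation. It assumes each fingerprint is given as a set of on-bit indices and uses Tanimoto as the example metric. Row i, column j holds the similarity of molecule i in the first set to molecule j in the second set, so a single probe molecule yields a single row. The fingerprints themselves are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints get 0.0 here (a convention chosen for this sketch)
    return len(a & b) / union if union else 0.0


def similarity_matrix(first: List[Set[int]], second: List[Set[int]]) -> List[List[float]]:
    """Row i, column j holds the similarity of first[i] to second[j]."""
    return [[tanimoto(a, b) for b in second] for a in first]


# Hypothetical fingerprints: one probe molecule against a three-molecule library
probe = [{1, 4, 7, 9}]
library = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
print(similarity_matrix(probe, library))  # [[1.0, 0.4, 0.0]]
```

With more than one molecule in the first set, the same function returns one row per molecule, giving the full pairwise matrix.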

Backend implementation

$SCHRODINGER/utilities/canvasFPGen
canvasFPGen is used to generate the fingerprints for the input molecules.
$SCHRODINGER/utilities/canvasFPMatrix
canvasFPMatrix generates the pairwise similarity or distance matrix from the fingerprints produced by canvasFPGen.

Options

Include Molecule
Whether the molecule should be included in the output
Include Input
Whether all columns in the input should be included in the output
Fingerprint Type
Valid methods:
  • linear
  • maccs
  • radial
  • molprint2D
  • torsion
  • pairwise
  • triplet
  • dendritic
Note: Atom type is not used to generate fingerprints when the "maccs" method is selected.
Precision
Select fingerprint precision:
  • 1024 (10-bit)
  • 2048 (11-bit)
  • 4096 (12-bit)
  • 32-bit (default)
  • 64-bit
Selecting 64-bit reduces collisions of "on" bits, but doubles the space required to store each key compared with 32-bit. Selecting 1024, 2048 or 4096 increases the chance of feature collisions.
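To give a feel for how strongly precision affects collisions, the sketch below estimates how many features land on an already-occupied bit when a molecule's features are hashed into 2**bits slots. It assumes uniform, independent hashing and an illustrative 100 features per molecule; neither assumption is taken from the Canvas documentation, so treat the numbers as a rough guide only.

```python
import math


def expected_collisions(n_features: int, bits: int) -> float:
    """Expected number of features that hit an already-occupied slot when
    n_features distinct features are hashed uniformly into 2**bits slots."""
    m = 2 ** bits
    # Expected occupied slots: m * (1 - (1 - 1/m)**n), computed with log1p/expm1
    # so that very large m (e.g. 2**64) does not round (1 - 1/m) to exactly 1.0
    occupied = -m * math.expm1(n_features * math.log1p(-1.0 / m))
    return n_features - occupied


# Assumed: 100 distinct features per molecule
for bits in (10, 11, 12, 32):
    print(f"{2 ** bits:>12} slots: {expected_collisions(100, bits):.3g} colliding features")
```

The estimate roughly halves with each extra bit in the 10 to 12 bit range and becomes negligible at 32 bits.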
Atom type
Atom typing scheme. Must be an integer from 1 to 10, or C or E.
Omit bits that are set by less than percentage
Omit bits that are set by more than percentage
Omit bits only set in single molecule
Omit bits set in all molecules
Reduce precision of fingerprints by specified number of bits
Only used with 32-bit precision. Reduces the precision of the fingerprints by the specified number of bits, which increases the chance of feature collisions. For example, a value of 22 reduces each single-precision (32-bit) key to a range of 1024 (10 bits).
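The arithmetic behind the example above (32 - 22 = 10 bits, so 2**10 = 1024 possible values) can be sketched as follows. The sketch assumes the key is folded by keeping its low-order bits; the tool may use a different folding scheme, so this only illustrates the range reduction and why distinct keys can collide.

```python
def reduce_key(key: int, reduce_bits: int) -> int:
    """Fold a 32-bit key into 2**(32 - reduce_bits) values by keeping its low-order bits.

    Keeping the low bits is an assumption of this sketch, not the documented behavior.
    """
    if not 0 <= reduce_bits < 32:
        raise ValueError("reduce_bits must be in the range 0-31")
    width = 32 - reduce_bits
    return key & ((1 << width) - 1)


print(reduce_key(0xDEADBEEF, 22))  # 751, i.e. 0x2EF, within range(1024)
print(reduce_key(0x001, 22) == reduce_key(0x401, 22))  # True: two distinct keys now collide
```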
Scaling
Rescale binary fingerprint data to real values.
Metric types
Valid metrics:
  • buser
  • cosine
  • dice
  • dixon
  • euclidean
  • hamann
  • hamming
  • kulczynski
  • matching
  • McConnaughey
  • minmax
  • patternDifference
  • pearson
  • petke
  • rogersTanimoto
  • shape
  • simpson
  • size
  • soergel
  • tanimoto
  • tversky
  • variance
  • yule
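Several of these metrics have standard textbook definitions for binary fingerprints in terms of three counts: a (on bits in A), b (on bits in B) and c (on bits shared by both). The sketch below implements a few of them for fingerprints given as sets of on-bit indices. Canvas may define or normalize some metrics differently, particularly for scaled fingerprints, so this is an illustration of the usual formulas rather than the tool's exact output.

```python
import math
from typing import Set, Tuple


def _counts(fp_a: Set[int], fp_b: Set[int]) -> Tuple[int, int, int]:
    """Return (a, b, c): on bits in A, on bits in B, on bits shared by both."""
    return len(fp_a), len(fp_b), len(fp_a & fp_b)


def _ratio(num: float, den: float) -> float:
    # Empty fingerprints would divide by zero; 0.0 is a convention chosen for this sketch
    return num / den if den else 0.0


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    a, b, c = _counts(fp_a, fp_b)
    return _ratio(c, a + b - c)


def dice(fp_a: Set[int], fp_b: Set[int]) -> float:
    a, b, c = _counts(fp_a, fp_b)
    return _ratio(2 * c, a + b)


def cosine(fp_a: Set[int], fp_b: Set[int]) -> float:
    a, b, c = _counts(fp_a, fp_b)
    return _ratio(c, math.sqrt(a * b))


def simpson(fp_a: Set[int], fp_b: Set[int]) -> float:
    a, b, c = _counts(fp_a, fp_b)
    return _ratio(c, min(a, b))


def hamming(fp_a: Set[int], fp_b: Set[int]) -> int:
    """Distance: number of bits set in exactly one of the two fingerprints."""
    a, b, c = _counts(fp_a, fp_b)
    return a + b - 2 * c


def euclidean(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Distance: for binary data, the square root of the Hamming distance."""
    return math.sqrt(hamming(fp_a, fp_b))


A, B = {1, 2, 3, 4}, {3, 4, 5}  # hypothetical fingerprints: a=4, b=3, c=2
for metric in (tanimoto, dice, cosine, simpson, hamming, euclidean):
    print(metric.__name__, round(metric(A, B), 3))
```

Note that tanimoto, dice, cosine and simpson are similarities (higher means more alike), while hamming and euclidean are distances (lower means more alike).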
Maximum number of fingerprints to load in memory at a time
Ignore any scaled fp values
Tversky alpha parameter
Tversky beta parameter
Gaussian parameter to make output matrix sparse
Parameter flow variables
Any valid option for proplister.py can be specified through flow variables. Only String variables are accepted.
Usage:
Flow variable prefix keyword: CanvasFPMatrix
Note: To specify an option through a flow variable, name the flow variable as follows:
keyword-option_name for a single-dash option
keyword--option_name for a double-dash option

To add a new option with a value, specify the option_name and the corresponding value through a flow variable.

To add a new option without a value, specify the option_name and set the value to _on_ through a flow variable.

To override the value of an option already on the command line, specify the option_name and the new value through a flow variable.

To remove an existing option without a value, specify the option_name and set the value to _off_ through a flow variable.

To remove an existing option with a value, specify the option_name and set the value to _rm_ through a flow variable.
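The flow-variable naming and value rules can be modelled as edits to a list of command-line options. The Python sketch below is only an illustration of those rules, not the node's actual code. It assumes options are stored as (option, value) pairs with value None for options that take no value, and the option names -metric, -quiet, --extra and -alpha are placeholders, not verified canvasFPMatrix flags.

```python
from typing import Dict, List, Optional, Tuple

PREFIX = "CanvasFPMatrix"

# Command-line options as (option, value) pairs; value is None for options without a value
Options = List[Tuple[str, Optional[str]]]


def apply_flow_variables(options: Options, flow_vars: Dict[str, str]) -> Options:
    """Apply flow-variable rules to an option list (illustrative sketch)."""
    result = list(options)
    for name, value in flow_vars.items():
        if not name.startswith(PREFIX + "-"):
            continue  # flow variable is not addressed to this node
        option = name[len(PREFIX):]  # keeps the leading "-" or "--"
        idx = next((i for i, (opt, _) in enumerate(result) if opt == option), None)
        if value == "_on_":
            if idx is None:  # add an option without a value
                result.append((option, None))
        elif value == "_off_":
            if idx is not None and result[idx][1] is None:  # remove a valueless option
                del result[idx]
        elif value == "_rm_":
            if idx is not None and result[idx][1] is not None:  # remove option and its value
                del result[idx]
        elif idx is not None:
            result[idx] = (option, value)  # override the existing value
        else:
            result.append((option, value))  # add a new option with a value
    return result


def to_argv(options: Options) -> List[str]:
    """Flatten (option, value) pairs into command-line tokens."""
    argv: List[str] = []
    for opt, val in options:
        argv.append(opt)
        if val is not None:
            argv.append(val)
    return argv


# Placeholder options, not verified canvasFPMatrix flags
base = [("-metric", "tanimoto"), ("-quiet", None)]
flow = {
    "CanvasFPMatrix-metric": "dice",   # override an existing value
    "CanvasFPMatrix-quiet": "_off_",   # remove an option without a value
    "CanvasFPMatrix--extra": "_on_",   # add a double-dash option without a value
}
print(to_argv(apply_flow_variables(base, flow)))  # ['-metric', 'dice', '--extra']
```

Keeping the value alongside each option is what lets the sketch distinguish _off_ (valueless options) from _rm_ (options with a value), mirroring the two separate removal rules.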

Input Ports

First set of molecules in Maestro, SMILES or SD format
Second set of molecules in Maestro, SMILES or SD format

Output Ports

Similarity matrix in table format

Views

Std output/error of FPGen 1
Std output/error of FPGen 2
Std output/error of FPMatrix
