
04_SimilaritySearch

In this exercise, we expand our results by finding building blocks that are similar to the compounds we selected in the previous exercise (03_Clustering).

The catalog of building blocks is taken from: https://zinc15.docking.org/catalogs/enaminebbe/substances/

We first calculate fingerprints for both the selected compounds and the building blocks. To ensure that both calculations use identical settings, we create a shared component and use it for both inputs. We then perform a similarity search and finish with a final selection and annotation step.
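Both the similarity search and the later filtering rely on comparing fingerprints. As a minimal sketch, assuming each fingerprint is stored as the set of its "on" bit positions (a simplification of the bit vectors the RDKit nodes produce; the bit positions below are invented for illustration), the Tanimoto similarity is the number of shared bits divided by the number of bits set in either fingerprint, and the Tanimoto distance is 1 minus that value. The comparison is only meaningful when both fingerprints were generated with the same settings, which is why a single shared component is used for both inputs.

```python
def tanimoto_similarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Shared on-bits divided by on-bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen for this sketch: two empty fingerprints score 0
    return len(fp_a & fp_b) / union


def tanimoto_distance(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto distance = 1 - Tanimoto similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Hypothetical on-bit positions, for illustration only.
query = {1, 5, 9, 42}
candidate = {1, 5, 17, 42, 99, 120}

print(round(tanimoto_similarity(query, candidate), 3))  # 0.429
print(round(tanimoto_distance(query, candidate), 3))    # 0.571
```

With these toy fingerprints, 3 of the 7 set bits are shared, so the similarity is 3/7 (about 0.43) and would pass the lower bound of 0.35 used in the similarity search. A similarity of 0.35 corresponds to a Tanimoto distance of 0.65.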






I. Fingerprints Using Shared Components

1. Read the clusters you picked in the last exercise by dragging and dropping the file into your workspace.
2. Create a shared component that generates fingerprints with the same settings:
- Use the Column Selection Configuration node to allow selection of the desired SMILES column. Make sure that only SMILES columns can be selected by applying the type filter.
- Use the RDKit Fingerprint node to create Morgan fingerprints, in a column called mfp2, from the canonical SMILES column.
- Create a component called Generate Fingerprints and share it in the data folder of this exercise in your local workspace (right-click on the component -> Component -> Share).
3. Execute the File Reader with the building blocks and investigate the data. Add an RDKit Functional Group Filter node afterwards and filter the groups Carboxylic Acid and Aromatic Carboxylic Acid from the SMILES column.
4. Reuse the shared component to generate fingerprints for the filtered data.

II. Similarity Search

1. Perform a similarity search on the two sets of fingerprints generated by the two instances of the shared component. Use the Tanimoto distance, select similarity as the output, and extract the 5 most similar neighbors. Use a similarity range between 0.35 and 1, use the prefix "nearest neighbor" for the output, and use zinc_id as the ID column of the representatives.
2. Join the SMILES strings from the filtered ZINC data to the result.
3. Remove the columns mfp2 and nearest neighbor-index.
4. Create canonical SMILES for the resulting building blocks and give the new column an appropriate name, such as bb_canonical_smiles.
5. Adjust the Reference Row Filter to remove building blocks that were already in the input data, and connect its output to the RDKit From Molecule node. Create an RDKit molecule column with a new name from the bb_canonical_smiles column, and use this column in the Renderer to Image node. Choose RDKit 2D depiction as the renderer and SVG as the image type. Investigate the output.
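The search configured in part II can be sketched in plain Python, assuming the selected compounds act as queries and the filtered building blocks as the reference set, with fingerprints again simplified to sets of on-bit positions. For a query, every reference is scored, only scores inside the similarity range [0.35, 1] are kept, and the k = 5 best are returned. A second helper mimics the Reference Row Filter step by dropping hits whose canonical SMILES already occur in the input. All IDs, fingerprints and SMILES placeholders are hypothetical, and the tie-breaking rule is a choice made for this sketch, not necessarily what the Similarity Search node does.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two on-bit sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbors(
    query: set[int],
    references: dict[str, set[int]],
    k: int = 5,
    min_sim: float = 0.35,
    max_sim: float = 1.0,
) -> list[tuple[str, float]]:
    """Up to k (reference_id, similarity) pairs inside [min_sim, max_sim], best first."""
    hits = []
    for ref_id, ref_fp in references.items():
        sim = tanimoto(query, ref_fp)
        if min_sim <= sim <= max_sim:
            hits.append((ref_id, sim))
    # Highest similarity first; ties broken by ID so the output is deterministic.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:k]


def drop_known(
    hits: list[tuple[str, float]],
    known_smiles: set[str],
    smiles_by_id: dict[str, str],
) -> list[tuple[str, float]]:
    """Remove hits whose canonical SMILES already occur in the input data."""
    return [(ref_id, sim) for ref_id, sim in hits if smiles_by_id[ref_id] not in known_smiles]


# Hypothetical reference set: IDs, on-bit sets and SMILES placeholders are made up.
building_blocks = {
    "bb1": {1, 5, 9, 42},
    "bb2": {1, 5, 17, 42, 99, 120},
    "bb3": {1, 2, 3},
    "bb4": {5, 9, 42, 77},
}
smiles_by_id = {"bb1": "smiles_A", "bb2": "smiles_B", "bb3": "smiles_C", "bb4": "smiles_D"}
input_smiles = {"smiles_A"}  # bb1 is assumed to be one of the input compounds already

hits = nearest_neighbors({1, 5, 9, 42}, building_blocks, k=5)
print(hits)
print(drop_known(hits, input_smiles, smiles_by_id))
```

In this toy example bb1 is identical to the query (similarity 1.0), so it lies inside the range and is only removed by the reference filter, which is the situation step 5 addresses. bb3 (similarity 1/6) falls below 0.35 and is never reported.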
III. Final Selection and Annotation

1. Create a component that allows the user to make selections and to add final annotations:
- Add a column called Annotation using the Constant Value Column node.
- Reorder the columns so that the building block column and the Annotation column come first.
- Use the Table Editor to display the result. Make sure that the Annotation column can be edited, but nothing else (Editor tab), and make sure to enable selection.
- Use the Row Splitter node to split the selected rows from the rest.
- Collapse everything into a component and give it an appropriate name.
2. Use the component's view to make selections and annotations, and make sure the desired result is in the output.

Workflow overview (image): Table Reader (x2), File Reader (Complex Format), Smiles Reader, Generate Fingerprints (shared component, x2), RDKit Functional Group Filter, Similarity Search, Joiner, Column Filter, Column Rename, RDKit Canon SMILES (x2), Reference Row Filter, RDKit From Molecule, Renderer to Image, select and annotate.
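The table manipulations in part III can be mimicked in plain Python to show the data flow: a constant Annotation column is added, the building block and Annotation columns are moved to the front, and the selected rows are split from the rest. This is a sketch, not a reimplementation of the KNIME nodes; rows are assumed to be dicts, bb_canonical_smiles stands in for the building block column, the other column names and values are hypothetical, and the interactive editing and selection in the Table Editor view are simulated with hard-coded values.

```python
def add_constant_column(rows: list[dict], name: str, value: str) -> list[dict]:
    """Return copies of the rows with an extra column holding the same value."""
    return [{**row, name: value} for row in rows]


def move_columns_first(rows: list[dict], first: list[str]) -> list[dict]:
    """Put the given columns first; all other columns keep their relative order."""
    reordered = []
    for row in rows:
        new_row = {col: row[col] for col in first}
        new_row.update({col: val for col, val in row.items() if col not in first})
        reordered.append(new_row)
    return reordered


def split_selected(
    rows: list[dict], selected_ids: set[str], id_col: str
) -> tuple[list[dict], list[dict]]:
    """Split rows into (selected, rest) based on a set of selected IDs."""
    selected = [row for row in rows if row[id_col] in selected_ids]
    rest = [row for row in rows if row[id_col] not in selected_ids]
    return selected, rest


# Hypothetical similarity-search result after filtering.
rows = [
    {"bb_id": "bb4", "similarity": 0.6, "bb_canonical_smiles": "smiles_D"},
    {"bb_id": "bb2", "similarity": 0.43, "bb_canonical_smiles": "smiles_B"},
]

table = add_constant_column(rows, "Annotation", "")
table = move_columns_first(table, ["bb_canonical_smiles", "Annotation"])

# Simulated user interaction: annotate and select the first building block.
table[0]["Annotation"] = "good fit"
selected, rest = split_selected(table, {"bb4"}, "bb_id")

print(list(table[0]))  # ['bb_canonical_smiles', 'Annotation', 'bb_id', 'similarity']
print(selected)
```

Building new dicts at each step leaves the input rows untouched, which loosely mirrors how each node in the workflow produces a new output table instead of modifying its input.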
