

04_SimilaritySearch

In this exercise we expand our results by finding building blocks that are similar to the compounds we selected in the previous exercise (03_Clustering).

The catalog of building blocks is taken from ZINC15: https://zinc15.docking.org/catalogs/enaminebbe/substances/

We first calculate fingerprints for both the compounds and the building blocks. To ensure that both calculations use the same settings, we create a shared component and use it for each set. We then perform a similarity search and end with a final selection and annotation step.
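Part II compares the two sets of fingerprints with the Tanimoto measure. The following is a minimal pure-Python sketch, not the KNIME node's implementation, assuming each fingerprint is represented as a set of on-bit indices (the function names are hypothetical). The similarity is the number of on bits the two fingerprints share divided by the number of bits set in either one, and the Tanimoto distance is one minus that similarity.

```python
from __future__ import annotations


def tanimoto_similarity(fp_a: frozenset[int], fp_b: frozenset[int]) -> float:
    """Shared on bits divided by the on bits set in either fingerprint."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    # Assumed convention for this sketch: two empty fingerprints get similarity 0.
    return shared / union if union else 0.0


def tanimoto_distance(fp_a: frozenset[int], fp_b: frozenset[int]) -> float:
    """Tanimoto distance = 1 - Tanimoto similarity."""
    return 1.0 - tanimoto_similarity(fp_a, fp_b)


# Toy fingerprints: 2 shared on bits out of 5 distinct on bits.
a = frozenset({1, 2, 3, 4})
b = frozenset({3, 4, 5})
print(tanimoto_similarity(a, b), tanimoto_distance(a, b))
```

Selecting "similarity as output" in the workflow corresponds to reporting the first value rather than the distance.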






I. Fingerprints Using Shared Components

1. Read the clusters you picked in the last exercise by dragging and dropping the file into your workspace.
2. Execute the File Reader with the building blocks and investigate the data. Add an RDKit Functional Group Filter node afterwards and, using the smiles column, filter on the groups Carboxylic Acid and Aromatic Carboxylic Acid.
3. Create a shared component to generate fingerprints with the same settings:
   - Use the Column Selection Configuration node to allow selection of the desired SMILES column. Apply the type filter so that only SMILES columns can be selected.
   - Use the RDKit Fingerprint node to create Morgan fingerprints called mfp2 from the canonical SMILES column.
   - Create a component called Generate Fingerprints and share it in the data folder of this exercise in your local workspace (right-click on the component -> Component -> Share).
4. Reuse the shared component to generate fingerprints for the filtered building blocks.

II. Similarity Search

1. Perform a similarity search between the two sets of fingerprints generated by the two instances of the shared component. Use the Tanimoto distance, select similarity as output, and extract the 5 most similar neighbors. Use a similarity range between 0.35 and 1, use the prefix "nearest neighbor" for the output, and use zinc_id as the ID column of the representatives.
2. Join the SMILES strings from the filtered ZINC data to the result.
3. Remove the columns mfp2 and nearest neighbor-index.
4. Create canonical SMILES for the resulting building blocks and give the new column an appropriate name, such as bb_canonical_smiles.
5. Adjust the Reference Row Filter to remove building blocks that were already in the input data, and connect its output to the RDKit From Molecule node. Create an RDKit molecule column with a new name from the bb_canonical_smiles column and use this column in the RDKit Molecule to SVG node. Name the resulting column bb_image and remove the source column.

III. Final Selection and Annotation

1. Create a component that allows the user to make selections and add final annotations:
   - Add a column called Annotation using the Constant Value Column node.
   - Reorder the columns so that the building block and Annotation columns come first.
   - Use the Table Editor to display the result. In the Editor tab, make sure that the Annotation column can be edited but nothing else. Make sure to enable selection.
   - Use the Row Splitter node to split the selected rows from the rest.
   - Collapse everything into a component and name it accordingly.
2. Use the component's view to make selections and annotations, and make sure the desired result is in the output.
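The core logic of part II can be sketched in plain Python. This is not how the KNIME Similarity Search or Reference Row Filter nodes are implemented; it assumes fingerprints are sets of on-bit indices keyed by an ID (zinc_id for the building blocks), that the joined SMILES are already canonical, and it uses hypothetical function names and toy data. For each query compound it keeps at most k = 5 neighbors whose similarity lies between 0.35 and 1, then drops building blocks whose canonical SMILES already occur in the input. Without that last step, a building block identical to an input compound would be returned as its own nearest neighbor with similarity 1.

```python
from __future__ import annotations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(queries: dict[str, frozenset[int]],
                      references: dict[str, frozenset[int]],
                      k: int = 5, min_sim: float = 0.35, max_sim: float = 1.0
                      ) -> dict[str, list[tuple[str, float]]]:
    """For each query, keep at most k references with similarity in [min_sim, max_sim]."""
    results = {}
    for qid, qfp in queries.items():
        hits = [(rid, tanimoto(qfp, rfp)) for rid, rfp in references.items()]
        hits = [(rid, sim) for rid, sim in hits if min_sim <= sim <= max_sim]
        # Most similar first; ties broken by ID so the output is deterministic.
        hits.sort(key=lambda hit: (-hit[1], hit[0]))
        results[qid] = hits[:k]
    return results


def drop_known(hits: dict[str, list[tuple[str, float]]],
               bb_smiles: dict[str, str],
               input_smiles: list[str]) -> dict[str, list[tuple[str, float]]]:
    """Reference-row-filter step: drop hits whose SMILES already occur in the input."""
    known = set(input_smiles)
    return {qid: [(rid, sim) for rid, sim in qhits if bb_smiles[rid] not in known]
            for qid, qhits in hits.items()}


# Hypothetical toy data: one selected compound and four building blocks.
queries = {"cmpd_1": frozenset({1, 2, 3, 4})}
references = {
    "ZINC_A": frozenset({1, 2, 3, 4}),  # similarity 1.0
    "ZINC_B": frozenset({1, 2, 3}),     # similarity 3/4 = 0.75
    "ZINC_C": frozenset({3, 4, 5}),     # similarity 2/5 = 0.4
    "ZINC_D": frozenset({5, 6, 7}),     # similarity 0.0, outside the range
}
# Joined SMILES per zinc_id, assumed already canonical (as after RDKit Canon SMILES).
bb_smiles = {"ZINC_A": "O=C(O)c1ccccc1", "ZINC_B": "CC(=O)O",
             "ZINC_C": "O=C(O)CCl", "ZINC_D": "O=C(O)c1ccncc1"}
input_smiles = ["O=C(O)c1ccccc1"]  # ZINC_A is identical to an input compound

hits = similarity_search(queries, references)
new_hits = drop_known(hits, bb_smiles, input_smiles)
print(new_hits)
```

With this toy data the result keeps ZINC_B (0.75) and ZINC_C (0.4): ZINC_D falls below the 0.35 threshold, and ZINC_A is removed because it already occurs in the input.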
