
Similarity_Search_Exercise

04_SimilaritySearch

In this exercise we expand our results by finding building blocks similar to those we selected in the previous exercise (03_Clustering).

The catalog of building blocks is taken from: https://zinc15.docking.org/catalogs/enaminebbe/substances/

We will first calculate fingerprints for the compounds and for the building blocks. To ensure that both calculations use the same settings, we will create and use a shared component. Then we perform a similarity search and end with a final selection and annotation step.
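The idea behind the shared component can be sketched outside KNIME: both inputs pass through one function with one settings object, so the resulting fingerprints stay comparable. This is only an illustrative sketch. The fingerprint below is a toy stand-in that hashes character substrings of the SMILES string; it is not the Morgan algorithm used by the RDKit Fingerprint node. The names FingerprintSettings, toy_fingerprint, and SHARED, as well as the example IDs, are hypothetical.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintSettings:
    """Settings that must be identical for both inputs (hypothetical names)."""
    max_len: int = 3     # longest substring that is hashed
    n_bits: int = 1024   # length of the folded bit vector


def toy_fingerprint(smiles: str, settings: FingerprintSettings) -> frozenset:
    """Return the set of on-bit positions for a SMILES string.

    Toy stand-in: hashes character substrings, NOT Morgan atom environments.
    """
    bits = set()
    for length in range(1, settings.max_len + 1):
        for start in range(len(smiles) - length + 1):
            fragment = smiles[start:start + length]
            # crc32 is deterministic across runs, unlike the built-in hash()
            bits.add(zlib.crc32(fragment.encode("utf-8")) % settings.n_bits)
    return frozenset(bits)


# One settings object plays the role of the shared component.
SHARED = FingerprintSettings()

# Example inputs; the IDs are made up for illustration.
compounds = {"cpd_1": "CCO", "cpd_2": "c1ccccc1C(=O)O"}
building_blocks = {"bb_1": "CCO", "bb_2": "CC(=O)O"}

compound_fps = {k: toy_fingerprint(s, SHARED) for k, s in compounds.items()}
bb_fps = {k: toy_fingerprint(s, SHARED) for k, s in building_blocks.items()}
```

Because both dictionaries are built with the same SHARED object, the same structure yields the same bits in both sets, which is the property the shared component is meant to guarantee.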






I. Fingerprints Using Shared Components

1. Read the picked_cluster_gcc.table from the data folder by dragging and dropping the file into your workspace.
2. Create a shared component to generate fingerprints with the same settings:
   - Use the Column Selection Configuration node to allow selection of the desired SMILES column. Apply the type filter so that only SMILES columns can be selected.
   - Use the RDKit Fingerprint node to create Morgan fingerprints called mfp2 from the canonical SMILES column.
   - Create a component called Generate Fingerprints and share it in the data folder of this exercise in your local workspace (right-click the component -> Component -> Share).
3. Execute the File Reader with the building blocks and investigate the data. Add an RDKit Functional Group Filter node afterwards. Filter the groups Carboxylic Acid and Aromatic Carboxylic Acid from the smiles column.
4. Reuse the shared component to generate fingerprints for the filtered data.

II. Similarity Search

1. Perform a similarity search for the two sets of generated fingerprints from both shared components. Use the Tanimoto distance, select similarity as output, and extract the 5 most similar neighbors. Use a similarity range between 0.35 and 1 and use the prefix "nearest neighbor" for the output. The ID of the representative column is zinc_id.
2. Join the SMILES string from the filtered ZINC data to the result.
3. Remove the columns mfp2 and nearest neighbor-index.
4. Create a canonical SMILES for the resulting building blocks and give it an appropriate name such as bb_canonical_smiles.
5. Adjust the Reference Row Filter to remove building blocks that were already in the input data and connect the output to the RDKit From Molecule node. Create an RDKit molecule column with a new name from the bb_canonical_smiles column and use this column in the RDKit Molecule to SVG node. Investigate the output.

III. Final Selection and Annotation

1. Create a component that allows the user to make selections and to add final annotations:
   - Add a column called Annotation using the Constant Value Column node.
   - Resort the result so that the building block and the Annotation column come first.
   - Use the Table Editor to display the result. Make sure that the Annotation column can be edited, but nothing else (Editor tab). Make sure to enable selection.
   - Use the Row Splitter node to split the selected data from the rest.
   - Collapse everything into a component and name it accordingly.
2. Use the view to make selections and annotations, and make sure the desired result is in the output.

Nodes shown on the workflow canvas: Column Filter, RDKit Canon SMILES, Reference Row Filter, RDKit From Molecule, File Reader (Complex Format), RDKit Molecule to SVG.
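The search in part II can be sketched in a few lines of Python to make the settings concrete: Tanimoto similarity between two fingerprints, a similarity range of 0.35 to 1, and the 5 most similar neighbors. This is a minimal sketch and not the implementation used by the KNIME node. Fingerprints are represented as plain sets of on-bit positions, and the function names and example IDs are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity |a & b| / |a | b| of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbors(query, library, k=5, min_sim=0.35, max_sim=1.0):
    """Return up to k (id, similarity) pairs, most similar first."""
    hits = []
    for lib_id, fp in library.items():
        sim = tanimoto(query, fp)
        # Keep only neighbors inside the requested similarity range.
        if min_sim <= sim <= max_sim:
            hits.append((lib_id, sim))
    # Sort by descending similarity; break ties by ID for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:k]


# Made-up fingerprints: each set lists the positions of the on bits.
query_fp = {1, 2, 3, 4}
library_fps = {
    "bb_a": {1, 2, 3, 4},  # identical to the query -> 1.0
    "bb_b": {1, 2, 3, 5},  # 3 shared bits / 5 bits in total -> 0.6
    "bb_c": {1, 7, 8, 9},  # 1 shared bit / 7 bits in total, below 0.35
}

result = nearest_neighbors(query_fp, library_fps)
print(result)  # [('bb_a', 1.0), ('bb_b', 0.6)]
```

A Tanimoto distance is commonly taken as 1 minus this similarity, so a similarity range of 0.35 to 1 corresponds to a distance of at most 0.65.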
