04_SimilaritySearch_solution

04_SimilaritySearch

In this exercise we expand our results by finding building blocks similar to those we selected in the previous exercise (03_Clustering).

The catalog of building blocks is taken from: https://zinc15.docking.org/catalogs/enaminebbe/substances/

We will first calculate the fingerprints for the compounds and the building blocks. To ensure that the settings for this calculation are the same, we will create and use a shared component. Then we perform a similarity search and end with a final selection and annotation step.

04_Similarity Search
In this exercise we expand our results by finding similar building blocks for the interesting compounds found in the previous exercise.

I. Fingerprints Using Shared Components
Disregard the already provided File Reader for now and read the clusters you picked in the last exercise by dragging and dropping the file into your workspace. If it is in .table format, no further specification of file types is necessary. Otherwise, set the type of the "smiles" column to SMILES.
Create a shared component to generate fingerprints with the same settings:
- Use the Column Selection Configuration node to allow selection of the desired SMILES column; make sure that only SMILES columns can be selected by applying the type filter. You will later be able to access the configuration by double-clicking the component.
- Use the RDKit Fingerprint node to create Morgan fingerprints from the canonical SMILES column; call the column "mfp2".
- Create a component called "Generate Fingerprints" and share it in the data folder of this exercise in your local workspace (right-click on the component → Component → Share). Select workflow-relative path in the next window.
Execute the File Reader (Complex Format) with the building blocks and investigate the data. Add an RDKit Functional Group Filter node afterwards. Filter the groups Carboxylic Acid and Aromatic Carboxylic Acid from the smiles column.
Reuse the shared component to generate fingerprints for the filtered data.

II. Similarity Search
Perform a similarity search for the two sets of generated fingerprints from both shared components. Use Tanimoto distance, select similarity as output, and extract the 5 most similar neighbors. Use a similarity range between 0.35 and 1 and use the prefix "nearest neighbor" for the output; the id of the representative column is zinc_id.
Join the SMILES string from the filtered ZINC data to the result.
Remove the columns mfp2 and nearest neighbor-index.
Create a canonical SMILES for the resulting building blocks and give it an appropriate name such as bb_canonical_smiles.
Adjust the Reference Row Filter to remove building blocks that were already in the input data and connect the output to the RDKit From Molecule node. Create an RDKit molecule column with a new name from the bb_canonical_smiles column and use this column in the Renderer to Image node. Choose RDKit 2D depiction as renderer and svg as image type. Investigate the output.

III. Final Selection and Annotation
Create a component that allows the user to make selections and to add final annotations:
- Add a column called Annotation using the Constant Value Column node.
- Re-sort the result so that the building block and the Annotation column come first.
- Use the Table Editor (JavaScript) to display the result; make sure that one can edit the Annotation column, but nothing else (Editor tab). Make sure to enable selection.
- Use the Row Splitter node to split the selected data from the rest.
- Collapse everything into a component and name it accordingly.
Use the view to make selections and annotations; make sure the desired result is in the output.

Workflow nodes: Table Reader, File Reader (Complex Format), RDKit Functional Group Filter, Generate fingerprints solution (used twice), Similarity Search, Joiner, Column Filter, RDKit Canon SMILES, Reference Row Filter, RDKit From Molecule, Renderer to Image, Final selection and annotation
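To make the settings in part II more concrete, the following is a minimal pure-Python sketch of a Tanimoto nearest-neighbor search. It is not how the KNIME Similarity Search node is implemented. It assumes that each fingerprint is represented as a set of on-bit indices, and the names `tanimoto`, `nearest_neighbors`, and the toy identifiers `bb_1` to `bb_4` are hypothetical. It keeps at most 5 neighbors whose similarity lies between 0.35 and 1.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(
    query: Set[int],
    library: Dict[str, Set[int]],
    k: int = 5,
    min_sim: float = 0.35,
    max_sim: float = 1.0,
) -> List[Tuple[str, float]]:
    """Return up to k library entries within the similarity range, best first."""
    scored = [(bb_id, tanimoto(query, fp)) for bb_id, fp in library.items()]
    in_range = [(bb_id, s) for bb_id, s in scored if min_sim <= s <= max_sim]
    # Sort by descending similarity; break ties by identifier for stable output.
    in_range.sort(key=lambda pair: (-pair[1], pair[0]))
    return in_range[:k]


# Toy data (hypothetical): on-bit indices standing in for the "mfp2" fingerprints.
library = {
    "bb_1": {1, 2, 3, 4},
    "bb_2": {1, 2, 3, 9},
    "bb_3": {7, 8, 9},
    "bb_4": {1, 2},
}
query = {1, 2, 3, 4}

hits = nearest_neighbors(query, library)
for bb_id, similarity in hits:
    print(bb_id, round(similarity, 2))
```

In this toy run, `bb_3` shares no bits with the query and falls below the 0.35 threshold, so it is dropped. `bb_1` is identical to the query, which is the kind of hit the Reference Row Filter step in the exercise is meant to remove afterwards.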
