
02_Clustering

03_Clustering

This exercise shows how to perform hierarchical clustering based on molecular fingerprints and how to create an interactive view for picking interesting clusters. The chemical structures were extracted from this publication: https://doi.org/10.1021/acs.jmedchem.9b01658





Required extensions: RDKit KNIME Integration

I. Pre-processing
1. Execute the Table Reader and the RDKit Canon SMILES nodes and inspect the data.
2. Adjust the Row Filter node to filter out rows with missing values.
3. Remove duplicates with the Duplicate Row Filter node.
4. Compute physicochemical properties using the RDKit Descriptor Calculation node.
5. Connect the result to the RDKit From Molecule node, making sure to use the canonical SMILES, and generate the images with the RDKit Molecule to SVG node.
6. Ctrl/Cmd + double-click the Pick compounds component to open it and follow the instructions.
7. Execute the Pick compounds component and open its interactive view (Shift+F10). Explore the compounds in the dataset. Use the Parallel Coordinates Plot to pick compounds with reasonable physicochemical properties (e.g. 2 < LogP < 6).

II. Clustering
Note: if you don't see a node in the workflow editor, use the Node repository to find it and place it into the workflow.
1. Connect the Pick compounds component to the RDKit Fingerprint node and use it to compute Morgan fingerprints for the picked compounds based on their canonical SMILES. Name the new fingerprint column "mfp2".
2. Calculate the Tanimoto distance using the mfp2 fingerprint in the Bit Vector Distances node.
3. Calculate the pairwise distances using the Hierarchical Clustering (DistMatrix) node; use average linkage in the configuration.
4. Use the cluster output and the table from the RDKit Fingerprint node as input for the Hierarchical Cluster Assigner. Assign the clusters based on a fixed number of clusters (e.g. 10).

III. Create a component to interactively set the cluster threshold
1. Add the Column Resorter node after the RDKit Fingerprint node and configure it so that Mol (the image of the molecule) is shown first, followed by text and page.
2. Display those three columns in a tile view. Configure the Tile View node to show 3 tiles per row.
3. Create a component containing the Column Resorter, the Tile View and the Hierarchical Cluster Assigner: select these three nodes, right-click them, select "Create Component", and name it "Set cluster threshold".
4. Execute it by pressing F7 and inspect the interactive view (Right click >> Interactive View: Set cluster threshold). Explore the cluster tree and select a cluster threshold.

IV. Configure the component to pick clusters and save the results
1. Connect the output of the "Set cluster threshold" component to the "Pick interesting cluster" component. Ctrl/Cmd + double-click to open the component and follow the instructions.
2. Execute the Pick interesting cluster component and open its interactive view. Explore it and select the cluster(s) to save.
3. Keep only the columns text, canonical_smiles, and Mol with the Column Filter node. Write the result to a file in the data folder of this exercise using the Table Writer node.

Nodes shown in the workflow: Hierarchical Cluster Assigner, RDKit Canon SMILES, RDKit Molecule to SVG, Pick compounds, Pick interesting cluster, Table Reader, RDKit From Molecule, Row Filter
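For readers who want to see what the clustering steps in section II compute, the following is a minimal sketch in plain Python. It is not the implementation used by the KNIME or RDKit nodes. It assumes each fingerprint is given as a set of on-bit positions, and the compound names, bit positions, and function names (tanimoto_distance, average_linkage_clusters) are made up for illustration. It computes the Tanimoto distance between fingerprints, merges clusters by average linkage, and stops at a fixed number of clusters.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def average_linkage_clusters(fingerprints: dict, n_clusters: int) -> list:
    """Merge the two closest clusters until n_clusters remain.

    The distance between two clusters is the mean of all pairwise
    member distances (average linkage).
    """
    names = list(fingerprints)
    # Pairwise Tanimoto distances, keyed by the unordered pair of names.
    dist = {
        frozenset((x, y)): tanimoto_distance(fingerprints[x], fingerprints[y])
        for x, y in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        # combinations() yields i < j, so deleting index j keeps i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy fingerprints (sets of on-bit positions), not real compounds.
toy_fingerprints = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({1, 2, 3, 5}),
    "cpd_C": frozenset({10, 11, 12}),
    "cpd_D": frozenset({10, 11, 13}),
    "cpd_E": frozenset({20, 21}),
}

clusters = average_linkage_clusters(toy_fingerprints, n_clusters=3)
print(clusters)
```

In this toy data, cpd_A and cpd_B share three of five distinct bits (distance 0.4) and cpd_C and cpd_D share two of four (distance 0.5), so those pairs merge first and cpd_E stays on its own. Asking for a fixed number of clusters, as in step 4, simply stops the merging early; choosing a distance threshold instead, as in section III, would stop it once the closest remaining pair is farther apart than the threshold.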
