
05 Clustering

03_Clustering

This exercise shows how to perform hierarchical clustering using molecular fingerprints and create an interactive view to pick interesting clusters. Chemical structures were extracted from this publication: https://doi.org/10.1021/acs.jmedchem.9b01658





Required extensions: RDKit KNIME Integration

I. Pre-processing
1. Execute the Table Reader and the RDKit Canon SMILES nodes and investigate the data.
2. Adjust the Row Filter node to filter out rows with missing values.
3. Remove duplicate canonical SMILES with the Duplicate Row Filter node.
4. Compute physicochemical properties using the RDKit Descriptor Calculation node. Let it calculate: SLogP, PSA, AMW, NumRotatableBonds, NumHBD, NumHBA.
5. Connect the result to the Renderer to Image node; make sure that the canonical SMILES column and the RDKit 2D Depiction renderer are used as input column and renderer, respectively.
6. Ctrl/Cmd + double-click to go into the component and follow the instructions.

II. Clustering
1. Use the RDKit Fingerprint node to create Morgan fingerprints from the canonical SMILES column; call the column "mfp2".
2. The created fingerprint can now be used to calculate the Tanimoto distance in the Bit Vector Distances node.
3. Connect the fingerprint and the distance output to the Hierarchical Clustering (DistMatrix) node from the node repository; use average linkage as the linkage type.
4. Use the cluster output and the table from the RDKit Fingerprint node as input for the Hierarchical Cluster Assigner. Assign the clusters based on a fixed number of clusters.
5. Starting from the RDKit Fingerprint node, add a Column Resorter to make sure Mol is shown first, followed by text and page. Display those three columns in a Tile View. Adjust the Tile View so that 3 tiles per row are displayed. Under the tab "Interactivity", set it to display selected items only.
6. Create a component containing the Column Resorter, the Tile View and the Hierarchical Cluster Assigner and name it "Set cluster threshold". Execute it and inspect the interactive view; select a cluster threshold.
7. Connect the result of the "Set cluster threshold" component to the "Pick interesting cluster" component. Ctrl/Cmd + double-click to go into the component and follow the instructions.
8. Filter for the following columns: text, canonical_smiles, and Mol. Write the result to a file in the data folder of this exercise using the Table Writer.
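For readers who want to see what the clustering steps compute, the following is a minimal sketch in plain Python, not the implementation used by the KNIME nodes. It assumes each fingerprint is represented as a set of on-bit indices (a stand-in for the "mfp2" bit vectors), computes pairwise Tanimoto distances, merges clusters by average linkage, and stops at a fixed number of clusters. The function names and the toy fingerprints are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def average_linkage_labels(fingerprints, n_clusters):
    """Agglomerative clustering with average linkage, cut at n_clusters."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    n = len(fingerprints)
    # Pairwise distance matrix, stored for index pairs (i, j) with i < j.
    dist = {
        (i, j): tanimoto_distance(fingerprints[i], fingerprints[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(n)]  # start with one cluster per molecule
    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge the second into the first.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy fingerprints (hypothetical on-bit indices, not real Morgan bits).
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({20, 21}),
]
labels = average_linkage_labels(fps, n_clusters=3)
print(labels)
```

Cutting at a fixed number of clusters mirrors the setting in step 4; the interactive "Set cluster threshold" component lets you adjust that cut and inspect the resulting groups in the Tile View.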
