
03_Clustering

This exercise shows how to perform hierarchical clustering based on molecular fingerprints and how to create an interactive view for picking interesting clusters. The chemical structures were extracted from this publication: https://doi.org/10.1021/acs.jmedchem.9b01658





Required extensions: RDKit KNIME Integration

I. Pre-processing
1. Execute the Table Reader and the RDKit Canon SMILES nodes and investigate the data.
2. Adjust the Row Filter node so that rows with missing values are removed.
3. Remove duplicates with the Duplicate Row Filter node.
4. Compute physicochemical properties using the RDKit Descriptor Calculation node.
5. Connect the result to the Renderer to Image node. Make sure that the canonical SMILES column is used as the input column and the RDKit 2D Depiction as the renderer.
6. Ctrl/Cmd + double-click to open the component and follow the instructions inside.

II. Clustering
1. Use the RDKit Fingerprint node to create Morgan fingerprints, in a column called mfp2, from the canonical SMILES column.
2. Use the created fingerprints to calculate the Tanimoto distance in the Bit Vector Distances node.
3. Connect the fingerprint output and the distance output to the Hierarchical Clustering (DistMatrix) node from the node repository. Use average linkage as the linkage type.
4. Use the cluster output and the table from the RDKit Fingerprint node as input for the Hierarchical Cluster Assigner node. Assign the clusters based on a fixed number of clusters.
5. Starting from the RDKit Fingerprint node, add a Column Resorter so that Mol is shown first, followed by text and page. Display those three columns in a Tile View. Adjust the Tile View so that 3 tiles are displayed.
6. Create a component containing the Column Resorter, the Tile View, and the Hierarchical Cluster Assigner, and name it "Set cluster threshold". Execute it, inspect the interactive view, and select a cluster threshold.
7. Connect the result of the "Set cluster threshold" component to the "Pick interesting cluster" component. Ctrl/Cmd + double-click to open the component and follow the instructions inside.
8. Filter for the following columns: text, canonical_smiles, and Mol. Write the result to a file in the data folder of this exercise using the Table Writer node.
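For readers who want to see what the clustering steps compute, the following is a minimal Python sketch of Tanimoto distance, average linkage, and assignment to a fixed number of clusters. It is not what the KNIME nodes run internally. It assumes each fingerprint is already available as a set of on-bit indices, and the function names (`tanimoto_distance`, `average_linkage`) and the toy fingerprints are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def average_linkage(fps, n_clusters):
    """Merge the closest clusters (average linkage) until n_clusters remain.

    Returns one integer cluster label per input fingerprint.
    """
    n = len(fps)
    # Pairwise distance lookup, keyed by (smaller index, larger index).
    dist = {(i, j): tanimoto_distance(fps[i], fps[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy fingerprints (illustrative only, not taken from the exercise data).
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
    frozenset({1, 2, 3, 4, 5}),
]
labels = average_linkage(fps, n_clusters=2)
print(labels)
```

This naive version recomputes linkages at every merge and is only suitable for small inputs. Fixing the number of clusters, as done here, corresponds to the "fixed number of clusters" option chosen in step II.4.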
