

TeachOpenCADD Workflow 5: Compound clustering

Clustering can be used to identify groups of similar compounds in order to pick a set of diverse compounds from these clusters, e.g. for non-redundant experimental testing.
This workflow shows how to perform such a clustering based on a hierarchical clustering algorithm.

Step 1: Cluster dataset with hierarchical clustering algorithm

Note that hierarchical clustering is used here (instead of the Butina clustering used in the Jupyter notebook talktorial T005 on the TeachOpenCADD platform) because no KNIME node for Butina clustering is available.

Careful! The hierarchical clustering step is time-consuming.

*To skip the pre-clustering step, already clustered compounds can be loaded using the "Table Reader" node. In that case, the connections leaving the "Hierarchical Cluster Assigner" node must be removed, and the "Table Reader" node must be connected to the "GroupBy", "Joiner" and "Compound Picker" nodes instead.

This workflow adapts the KNIME workflow example 99_Community/03_RDKit/01_Clustering (KNIME EXAMPLES Server, accessed: 2019-05-24).

Step 2.1: Get largest cluster (used in workflow 6)

Step 2.2: Pick diverse subset based on clusters
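To illustrate what Step 1 computes, here is a minimal plain-Python sketch of average-linkage hierarchical clustering on Tanimoto distances. It is not the KNIME implementation. Assumptions: fingerprints are given as sets of "on" bit indices (standing in for the RDKit Morgan fingerprints), and the dendrogram is cut at a fixed distance threshold; the "Hierarchical Cluster Assigner" node may be configured differently. The function names `tanimoto_distance` and `average_linkage_clusters` are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Tanimoto distance 1 - |A & B| / |A | B| for sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def average_linkage_clusters(fps, threshold):
    """Naive agglomerative clustering with average linkage (hypothetical helper).

    Repeatedly merges the two clusters with the smallest average pairwise
    distance until that distance exceeds `threshold`. Returns one cluster
    label per input fingerprint. The repeated all-pairs search is slow,
    which mirrors why this step is time-consuming on large datasets.
    """
    n = len(fps)
    # Full pairwise Tanimoto distance matrix
    dist = [[tanimoto_distance(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with singletons

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (p, q), d = min(
            (((p, q), avg_dist(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # closest pair is too far apart: stop merging
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


fps = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({7, 8, 9}), frozenset({7, 8})]
labels = average_linkage_clusters(fps, threshold=0.6)
print(labels)  # [0, 0, 1, 1]
```

With these toy fingerprints, the two pairs sharing bits end up in separate clusters, while pairs without common bits (distance 1.0) are never merged at a threshold of 0.6.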
This workflow is part of the TeachOpenCADD pipeline: https://hub.knime.com/volkamerlab/space/TeachOpenCADD
Read more on the theoretical background of this workflow: https://projects.volkamerlab.org/teachopencadd/talktorials/T005_compound_clustering.html

Workflow nodes (node annotation after the colon):
- RDKit Fingerprint: generate Morgan fingerprint
- Distance Matrix Calculate: calculate Tanimoto distance
- Hierarchical Clustering (DistMatrix): average linkage clustering
- Hierarchical Cluster Assigner (local): assign compounds to clusters
- GroupBy: perform cluster size statistics
- Line Plot: cluster sizes
- Sorter, Select largest cluster: get largest cluster
- Compound Picker: pick a list of 1000 compounds as a diverse subset (diverse subset based on clusters)
- Table Writer: save clustered compounds
- Table Reader: load pre-clustered compounds*
- Other nodes: SDF Writer, Column Filter, Molecule Type Cast, RDKit From Molecule, Number To String, CSV Writer, CSV Reader, Joiner, Column Merger
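The steps downstream of the cluster assignment (cluster size statistics, selecting the largest cluster, picking a diverse subset) can be sketched in plain Python as below, given one cluster label per compound. This is an illustration only. Assumption: the diverse subset is picked round-robin, one compound per cluster per round starting with the largest cluster; the "Compound Picker" node may use a different selection strategy. The helpers `cluster_sizes`, `largest_cluster` and `pick_diverse` are hypothetical.

```python
from collections import Counter


def cluster_sizes(labels):
    """(label, size) pairs, largest cluster first (cf. GroupBy + Sorter)."""
    return Counter(labels).most_common()


def largest_cluster(labels):
    """Indices of all compounds belonging to the most populated cluster."""
    top_label, _ = Counter(labels).most_common(1)[0]
    return [i for i, lab in enumerate(labels) if lab == top_label]


def pick_diverse(labels, n_pick):
    """Round-robin pick: one compound per cluster per round, largest clusters first.

    Spreading picks across clusters keeps the subset non-redundant:
    a second compound from a cluster is only taken once every cluster
    has contributed one.
    """
    members = {}
    for i, lab in enumerate(labels):
        members.setdefault(lab, []).append(i)
    order = [lab for lab, _ in Counter(labels).most_common()]

    picked = []
    round_idx = 0
    while len(picked) < min(n_pick, len(labels)):
        for lab in order:
            if round_idx < len(members[lab]) and len(picked) < n_pick:
                picked.append(members[lab][round_idx])
        round_idx += 1
    return picked


labels = [0, 0, 1, 1, 1, 2]
print(cluster_sizes(labels))    # [(1, 3), (0, 2), (2, 1)]
print(largest_cluster(labels))  # [2, 3, 4]
print(pick_diverse(labels, 4))  # [2, 0, 5, 3]
```

In the workflow itself, the subset size is 1000 compounds; here a toy size of 4 is used so the result can be checked by hand.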


