
TeachOpenCADD_​Workflow5_​Compound_​clustering

Workflow

TeachOpenCADD Workflow 5: Compound clustering
Clustering can be used to identify groups of similar compounds, in order to pick a set of diverse compounds from these clusters for e.g. non-redundant experimental testing. This workflow shows how to perform such a clustering based on a hierarchical clustering algorithm.
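"Similar" is measured in this workflow with the Tanimoto distance between fingerprints. The snippet below is a minimal sketch of that distance, assuming each fingerprint is represented as a set of on-bit positions. The function names and the toy fingerprints are hypothetical and are not taken from the workflow; in practice the bits would come from a Morgan fingerprint.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |a & b| / |a | b| for two sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def distance_matrix(fps):
    """Build the symmetric pairwise Tanimoto distance matrix."""
    n = len(fps)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = tanimoto_distance(fps[i], fps[j])
    return dist


# Hypothetical toy fingerprints, not real Morgan bits.
toy_fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
]
toy_dist = distance_matrix(toy_fps)
```

The first two toy fingerprints share three of five distinct bits, so their distance is small; the third shares no bits with either and sits at the maximum distance of 1.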
Hierarchical clustering, Diverse compound set
5. Compound clustering

Clustering can be used to identify groups of similar compounds, in order to pick a set of diverse compounds from these clusters for e.g. non-redundant experimental testing. The following steps show how to perform such a clustering based on a hierarchical clustering algorithm.

Step 1: Cluster dataset with hierarchical clustering algorithm
Step 2.1: Get largest cluster (used in workflow 6)
Step 2.2: Pick diverse subset based on clusters

Note that hierarchical clustering is used here (instead of Butina like in the Jupyter notebook T4 on the TeachOpenCADD platform) since a Butina clustering KNIME node is not available. Careful! Hierarchical clustering is time-consuming here.

*To skip the pre-clustering step, clustered compounds can be loaded using the "Table Reader" node. Connections leaving the "Hierarchical Cluster Assigner" node must be removed and instead the "Table Reader" node must be connected to the "GroupBy", "Joiner" and "Compound Picker" nodes.

This workflow adapts the KNIME workflow example 99_Community/03_RDKit/01_Clustering (KNIME EXAMPLES Server, accessed: 2019-05-24).

This workflow is part of the TeachOpenCADD pipeline: https://hub.knime.com/volkamerlab/space/TeachOpenCADD

Read more on the theoretical background of this workflow: https://github.com/volkamerlab/TeachOpenCADD/blob/master/talktorials/5_compound_clustering/T5_compound_clustering.ipynb

Node labels: Generate Morgan fingerprint; Average linkage clustering; Calculate Tanimoto distance; Assign compounds to clusters; Perform cluster size statistics; Pick a list of 1000 compounds as a diverse subset; Diverse subset based on clusters; Save clustered compounds; Load pre-clustered compounds*; List of compounds; Select largest cluster; Cluster sizes

Nodes: RDKit Fingerprint, Hierarchical Clustering (DistMatrix), Distance Matrix Calculate, Hierarchical Cluster Assigner, GroupBy, Sorter, Joiner, Compound Picker, SDF Writer, Column Filter, Table Writer, Table Reader, CSV Reader, Molecule Type Cast, RDKit From Molecule, CSV Writer, Number To String, Line Plot
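The steps above (average linkage clustering, cluster assignment, selecting the largest cluster, picking a diverse subset) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used by the KNIME nodes: the distance threshold, the round-robin picking rule, the function names, and the toy distance matrix are all hypothetical, and the "Compound Picker" node may select compounds differently.

```python
def average_linkage_clusters(dist, threshold):
    """Merge clusters bottom-up while the closest pair is within threshold.

    dist is a symmetric matrix of pairwise distances; the distance between
    two clusters is the mean of all member-to-member distances.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = avg(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    # Largest cluster first, mirroring the cluster size statistics step.
    return sorted(clusters, key=len, reverse=True)


def pick_diverse(clusters, n_pick):
    """Assumed picking rule: one member per cluster per pass, largest first."""
    picked = []
    depth = 0
    while len(picked) < n_pick:
        added = False
        for members in clusters:
            if depth < len(members) and len(picked) < n_pick:
                picked.append(members[depth])
                added = True
        if not added:
            break  # every cluster is exhausted
        depth += 1
    return picked


# Hypothetical distances for five compounds: 0-2 are close, 3-4 are close.
toy = [
    [0.0, 0.2, 0.3, 0.9, 0.8],
    [0.2, 0.0, 0.25, 0.85, 0.9],
    [0.3, 0.25, 0.0, 0.95, 0.9],
    [0.9, 0.85, 0.95, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
clusters = average_linkage_clusters(toy, threshold=0.5)
largest_cluster = clusters[0]
subset = pick_diverse(clusters, n_pick=3)
```

This naive loop recomputes every cluster-to-cluster average on each merge, so it is only practical for small inputs; the cost of hierarchical clustering on real datasets is the reason the workflow offers the pre-clustered "Table Reader" shortcut.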

Download

Get this workflow from the following link: Download

Nodes

TeachOpenCADD_Workflow5_Compound_clustering consists of the following 27 node(s):

Plugins

TeachOpenCADD_​Workflow5_​Compound_​clustering contains nodes provided by the following 5 plugin(s):