TeachOpenCADD_Workflow7_Machine_learning

TeachOpenCADD Workflow 7: Ligand-based screening: Machine learning

With the continuously increasing amount of available data, machine learning (ML) has gained momentum in drug discovery, especially in ligand-based virtual screening (VS), where it is used to predict the activity of novel compounds against a target of interest.
In this workflow, different ML models are trained on the filtered ChEMBL dataset to discriminate between active and inactive compounds with respect to a protein target.
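As a minimal sketch of what "active" and "inactive" mean here: the workflow's Step 1 annotation states a pIC50 cut-off of 6.3, and its node annotations mention adding a boolean activity column and then converting it to a string. The plain-Python analogue below assumes that compounds at or above the cut-off count as active (the page does not say whether the comparison is inclusive). The function name and the example pIC50 values are hypothetical.

```python
PIC50_CUTOFF = 6.3  # cut-off stated in the workflow's Step 1 annotation


def label_activity(pic50_values, cutoff=PIC50_CUTOFF):
    """Return "1" (active) or "0" (inactive) for each pIC50 value.

    Assumption: a compound is active when pIC50 >= cutoff.
    """
    # Boolean activity as 1/0, analogous to "Add boolean activity column".
    numeric = [1 if value >= cutoff else 0 for value in pic50_values]
    # String class labels, analogous to "Convert activity to string".
    return [str(flag) for flag in numeric]


if __name__ == "__main__":
    # Hypothetical pIC50 values, for illustration only.
    example = [8.1, 6.3, 5.2]
    print(label_activity(example))
```

The string labels serve as the class column that a classifier learns to predict.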

Step 1: Split dataset into active & inactive compounds (pIC50 cut-off = 6.3)
Step 2: Generate fingerprints and prepare data for ML
Step 3: Train ML models using k-fold cross-validation (default 10-fold) and evaluate them with ROC curves

Classifiers: random forest classifier, artificial neural network classifier, support vector machine classifier

This workflow adapts the KNIME workflow example 0x2_Machine_Learning (Daria Goldmann, KNIME Introduction and Training Session at Volkamer Lab in Berlin on 2019-01-21).
This workflow is part of the TeachOpenCADD pipeline: https://hub.knime.com/volkamerlab/space/TeachOpenCADD
Read more on the theoretical background of this workflow on our TeachOpenCADD platform: https://projects.volkamerlab.org/teachopencadd/talktorials/T007_compound_activity_machine_learning.html

Node annotations: List of compounds; Add boolean activity column; Convert activity to string; Generate fingerprint (default MACCS); Split fingerprint to one bit per column; Extract columns needed for ML nodes; Split data into training/test set in k-fold validation; Train model on training set; Test model on test set; Aggregate results from k-fold validation; Random forest; Neural network; Support vector; ROC curve; Score view

Nodes used: CSV Reader, Math Formula, Number To String, Molecule Type Cast, RDKit From Molecule, RDKit Fingerprint, Expand Bit Vector, Column Filter, X-Partitioner, X-Aggregator, Random Forest Learner, Random Forest Predictor, RProp MLP Learner, MultiLayerPerceptron Predictor, SVM Learner, SVM Predictor, Evaluate model
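The cross-validation and evaluation steps can be illustrated in plain Python. The sketch below is an analogue of the idea only, not the implementation of the X-Partitioner, X-Aggregator, or ROC curve nodes. It assumes a shuffled, non-stratified split and computes the area under the ROC curve with the pairwise ranking formulation, counting ties as 0.5. The function names, the seed, and the example labels and scores are hypothetical.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Assumption: samples are shuffled once and dealt into k folds
    without stratifying by class.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = active, 0 = inactive).

    Computed as the fraction of (active, inactive) pairs in which the
    active compound receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Each sample lands in exactly one test fold across the k rounds.
    for train, test in kfold_indices(25, k=5):
        print(len(train), len(test))

    # Hypothetical labels and predicted scores pooled over all folds.
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.1]
    print(roc_auc(labels, scores))
```

Pooling the per-fold predictions before scoring mirrors the "aggregate results from k-fold validation" annotation: every compound is predicted exactly once, by a model that did not see it during training.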
