Machine Learning Chemistry

This workflow snippet demonstrates how to train a bioactivity model from chemical structures. Hashed bit-based fingerprints are generated from each chemical structure and serve as the input for a Random Forest model. The model is trained on one part of the data set (the training set) and then applied to the remaining data (the test set). The predictions are evaluated with the ROC Curve node and the Scorer node in a composite view.
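To make the idea of a hashed bit-based fingerprint concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the algorithm used by the RDKit Fingerprint node: it treats short substrings of a SMILES string as "fragments" and hashes each one to a position in a fixed-length bit vector. The function names, the 1024-bit length, and the fragment size are assumptions chosen for the example.

```python
import hashlib


def hashed_fingerprint(smiles: str, n_bits: int = 1024, max_len: int = 3) -> list[int]:
    """Toy hashed fingerprint: map every substring of length 1..max_len to one bit.

    Real cheminformatics fingerprints hash atom environments or paths in the
    molecular graph; plain SMILES substrings are used here only for illustration.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # md5 gives a hash that is stable across runs (unlike built-in hash()).
            digest = hashlib.md5(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % n_bits
            bits[index] = 1
    return bits


def tanimoto(a: list[int], b: list[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits set in either vector."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


ethanol = hashed_fingerprint("CCO")
propanol = hashed_fingerprint("CCCO")
benzene = hashed_fingerprint("c1ccccc1")

print(tanimoto(ethanol, propanol))  # structurally close pair
print(tanimoto(ethanol, benzene))   # structurally distant pair
```

Each molecule becomes a vector of the same length regardless of its size, which is what allows a learner such as a Random Forest to consume the fingerprints as ordinary feature columns.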

The data set is a subset of 844 compounds evaluated for activity against CDPK1. The 181 compounds that inhibited CDPK1 with an IC50 below 1 µM have "active" as their class.
More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
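The labeling rule and the resulting class balance can be sketched as follows. This is an illustration only: it assumes IC50 values expressed in nM, and the "inactive" label and all function names are hypothetical, since the description only names the "active" class.

```python
ACTIVE_THRESHOLD_NM = 1000.0  # 1 µM expressed in nM (unit is an assumption)


def activity_class(ic50_nm: float) -> str:
    """Label a compound 'active' when its IC50 is below 1 µM."""
    # "inactive" is a hypothetical name for the other class.
    return "active" if ic50_nm < ACTIVE_THRESHOLD_NM else "inactive"


def active_fraction(n_active: int, n_total: int) -> float:
    """Share of compounds that carry the 'active' label."""
    return n_active / n_total


# Counts taken from the data set description: 181 actives out of 844 compounds.
fraction = active_fraction(181, 844)
print(f"{fraction:.1%} of the compounds are active")  # prints "21.4% of the compounds are active"
```

With roughly one active compound in five, the two classes are imbalanced, which is worth keeping in mind when reading accuracy figures for the model.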

Workflow nodes and their annotations:
Table Reader: CDPK1.table
RDKit Fingerprint: RDKit fingerprint
Partitioning: top: training data, bottom: test data
Random Forest Learner: train model
Random Forest Predictor: use model to predict on test data
Model Writer: save trained model
Evaluate Model
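The evaluation step reports a ROC curve and scorer statistics. As a rough sketch of what those numbers mean, the code below computes the area under the ROC curve and the confusion-matrix counts in plain Python. It is not the implementation of the KNIME nodes; the labels, scores, and the 0.5 decision threshold are made-up example values.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(labels: list[int], scores: list[float], threshold: float = 0.5):
    """Return (tp, fp, tn, fn) when scores >= threshold are predicted active."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical test-set labels (1 = active) and predicted probabilities.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]

auc = roc_auc(labels, scores)
tp, fp, tn, fn = confusion_counts(labels, scores)
accuracy = (tp + tn) / len(labels)
print(auc, (tp, fp, tn, fn), accuracy)
```

The AUC summarizes ranking quality across all thresholds, while the confusion counts describe performance at one chosen threshold, which is why the two views complement each other.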
