Machine Learning Chemistry

This workflow snippet demonstrates how to train a bioactivity model from chemical structures. Hashed bit-based fingerprints are generated from each chemical structure with the RDKit Fingerprint node and serve as input to a Random Forest model. The model is trained on one part of the data set (the training set) and then applied to the remaining data (the test set). The predictions are evaluated with the ROC Curve node and the Scorer node in a composite view.
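Two ideas in the description above can be illustrated without any cheminformatics library: folding structural features into a fixed-length bit vector, and summarizing a ROC curve as a single area-under-curve value. The sketch below is only a toy under stated assumptions. It hashes SMILES substrings as a stand-in for the atom paths or environments a real toolkit enumerates, so it is not the algorithm of the RDKit Fingerprint node. The function names, the 1024-bit length, the example SMILES, and the example scores are all hypothetical.

```python
import hashlib
from typing import List


def hashed_fingerprint(smiles: str, n_bits: int = 1024, max_len: int = 3) -> List[int]:
    """Toy hashed bit fingerprint (assumption: SMILES substrings stand in
    for real structural features such as atom paths or environments)."""
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # Hash the fragment and fold the hash into the fixed bit range.
            digest = hashlib.sha256(fragment.encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % n_bits
            bits[index] = 1  # different fragments may collide on one bit
    return bits


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    active (label 1) is scored above a random inactive (label 0).
    Ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical usage: ethanol as SMILES, and made-up prediction scores.
example_bits = hashed_fingerprint("CCO")
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])  # 3 of 4 pairs ranked correctly: 0.75
```

Because distinct fragments can hash to the same position, a hashed fingerprint trades some information for a fixed input width, which is what lets a learner such as a random forest consume molecules of any size as equal-length rows.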

The data set is a subset of 844 compounds evaluated for activity against CDPK1. 181 compounds inhibited CDPK1 with an IC50 below 1 µM and carry the class label "active".
More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
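The counts above imply an imbalanced classification problem, which is useful context when reading the Scorer output. The short calculation below is a sketch that assumes the 181 actives are among the 844 compounds and that all remaining compounds belong to a single inactive class. The variable names are illustrative only.

```python
TOTAL_COMPOUNDS = 844   # compounds in the subset, from the description
ACTIVE_COMPOUNDS = 181  # IC50 below 1 uM, class "active"

# Assumption: everything that is not "active" forms one inactive class.
inactive_compounds = TOTAL_COMPOUNDS - ACTIVE_COMPOUNDS

active_fraction = ACTIVE_COMPOUNDS / TOTAL_COMPOUNDS
# Accuracy of a trivial model that labels every compound as inactive.
majority_baseline_accuracy = inactive_compounds / TOTAL_COMPOUNDS

print(f"inactive compounds: {inactive_compounds}")
print(f"active fraction: {active_fraction:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Under these assumptions roughly one compound in five is active, so a model that always predicts the inactive class would already score about 79% accuracy. This is one reason to read the ROC curve alongside the accuracy figure.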

Nodes

Component Input (1×)
Component Output (1×)
Model Writer (1×)
Partitioning (1×)
RDKit Fingerprint (1×)
ROC Curve (JavaScript) (1×)
Random Forest Learner (1×)
Random Forest Predictor (1×)
Scorer (JavaScript) (1×)
Table Reader (1×)

Extensions

RDKit Nodes Feature
