
Model Selection with Integrated Deployment

This workflow deploys an advanced parameter optimization protocol with four machine learning methods. In this implementation, the choice of features (fingerprints) and one hyperparameter per method are optimized. However, we encourage you to use this workflow as a template if you have completely different data, and to customize it by including additional parameters in the optimization loop.
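The optimization loop described above amounts to a search over combinations of fingerprint type and one hyperparameter value per method. The Python sketch below shows a plain exhaustive grid search under that assumption; it is not how the KNIME loop nodes are implemented, and they may use other search strategies. The fingerprint names, the hyperparameter values and the `toy_evaluate` scoring function are hypothetical stand-ins; in practice the score would come from training and validating a model on the optimization data.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Params = Tuple[str, float]


def grid_search(
    fingerprints: Iterable[str],
    values: Iterable[float],
    evaluate: Callable[[str, float], float],
) -> Tuple[Optional[Params], float]:
    """Try every (fingerprint, hyperparameter) pair and keep the highest score.

    Ties are resolved in favor of the pair evaluated first.
    """
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for fp, value in product(fingerprints, values):
        score = evaluate(fp, value)
        if score > best_score:
            best_params, best_score = (fp, value), score
    return best_params, best_score


# Hypothetical stand-in for "train on the optimization data, return a
# validation score"; a real objective would fit and evaluate a model.
FP_BONUS = {"MACCS": 0.0, "Morgan": 0.5, "AtomPair": 0.2}


def toy_evaluate(fingerprint: str, max_depth: float) -> float:
    return FP_BONUS[fingerprint] - abs(max_depth - 6)


if __name__ == "__main__":
    best, score = grid_search(["MACCS", "Morgan", "AtomPair"], [2, 4, 6, 8], toy_evaluate)
    print(best, score)  # ('Morgan', 6) 0.5
```

An exhaustive grid grows multiplicatively with every added parameter, so extending the loop with more parameters increases run time accordingly.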

Parameter optimization is performed on 80% of the original dataset. The optimization loops are encapsulated in metanodes named after the machine learning methods. The model performances can be evaluated, and the best model selected, in the interactive view of the Pick best Model component. Finally, the selected model is scored on the remaining 20% of the dataset (which was not part of the optimization cycle), and the results are displayed with the Model Report component.
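The split between optimization data and held-out test data can be sketched as a stratified random partition, which keeps the proportion of "active" compounds roughly equal in both parts; the workflow's Partitioning annotation mentions an 80/20 random stratified split. The Python below is a minimal illustration, not the KNIME Partitioning node itself; the function name, the fixed seed and the "inactive" label are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    items: Sequence[T],
    labels: Sequence[str],
    train_fraction: float = 0.8,
    seed: int = 42,
) -> Tuple[List[T], List[T]]:
    """Split items so each class keeps about train_fraction of its members in the first part."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")
    rng = random.Random(seed)  # fixed seed gives a reproducible partition
    by_class: Dict[str, List[T]] = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    train: List[T] = []
    test: List[T] = []
    for label in sorted(by_class):  # deterministic class order
        group = list(by_class[label])
        rng.shuffle(group)
        n_train = round(len(group) * train_fraction)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


if __name__ == "__main__":
    ids = list(range(844))
    # Class counts from the dataset description; "inactive" is a hypothetical label.
    labels = ["active"] * 181 + ["inactive"] * 663
    train, test = stratified_split(ids, labels)
    print(len(train), len(test))  # 675 169
```

Only the first part would feed the optimization loops; the second part stays untouched until the final scoring.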

The dataset is a subset of 844 compounds evaluated for activity against CDPK1. Of these, 181 compounds inhibited CDPK1 with an IC50 below 1 µM and are labeled "active" in the class column.
More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
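The class definition is a simple threshold rule. The Python sketch below restates it and computes the resulting class balance; the "inactive" label for the remaining compounds and the assumption that IC50 values are given in µM (with an optional nM conversion) are ours, not taken from the dataset.

```python
ACTIVE_THRESHOLD_UM = 1.0  # IC50 cutoff from the dataset description


def activity_class(ic50_um: float) -> str:
    """Label a compound by its IC50 in micromolar ('inactive' is a hypothetical label)."""
    return "active" if ic50_um < ACTIVE_THRESHOLD_UM else "inactive"


def nm_to_um(ic50_nm: float) -> float:
    """Convert nanomolar to micromolar, in case values are reported in nM."""
    return ic50_nm / 1000.0


if __name__ == "__main__":
    n_total, n_active = 844, 181
    print(f"active fraction: {n_active / n_total:.3f}")  # active fraction: 0.214
    print(activity_class(nm_to_um(250.0)))  # active
```

With roughly one compound in five labeled active, the classes are imbalanced, which is one reason a stratified partition and an imbalance-aware objective function can matter.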

This workflow is a revised version of the original workflow: https://kni.me/w/-ATVMu9EmIURm8kr

Workflow annotations

Based on the selection in the interactive view of the "Pick best Model" component, the selected model is built and automatically deployed, and a model report is generated.

Workflow sections:
- Read data
- Data preprocessing and partition data: select the column with class values and partition the data into a training set and a test set (80/20, random stratified)
- Parameter optimization: performed for each method separately
- Pick and deploy the final model
- Score the model with test data and present a report

Action needed:
- pick the activity column
- pick the objective function
- investigate the model performances and select the best model

Nodes and components: Table Reader (CDPK1.table), Generate Fingerprints, Pick activity column, Partitioning, XGBoost, Random Forest, Naive Bayes, Logistic Regression, Pick best Model, Build Model, Model Report.
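The Pick best Model step compares the methods by an objective function chosen by the user. Which objective functions the component offers is not stated here; the Python sketch below assumes two common choices for a binary active/inactive problem, accuracy and Cohen's kappa, computed from a confusion matrix, and then picks the method with the highest score. The function names and the placeholder scores in the usage example are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "active"
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for a binary problem with the given positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)


def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement between predictions and labels, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predicted and actual classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def pick_best(scores: Dict[str, float]) -> str:
    """Name of the method with the highest objective value."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    y_true = ["active", "active", "inactive", "inactive", "inactive"]
    y_pred = ["active", "inactive", "inactive", "inactive", "active"]
    counts = confusion_counts(y_true, y_pred)
    print(counts, accuracy(*counts), round(cohens_kappa(*counts), 3))  # (1, 1, 1, 2) 0.6 0.167
    # Placeholder scores for illustration only, not results of this workflow.
    placeholder = {"XGBoost": 0.41, "Random Forest": 0.47, "Naive Bayes": 0.33, "Logistic Regression": 0.39}
    print(pick_best(placeholder))  # Random Forest
```

Kappa is often preferred over accuracy on imbalanced data, because a model that predicts only the majority class can reach high accuracy while its kappa stays at zero.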
