
02_Hyperparameter Optimization_Bonus

Model Selection with Integrated Deployment

This workflow deploys an advanced parameter optimization protocol with four machine learning methods. In this implementation, the choice of features (fingerprints) and one hyperparameter per method are optimized. However, we encourage you to use this workflow as a template if you have completely different data, and to customize it by including additional parameters in the optimization loop.
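
Outside of KNIME, the same optimization idea can be expressed in a few lines of Python. The sketch below is only an illustration under assumptions: the fingerprint matrices, labels, and parameter grids are placeholders, not the workflow's actual settings, and real fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
# Minimal sketch: jointly search over the fingerprint choice and one
# hyperparameter per learner, mirroring the workflow's optimization loops.
# All data and grids below are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_by_fp = {                                   # stand-ins for two fingerprint types
    "Morgan": rng.integers(0, 2, size=(200, 64)),
    "MACCS": rng.integers(0, 2, size=(200, 32)),
}
y = rng.integers(0, 2, size=200)              # stand-in activity labels

learners = {                                  # one tuned hyperparameter per method
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100, 200]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000),
                            {"C": [0.01, 0.1, 1.0, 10.0]}),
    "Naive Bayes": (BernoulliNB(), {"alpha": [0.1, 0.5, 1.0]}),
    # xgboost.XGBClassifier could be added analogously for the fourth method
}

results = []
for fp_name, X_fp in X_by_fp.items():
    for model_name, (estimator, grid) in learners.items():
        search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5)
        search.fit(X_fp, y)
        results.append((fp_name, model_name, search.best_params_, search.best_score_))

print(max(results, key=lambda r: r[-1]))      # best fingerprint/model/parameter combination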

Parameter optimization is performed on 80% of the original dataset. The optimization loops are encapsulated in metanodes that carry the names of the machine learning methods. The model performances can be evaluated, and the best model selected, in the interactive view of the Pick best Model component. Finally, the selected model is scored on the remaining 20% of the dataset (which was not part of the optimization cycle) and the results are displayed with the Model Report component.
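
As a rough Python analogue of the partitioning and final scoring steps, the sketch below uses a placeholder feature matrix and labels; the random forest stands in for whichever model the Pick best Model step would select.

```python
# Sketch of the 80/20 stratified split and the final hold-out evaluation.
# Data and the chosen estimator are placeholders for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(844, 128))       # stand-in fingerprint matrix
y = rng.integers(0, 2, size=844)              # stand-in "active"/"inactive" labels

# 80% feeds the optimization loops, 20% is held back for the final report
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

best_model = RandomForestClassifier(n_estimators=200, random_state=0)
best_model.fit(X_train, y_train)              # stands in for the selected model

pred = best_model.predict(X_test)
proba = best_model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, pred))
print("Hold-out ROC AUC:", round(roc_auc_score(y_test, proba), 3))
```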

The dataset is a subset of 844 compounds evaluated for activity against CDPK1. The 181 compounds that inhibited CDPK1 with an IC50 below 1 µM carry the class label "active".
More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).
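
For readers reproducing the class labels outside KNIME, a hypothetical pandas snippet could apply the 1 µM cutoff as follows (column names are assumptions; the deposited set already provides activity data).

```python
# Hypothetical labeling rule: "active" if IC50 < 1 µM (column names assumed).
import pandas as pd

df = pd.DataFrame({"compound_id": ["c1", "c2", "c3"],
                   "IC50_uM": [0.3, 5.0, 0.9]})
df["class"] = (df["IC50_uM"] < 1.0).map({True: "active", False: "inactive"})
print(df)
```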

This workflow is a revised version of the original workflow: https://kni.me/w/-ATVMu9EmIURm8kr

Workflow annotation (canvas overview):

1. Read data (Table Reader).
2. Data preprocessing and partitioning: select the column with class values (Pick activity column) and partition the data into a training set and a test set (Partitioning, 80/20 random stratified).
3. Parameter optimization, performed for each method separately (XGBoost, Random Forest, Naive Bayes, Logistic Regression metanodes). Action needed: pick the activity column and the objective function. The parameters leading to the highest enrichment factor on 5% of the dataset are picked to build the best model (Build Model); a computation sketch follows below.
4. Pick and deploy the final model (Pick best Model). Action needed: investigate the model performances and select the best model.
5. Score the model with the test data and present a report (Model Report).
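
The annotation's objective function, the enrichment factor at 5%, can be computed as the hit rate among the top-scoring 5% of compounds divided by the overall hit rate. A minimal NumPy sketch, with illustrative function and variable names:

```python
# EF@5% = (active fraction among the top-scoring 5%) / (active fraction overall)
import numpy as np

def enrichment_factor(y_true, scores, fraction=0.05):
    y_true = np.asarray(y_true)
    n_top = max(1, int(round(fraction * len(y_true))))
    top_idx = np.argsort(scores)[::-1][:n_top]     # indices of the highest-scoring compounds
    hit_rate_top = y_true[top_idx].mean()          # actives within the top fraction
    hit_rate_all = y_true.mean()                   # actives in the whole set
    return hit_rate_top / hit_rate_all

rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)              # placeholder labels (1 = active)
scores_demo = rng.random(200)                      # placeholder model scores
print("EF@5%:", round(enrichment_factor(y_demo, scores_demo), 2))
```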
