08_Model_Optimization_and_Selection

Model Optimization and Selection

This workflow deploys an advanced parameter optimization protocol with four machine learning methods. In this implementation, the choice of features and one hyperparameter per method are optimized.
The dataset represents a subset of 844 compounds evaluated for activity against CDPK1. More information on the set is available at https://www.ebi.ac.uk/chemblntd/#tcams_dataset (see Set 19).

The features whose choice is optimized are fingerprints. We encourage you to use this workflow as a template if you have completely different data, and to customize it by including additional parameters in the optimization loop.

Parameter optimization is performed on 80% of the original dataset. The optimization loops are encapsulated in grey Wrapped Metanodes that carry the names of the machine learning methods. The parameters leading to the highest enrichment factor on 5% of the data set are picked to build the best model. Finally, this model is scored using the remaining 20% of the dataset (which was not part of the optimization cycle), and the results are displayed with the Model Report Wrapped Metanode.

Of the 844 compounds in the dataset, 181 inhibited CDPK1 with IC50 below 1 µM and have "active" as their class. More information is available at https://chembl.gitbook.io/chembl-ntd/ (see Set 19).

Workflow stages:
Read Data
Assign Classes and Partition Data: select the column with class values and partition the data into a training set and a test set
Parameter Optimization: performed for each method separately
Pick the Best Parameters and Build a Final Model
Score the Model with Test Data and Present a Report

Node labels:
Table Reader (data.table)
Pick activity column
Partitioning (80/20 random stratified)
H2O Gradient Boosting
Random Forest
Naive Bayes
Logistic Regression
Pick the Best Parameters
Build Model with Selected Parameters
Score Model with Validation Data
Model Report
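
The selection criterion can be illustrated outside KNIME with a short Python sketch. It assumes the common definition of the enrichment factor: the hit rate among the top-scored fraction of compounds divided by the hit rate in the whole set. The exact computation inside the workflow's metanodes is not shown on this page, so the function names, the rounding of the top fraction, and the structure of the `results` mapping are illustrative assumptions rather than the workflow's implementation.

```python
import math


def enrichment_factor(scores, labels, fraction=0.05, active="active"):
    """Hit rate among the top-scored `fraction` divided by the overall hit rate.

    Assumed definition; the KNIME workflow may round or rank differently.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equally long")
    n_total = len(labels)
    n_actives = sum(1 for y in labels if y == active)
    if n_actives == 0:
        raise ValueError("no active compounds in the set")
    # Size of the top fraction, rounded up so it holds at least one compound.
    n_top = max(1, math.ceil(fraction * n_total))
    # Rank compounds by predicted score for the active class, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(1 for _, y in ranked[:n_top] if y == active)
    return (hits_top / n_top) / (n_actives / n_total)


def pick_best(results):
    """Return the key with the highest enrichment factor.

    `results` is a hypothetical mapping of
    (fingerprint name, hyperparameter value) -> enrichment factor.
    """
    return max(results, key=results.get)
```

Under this definition, a value of 1 means the model ranks actives no better than random selection. For a set with the same class balance as the full dataset (181 actives out of 844), the largest attainable value at 5% is about 844/181 ≈ 4.7, reached when every compound in the top 5% is active.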