01_Model_Selection_Sampled

Workflow

Model Selection to Predict Death Occurrences in Car Accidents
This workflow trains several data analytics models and automatically selects the best one to predict deaths in car accidents. The data has been sub-sampled so that the workflow also runs on less powerful machines. The sub-sampling happens in the metanode Reading Data/Pre-processing and can be removed to run the workflow on all data.
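The workflow itself is built from KNIME nodes, not code, but the "select the best model" step can be sketched in a few lines. The workflow annotations name AuC as the selection criterion; everything else here (the function names `auc` and `select_best_model`, the toy labels and scores) is hypothetical and only illustrates the idea.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison (Mann-Whitney)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best_model(labels, scores_by_model):
    """Return (name, auc) of the candidate with the highest AUC."""
    aucs = {name: auc(labels, scores) for name, scores in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical held-out labels (1 = death) and scores from two candidate models.
labels = [1, 0, 1, 0, 1, 0]
scores_by_model = {
    "random_forest": [0.9, 0.2, 0.8, 0.4, 0.7, 0.1],
    "decision_tree": [0.6, 0.5, 0.4, 0.7, 0.8, 0.3],
}
best_name, best_auc = select_best_model(labels, scores_by_model)
print(best_name, best_auc)
```

The pairwise formulation is quadratic in the number of rows, which is acceptable for a sketch; on the full data a rank-based computation would be the more practical choice.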
Workflow annotations

Reading Data
- accident
- vehicle
- person
- selected years

Dataset Evaluation
- 10-fold cross-validation
- stddev and mean of error
- if stddev/mean < 1.0 => GO!
- else "dataset not valid!"
Error message in case the dataset is not general enough!

Visual Investigation
- scatter plots (driver height vs. weight) and bar chart
- statistics (notice: car model, latitude, car owner, and class)
- linear correlation: suspicious correlation between HISPANIC and INJ_SEV

Dimensionality Reduction
- if % missing values > 90% => remove column
- if variance < 0.005 => remove column
- if a pair of columns are highly correlated => remove one of the two
PCA not used because of interpretability loss

Model Selection
Select the best model in terms of AuC among: Random Forest, my own ensemble model (Naive Bayes, logit, dec. tree), decision tree from R, the current model, and optionally ANN and k-Means. The decision tree from R is a linked metanode.

Display Results
Export message (error or success) and display on WebPortal

Node labels
- check for too high correlation with class and INJ_SEV
- rm columns
- class as String
- check quality of dataset
- % missing values
- low variance
- high correlation
- reading tables accident, vehicle, person for selected years
- driver height vs. weight
- basic general stats

Nodes and metanodes
Linear Correlation, Pre-processing, CASE Switch Data (Start), CASE Switch Data (End), Text Output, Table Column to Variable, Dataset Evaluation through X-validation, Dimensionality Reduction, Bag of Models, Prepare Error Message, Clustering, Prepare Message, Reading Data, JavaScript Scatter Plot, Color Manager, Remove Outliers, Statistics, JavaScript Bar Chart, JavaScript ROC Curve, Sampling
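The two rule-based steps in the annotations, the dataset evaluation gate (stddev/mean < 1.0) and the column filters (missing values > 90%, variance < 0.005, highly correlated pairs), can be sketched as follows. This is a minimal illustration, not the workflow's implementation. The 0.9 correlation cutoff is an assumption, since the annotations only say "highly correlated". The use of population variance and standard deviation is also assumed, and the source does not say whether columns are normalized before the variance check. The names `dataset_is_valid`, `pearson`, and `reduce_columns` and the toy table are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def dataset_is_valid(fold_errors, max_ratio=1.0):
    """Go/no-go gate: stddev/mean of the cross-validation errors must be below max_ratio."""
    avg = mean(fold_errors)
    if avg == 0:
        return True  # assumption: zero error in every fold counts as stable
    return pstdev(fold_errors) / avg < max_ratio


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def reduce_columns(table, max_missing=0.90, min_variance=0.005, max_corr=0.9):
    """table maps column name -> non-empty list of numbers, with None for missing values."""
    kept = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # too many missing values
        if pvariance(present) < min_variance:
            continue  # nearly constant column
        kept[name] = values
    # For every highly correlated pair, keep the first column and drop the second.
    names = list(kept)
    dropped = set()
    for i, a in enumerate(names):
        if a in dropped:
            continue
        for b in names[i + 1:]:
            if b in dropped:
                continue
            pairs = [(x, y) for x, y in zip(kept[a], kept[b])
                     if x is not None and y is not None]
            if len(pairs) > 1 and abs(pearson(*zip(*pairs))) > max_corr:
                dropped.add(b)
    return [n for n in names if n not in dropped]


# Hypothetical toy table with one column per rule.
table = {
    "speed": [float(i) for i in range(20)],
    "speed_copy": [2.0 * i + 1 for i in range(20)],   # perfectly correlated with speed
    "constant": [1.0] * 20,                           # zero variance
    "mostly_missing": [None] * 19 + [3.0],            # 95% missing
    "weight": [float((i * 7) % 20) for i in range(20)],
}
print(reduce_columns(table))
print(dataset_is_valid([0.10, 0.12, 0.11, 0.09, 0.10]))
```

Which column of a correlated pair to drop is a design choice; this sketch keeps whichever appears first in the table.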

Download

Get this workflow from the following link: Download

Resources

Nodes

01_Model_Selection_Sampled consists of the following 168 node(s):

Plugins

01_Model_Selection_Sampled contains nodes provided by the following 12 plugin(s):