
Final Term Project Workflow (6)

Read Clean Data Set
CSV Reader
Master merged file with rule engines applied
CSV Reader
One-hot encode 22 categorical columns into binary 0/1 dummies.
One to Many
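A minimal plain-Python sketch of what the One to Many step does conceptually: each categorical column is replaced by one 0/1 indicator column per distinct value. The helper name `one_hot_encode` and the `WEATHER` column are hypothetical and only serve as an illustration; the KNIME node itself is configured in its dialog, not in code.

```python
def one_hot_encode(rows, columns):
    """Replace each listed categorical column with 0/1 indicator columns."""
    # Collect the sorted distinct values (levels) of every categorical column.
    levels = {c: sorted({r[c] for r in rows}) for c in columns}
    encoded = []
    for r in rows:
        # Keep all non-categorical columns unchanged.
        out = {k: v for k, v in r.items() if k not in columns}
        for c in columns:
            for level in levels[c]:
                out[f"{c}={level}"] = 1 if r[c] == level else 0
        encoded.append(out)
    return encoded


# Hypothetical toy rows; WEATHER is an assumed column name.
rows = [
    {"WEATHER": "clear", "AGE": 34},
    {"WEATHER": "rain", "AGE": 51},
    {"WEATHER": "clear", "AGE": 22},
]
encoded = one_hot_encode(rows, ["WEATHER"])
```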
Start 10-fold stratified CV loop.
X-Partitioner
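The stratified fold assignment can be sketched roughly as follows, assuming stratification on the class column. The function name `stratified_folds`, the seed value, and the "low" class label are assumptions made for illustration; only "high" appears in the workflow annotations.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=42):
    """Assign each row index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)  # fixed seed (assumed value) for reproducibility
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled rows of each class round-robin across the folds.
        for pos, i in enumerate(indices):
            fold_of[i] = pos % k
    return fold_of


# Hypothetical toy labels: 30 "high" rows and 70 "low" rows.
labels = ["high"] * 30 + ["low"] * 70
fold_of = stratified_folds(labels, k=10)
```

In each iteration, the rows whose fold index equals the current fold form the test set and all other rows form the training set.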
Balance training fold via Equal Size Sampling
Equal Size Sampling
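As a rough illustration of balancing by downsampling, the sketch below draws the same number of rows from every class, namely the size of the smallest class. The helper name `equal_size_sample`, the seed, and the toy data are assumptions; applying it to the training fold only keeps the test fold at its natural class ratio.

```python
import random
from collections import defaultdict


def equal_size_sample(rows, label_key, seed=42):
    """Downsample every class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed (assumed value)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[label_key]].append(r)
    n = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        # Exact sampling: draw exactly n rows without replacement.
        sample.extend(rng.sample(group, n))
    return sample


# Hypothetical imbalanced training fold: 5 "high" rows, 20 "low" rows.
train = [{"INJ_SEV_BIN": "high", "id": i} for i in range(5)]
train += [{"INJ_SEV_BIN": "low", "id": i} for i in range(5, 25)]
balanced = equal_size_sample(train, "INJ_SEV_BIN")
```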
Z-score normalize the 5 true numeric columns (VE_FORMS, BODY_TYP, GVWR_FROM, AGE, VSPD_LIM).
Normalizer
Apply training-fold normalization parameters to test fold.
Normalizer (Apply)
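The fit-on-train, apply-to-test pattern of the two normalizer steps can be sketched as below. The function names and the sample values for AGE are hypothetical, and the use of the sample standard deviation is an assumption. The point of the pattern is that the test fold never contributes to the mean or standard deviation, which avoids leaking test information into training.

```python
from statistics import mean, stdev


def fit_zscore(train_values):
    """Learn the mean and standard deviation from the training fold only."""
    mu = mean(train_values)
    sigma = stdev(train_values)  # sample standard deviation (an assumption)
    return mu, sigma if sigma != 0 else 1.0  # guard against constant columns


def apply_zscore(values, params):
    """Transform any fold using parameters learned on the training fold."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical AGE values for one training fold and its test fold.
train_age = [20, 30, 40, 50, 60]
test_age = [40, 70]
params = fit_zscore(train_age)
train_z = apply_zscore(train_age, params)
test_z = apply_zscore(test_age, params)  # reuses the training parameters
```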
Train Logistic Regression on balanced, normalized training fold.
Logistic Regression Learner
X-Aggregator
10-fold cross-validation
X-Partitioner
Scorer
Exact sampling
Equal Size Sampling
Excel Writer
Set positive class to "high"
ROC Curve
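For reference, the area under the ROC curve for a chosen positive class can be computed as the fraction of (positive, negative) pairs in which the positive row receives the higher score, with ties counted as half. The sketch below uses that pairwise definition; the function name `auc`, the "low" label, and the scores are illustrative assumptions, not output of this workflow.

```python
def auc(labels, scores, positive="high"):
    """AUC as the probability that a positive row outranks a negative row."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    # Count a win as 1, a tie as 0.5, a loss as 0 over all pairs.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for the class "high".
labels = ["high", "low", "high", "low", "low"]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
area = auc(labels, scores, positive="high")
```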
ROC Curve
Suffix created
MultiLayerPerceptron Predictor
kept 100 iterations
RProp MLP Learner
RProp MLP Learner
Distinguished suffix
MultiLayerPerceptron Predictor
AUC very similar to the ROC curve above
ROC Curve
Transform text columns to numerical except target
One to Many
Scorer
Binary color coding
Color Manager
Data Explorer
kept parameters as above
Decision Tree Learner
Concatenate
Exact sampling
Equal Size Sampling
ROC Curve
Fixed random seed, 70/30 train-test split
Table Partitioner
Close 10-fold CV loop. Outputs combined predictions across all 37,063 rows.
X-Aggregator
Set prediction column with suffix
Decision Tree Predictor
Created suffix to distinguish CV model
Decision Tree Predictor
AUC similar to above
ROC Curve
Column Appender
Predict INJ_SEV_BIN on the normalized test fold using the trained LR model
Logistic Regression Predictor
X-Aggregator
Per-fold confusion matrix and accuracy/Kappa metrics
Scorer
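The metrics named here can be reproduced by hand. A minimal sketch, assuming the standard definitions of accuracy and Cohen's kappa; the function name `score`, the "low" label, and the toy predictions are hypothetical.

```python
from collections import Counter


def score(actual, predicted):
    """Return the confusion counts, accuracy, and Cohen's kappa."""
    n = len(actual)
    # Keys are (actual class, predicted class) pairs.
    confusion = Counter(zip(actual, predicted))
    accuracy = sum(c for (a, p), c in confusion.items() if a == p) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Agreement expected by chance from the marginal class frequencies.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return confusion, accuracy, kappa


# Hypothetical predictions for one test fold.
actual = ["high", "high", "low", "low", "low", "high", "low", "low"]
predicted = ["high", "low", "low", "low", "high", "high", "low", "low"]
confusion, accuracy, kappa = score(actual, predicted)
```

Kappa discounts the agreement that would occur by chance, so it is a stricter measure than accuracy when the classes are imbalanced.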
ROC Curve
Set tree depth and split parameters; adjusted PMML settings
Decision Tree Learner
Exact/static seed
Equal Size Sampling
Scorer
min/max scaling
Normalizer
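Min/max scaling maps each value to (x - min) / (max - min), so the column falls in the range [0, 1]. A small sketch under that definition; the function names and the sample VSPD_LIM values are hypothetical.

```python
def fit_minmax(train_values):
    """Learn the minimum and maximum from the training data."""
    return min(train_values), max(train_values)


def apply_minmax(values, params):
    """Rescale values to [0, 1] using previously learned bounds."""
    lo, hi = params
    span = hi - lo
    if span == 0:
        # Constant column: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical speed-limit values.
speed_limits = [25, 35, 45, 65]
scaled = apply_minmax(speed_limits, fit_minmax(speed_limits))
```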
Acc./Kappa values
Scorer
Exact sampling/seed
Equal Size Sampling
70/30 train-test split
Table Partitioner
Checking model reliability and stability with k-fold cross-validation
X-Partitioner
