final team project 2

train
CSV Reader
test
CSV Reader
ROC Curve
Weight, payer code, max glucose, A1C result and medical specialty are removed because they contain too many missing values. Diag_1, Diag_2 and Diag_3 are removed because they have too many distinct values, which would produce hundreds of sparse dummy variables and increase the risk of overfitting.
Column Filter
Column Filter
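The column-filter reasoning above can be sketched in plain Python. This is only an illustration: the helper names, the thresholds and the toy rows are assumptions, as is the use of "?" as the missing-value marker for every column. The workflow itself selects the columns by hand in the Column Filter node.

```python
MISSING = "?"  # assumption: missing values are encoded as "?"


def columns_to_drop(rows, max_missing=0.4, max_levels=50):
    """Return columns that are mostly missing or have too many distinct values."""
    drop = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        missing_share = values.count(MISSING) / len(values)
        distinct = len(set(values) - {MISSING})
        if missing_share > max_missing or distinct > max_levels:
            drop.append(col)
    return drop


def filter_columns(rows, drop):
    """Return copies of the rows without the dropped columns."""
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


# Hypothetical toy rows, not taken from the real data set.
rows = [
    {"weight": "?", "diag_1": "250.83", "race": "Caucasian"},
    {"weight": "?", "diag_1": "276", "race": "?"},
    {"weight": "[75-100)", "diag_1": "648", "race": "AfricanAmerican"},
    {"weight": "?", "diag_1": "8", "race": "Caucasian"},
]
drop = columns_to_drop(rows, max_missing=0.4, max_levels=3)
filtered = filter_columns(rows, drop)
```

In this toy example weight is dropped for its missing share and diag_1 for its number of distinct values, while race is kept.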
Replaces "?" values in race with "Unknown" to make results more readable.
Rule Engine
Replaces "?" values in race with "Unknown" to make results more readable.
Rule Engine
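The relabelling done by the Rule Engine nodes is roughly equivalent to the sketch below. The function name and the sample rows are hypothetical, and the column is assumed to be called race.

```python
def relabel_race(rows, column="race", missing="?", label="Unknown"):
    """Return new rows in which the missing marker in `column` is replaced by `label`."""
    return [
        {**row, column: label if row[column] == missing else row[column]}
        for row in rows
    ]


# Hypothetical sample rows.
sample = [{"race": "?"}, {"race": "Asian"}]
relabelled = relabel_race(sample)
```

The original rows are left unchanged, so the same function can be applied to the train and test tables independently.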
CSV Writer
Test predictor
Random Forest Predictor
Column Filter
CSV Writer
Test predictor.
Logistic Regression Predictor
Column Filter
Scorer
Scorer
CSV Writer
CSV Writer
Table View
Converts the target to a nominal variable so that logistic regression can be trained on it. Also converts admission source and admission type id to nominal variables, since their numeric codes carry no meaningful order.
Number to String
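A small sketch of why the conversion matters: once the codes are strings, each code becomes its own category instead of a point on a numeric scale. The column name admission_type_id and both helper functions are assumptions made for illustration.

```python
def to_nominal(rows, columns):
    """Cast the given numeric code columns to strings so they act as categories."""
    return [
        {k: (str(v) if k in columns else v) for k, v in row.items()}
        for row in rows
    ]


def one_hot(rows, column):
    """Encode one nominal column as 0/1 indicators, one per distinct level."""
    levels = sorted({row[column] for row in rows})
    encoded = [[1 if row[column] == level else 0 for level in levels] for row in rows]
    return encoded, levels


# Hypothetical rows: the ids are codes, not quantities.
raw = [{"admission_type_id": 1}, {"admission_type_id": 3}, {"admission_type_id": 1}]
nominal = to_nominal(raw, {"admission_type_id"})
encoded, levels = one_hot(nominal, "admission_type_id")
```

With the ids kept as numbers, a linear model would treat code 3 as three times code 1. The indicator columns avoid that assumption.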
Table View
Normalizes columns with large value ranges, such as num lab procedures and num medications, so that their scale does not dominate the logistic regression.
Normalizer
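As an illustration, min-max scaling maps a column to the range 0 to 1. The sketch below assumes min-max normalization (the annotation does not say which method is configured) and shows one common approach: fit the range on the training values and reuse it for the test values. The function names and numbers are hypothetical.

```python
def fit_min_max(values):
    """Return the (min, max) of the training values."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with a previously fitted range; a constant column maps to 0.0."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


# Hypothetical counts for a column such as num lab procedures.
train_values = [10, 40, 70]
test_values = [25, 85]

lo, hi = fit_min_max(train_values)
train_scaled = apply_min_max(train_values, lo, hi)
test_scaled = apply_min_max(test_values, lo, hi)
```

Test values outside the training range fall outside 0 to 1, which is expected when the range is fitted on the training data only.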
Excludes the readmitted and encounter id columns from training to prevent bias.
Logistic Regression Learner
Normalizer
Table Partitioner
Training predictor
Logistic Regression Predictor
Random Forest Learner
Training predictor
Random Forest Predictor
Converts admission source and admission type to nominal variables.
Number to String
ROC Curve
