Icon

09 Decision Tree Model

09 Decision Tree Model - Solution

Solution to an exercise for training a classification model.

Train and apply a decision tree model. Evaluate the model's performance with scoring metrics and an ROC curve.

CHECK YOUR ANSWERS:
a. The overall accuracy of the model is around 86%
b. The number of rows in the test dataset is 8138
c. The Area under the Curve statistics is around 0.88 for the decision tree model














Exercise: Decision Tree and Classification Model Evaluation1) Read the adult_joined.table file by executing the Table Reader node2) Partition the dataset into a training set (75%) and a test set (25%). Apply stratified sampling to the income column.3) Train a Decision Tree model on the training set to predict whether or not a person earns more than 50K per year4) Apply the model to the test set5) Evaluate the accuracy of the model with scoring metrics- What is the overall accuracy of the model?6) Open the configuration dialog of the Scorer (JavaScript) node and exclude those statistics from the class prediction statistics table that are also present in theconfusion matrix. Display the number of rows in the confusion matrix. - What is the number of rows in the test dataset?7) Evaluate the performance of the model with an ROC curve- What is the Area under the Curve statistics for the decision tree model?8) OPTIONAL: Try out other parameter settings to reach a better performance. For example, change the quality measure, pruning method, or minimum number ofrecords. The overall accuracy of the model is around 86 %The number of rows is 8138The Area under the Curve (AuC) statistics is around 0.88 Top: train set (75%)Bottom: test set (25%)Stratified sampling on incomeTrain the modelto predict incomeApply the modelto the test setscoring metricsRead adult_joined.tablePartitioning DecisionTree Learner Decision TreePredictor MISSING Scorer(JavaScript) ROC Curve Table Reader Exercise: Decision Tree and Classification Model Evaluation1) Read the adult_joined.table file by executing the Table Reader node2) Partition the dataset into a training set (75%) and a test set (25%). Apply stratified sampling to the income column.3) Train a Decision Tree model on the training set to predict whether or not a person earns more than 50K per year4) Apply the model to the test set5) Evaluate the accuracy of the model with scoring metrics- What is the overall accuracy of the model?6) Open the configuration dialog of the Scorer (JavaScript) node and exclude those statistics from the class prediction statistics table that are also present in theconfusion matrix. Display the number of rows in the confusion matrix. - What is the number of rows in the test dataset?7) Evaluate the performance of the model with an ROC curve- What is the Area under the Curve statistics for the decision tree model?8) OPTIONAL: Try out other parameter settings to reach a better performance. For example, change the quality measure, pruning method, or minimum number ofrecords. The overall accuracy of the model is around 86 %The number of rows is 8138The Area under the Curve (AuC) statistics is around 0.88 Top: train set (75%)Bottom: test set (25%)Stratified sampling on incomeTrain the modelto predict incomeApply the modelto the test setscoring metricsRead adult_joined.tablePartitioning DecisionTree Learner Decision TreePredictor MISSING Scorer(JavaScript) ROC Curve Table Reader

Nodes

Extensions

Links