
00_Decision_Tree

Decision Tree

Introduction to Machine Learning Algorithms course - Session 1
Exercise 1
- Partition data into training and test set
- Train a decision tree model
- Apply the model to the test set
- Evaluate the model performance
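
As a rough illustration of what the partitioning and evaluation steps do, the following plain-Python sketch splits rows 70/30 while sampling each class separately, and computes accuracy as the share of matching predictions. The function names, the toy rows, and the seed are assumptions made for this example; it is not how the KNIME nodes are implemented.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_key, train_fraction=0.7, seed=42):
    """Split rows into train/test, sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy data (made up for illustration): 60 "low" rows and 40 "high" rows.
rows = [{"id": i, "rank": "low" if i < 60 else "high"} for i in range(100)]
train, test = stratified_split(rows, "rank")
print(len(train), len(test))  # 70 30
```

Because each class is cut at 70% on its own, both output lists keep the 60/40 class ratio of the input, which is the point of stratified sampling.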

Exercise: Decision Tree
1) Use a Partitioning node to split the data into a training set (70%) and a test set (30%)
- Use stratified sampling based on the column "rank" to retain the distribution of the class values in both output tables
2) Train a Decision Tree model to predict the overall condition of a house (high/low) (Decision Tree Learner node)
- Select the "rank" column as the class column
3) Use the trained model to predict the rank of the houses in the test set (Decision Tree Predictor node)
4) Evaluate the accuracy of the decision tree model (Scorer (JavaScript) node)
- Select "rank" as the actual column and "Prediction (rank)" as the predicted column
- What is the accuracy of the model?
5) Visualize the ROC curve (ROC Curve node)
- Make sure that the checkbox "append columns with normalized class distribution" in the Decision Tree Predictor node is activated
- Select "rank" as the Class column and "High" as the Positive class value. Include only the "P (rank=High)" column
6) Optional: Try different settings for the decision tree algorithm. Can you improve the model performance?

Use Case Description
The dataset we use in this exercise describes the sale of individual residential properties in Ames, Iowa from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10. The goal of this exercise is to train a binary classification model that predicts whether the overall condition is high or low. To do so, the workflow below reads the dataset and creates the class column from the overall condition ranking. This column is called rank and has the value low if the overall condition is less than or equal to 5, and high otherwise. It is now up to you to continue this workflow!
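
The rule for the class column and the basic idea behind a tree split can be sketched in plain Python. The sketch below derives the class from the overall condition as described above (low if the value is at most 5, otherwise high) and then searches one numeric column for the threshold with the lowest weighted Gini impurity. Gini is used here as one common split criterion; the function names, the `year_built` feature, and the toy values are hypothetical, and this is a single split, not the Decision Tree Learner's algorithm.

```python
def to_rank(overall_condition):
    """Binary class: 'low' if the condition is <= 5, otherwise 'high'."""
    return "low" if overall_condition <= 5 else "high"


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # Splitting at the maximum would leave the right side empty, so skip it.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Toy values (made up): overall condition and a hypothetical numeric feature.
conditions = [3, 5, 4, 7, 8, 6]
year_built = [1950, 1960, 1955, 1995, 2005, 2000]
ranks = [to_rank(c) for c in conditions]
threshold, impurity = best_threshold(year_built, ranks)
print(ranks)
print(threshold, impurity)
```

A full decision tree repeats this search over all input columns and then recursively on each resulting subset until a stopping rule is met.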
Workflow node labels: Read AmesHousing.csv, Extract Class Information, File Reader
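
To make the ROC step more concrete, here is a minimal plain-Python sketch of how ROC points and the area under the curve can be computed from the actual class and a probability column such as "P (rank=High)". The function names and the toy scores are assumptions for illustration, and the label spelling passed as `positive` must match whatever the class column actually contains. It is not the ROC Curve node's implementation.

```python
def roc_points(actual, scores, positive="High"):
    """Return ROC points as (false positive rate, true positive rate) pairs."""
    pos = sum(1 for a in actual if a == positive)
    neg = len(actual) - pos
    # Walk through the rows from the highest to the lowest score.
    ranked = sorted(zip(scores, actual), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last row of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy predictions (made up): actual class and predicted P(rank=High).
actual = ["High", "High", "Low", "High", "Low", "Low"]
p_high = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, p_high)
print(points)
print(round(auc(points), 4))  # 0.8889
```

In the toy data, eight of the nine High/Low pairs are ranked correctly, which is why the area comes out as 8/9.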
