Decision_Tree_exercise

Decision Tree - exercise

Introduction to Machine Learning Algorithms course - Session 1
Exercise 1
- Partition data into training and test set
- Train a decision tree model
- Apply the model to the test set
- Evaluate the model performance

URL: Description of the Ames Iowa Housing Data https://rdrr.io/cran/AmesHousing/man/ames_raw.html
URL: Ames Housing Dataset on kaggle https://www.kaggle.com/prevek18/ames-housing-dataset
URL: Decision Tree https://www.youtube.com/watch?v=CSwM92yTrJw
URL: Behind the scenes of Decision Tree https://www.youtube.com/watch?v=qB8HZpwqPEg
URL: Slides (Introduction to ML Algorithms course) https://www.knime.com/form/material-download-registration

Session 1 - Introduction & Decision Tree Algorithm

Exercise 01 Decision Tree

Learning objective: In this exercise, you'll learn how to train a binary classification model that predicts whether the overall condition of a house is high or low, and how to evaluate the model's performance with the Scorer and ROC Curve nodes.


Workflow description: This workflow uses a dataset that describes the sale of individual residential properties in Ames, Iowa from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10.
The goal of this exercise is to train a binary classification model that predicts whether the overall condition is high or low. To do so, the workflow below reads the dataset and creates the class column from the overall condition ranking. The class column is called rank and has the value low if the overall condition is less than or equal to 5, and high otherwise.
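The rule for the class column can be sketched outside KNIME in a few lines of Python. This is only an illustration of the threshold described above; the sample ratings and the helper name `to_rank` are hypothetical, and the real values come from AmesHousing.csv.

```python
# Hypothetical overall-condition ratings (1-10), not taken from the real file.
overall_condition = [3, 5, 6, 8, 5, 9]


def to_rank(condition: int) -> str:
    """Return 'low' for ratings of 5 or below, otherwise 'high'."""
    return "low" if condition <= 5 else "high"


# Derive the class column from the rating column.
rank = [to_rank(c) for c in overall_condition]
print(rank)  # ['low', 'low', 'high', 'high', 'low', 'high']
```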


You'll find the instructions for the exercise in the yellow annotations.

Step 1. Partitioning

Utilize the Partitioning node to divide the data into training (70%) and test sets (30%). Specifically, employ stratified sampling based on the column rank to preserve the distribution of class values in both output tables.
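To see why stratified sampling preserves the class distribution, here is a minimal Python sketch of the idea. It is not how the KNIME node is implemented; the function name `stratified_split`, the seed, and the toy rows are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, train_fraction=0.7, seed=42):
    """Split rows into (train, test), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


# Toy data: 60 'low' and 40 'high' rows (hypothetical, not the Ames data).
rows = [{"id": i, "rank": "low" if i < 60 else "high"} for i in range(100)]
train, test = stratified_split(rows, lambda r: r["rank"])

# Both partitions keep the original 60/40 class ratio.
print(len(train), len(test))  # 70 30
```

Because each class is split on its own, the 60/40 ratio of the toy data carries over to both the training and the test partition.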


Step 2. Decision Tree Learner

Train a Decision Tree model (using the Decision Tree Learner node) to predict the overall condition of a house as either high or low. Choose the rank column as the class column.
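A decision tree is grown by repeatedly choosing the split that makes the resulting subsets purest. The sketch below shows one such step for a single numeric column, assuming the Gini index as the quality measure. The feature values, labels, and function names are hypothetical, and the actual learner node handles many columns, nominal attributes, and pruning.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' split."""
    best = (None, gini(labels))  # start from the impurity without any split
    candidates = sorted(set(values))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2  # midpoint between adjacent distinct values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical feature values and class labels, chosen to be cleanly separable.
year_built = [1950, 1962, 1975, 1988, 1999, 2005]
rank = ["high", "high", "high", "low", "low", "low"]

threshold, impurity = best_threshold(year_built, rank)
print(threshold, impurity)  # 1981.5 0.0
```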


Step 3. Decision Tree Predictor

Utilize the trained model to predict the rank of houses in the test set using the Decision Tree Predictor node.
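Conceptually, prediction means walking a row down the tree until it reaches a leaf and then reading off the class counts stored there. The following Python sketch uses a hand-written, hypothetical one-split tree (the feature name, threshold, and counts are invented for illustration) and returns both the predicted class and the normalized class distribution of the leaf.

```python
# A hypothetical, hand-written tree; a real one would come from the learner step.
tree = {
    "feature": "year_built",
    "threshold": 1981.5,
    "left": {"counts": {"high": 30, "low": 10}},   # rows with year_built <= 1981.5
    "right": {"counts": {"high": 5, "low": 55}},   # rows with year_built > 1981.5
}


def predict(node, row):
    """Return (predicted_class, normalized class distribution) for one row."""
    # Descend until a leaf (a node that stores class counts) is reached.
    while "counts" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    total = sum(node["counts"].values())
    distribution = {cls: n / total for cls, n in node["counts"].items()}
    # The predicted class is the one with the largest share in the leaf.
    return max(distribution, key=distribution.get), distribution


label, probs = predict(tree, {"year_built": 1960})
print(label, probs)  # high {'high': 0.75, 'low': 0.25}
```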


Step 4. Model evaluation

  1. Evaluate the accuracy of the decision tree model using the Scorer node. Select rank as the actual column and Prediction (rank) as the predicted column. Determine and report the accuracy of the model.

  2. Visualize the ROC curve using the ROC Curve node. First make sure that the checkbox "append columns with normalized class distribution" is activated in the Decision Tree Predictor node, so that the class probability columns are available. Then, in the ROC Curve node, select rank as the Class column, set High as the Positive class value, and include only the P (rank=High) column.

  3. Optional: Try different setting options for the decision tree algorithm. Can you improve the model performance?
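The two evaluation measures above can be illustrated with a small Python sketch. It is not the implementation of the Scorer or ROC Curve nodes; the labels, the probabilities standing in for the P (rank=High) column, and the 0.5 cut-off used to derive predictions are all hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of rows whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def roc_points(actual, scores, positive):
    """ROC points (FPR, TPR), lowering the threshold one distinct score at a time.

    Assumes both classes occur at least once in `actual`.
    """
    pos = sum(a == positive for a in actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(s >= t and a == positive for a, s in zip(actual, scores))
        fp = sum(s >= t and a != positive for a, s in zip(actual, scores))
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical test-set labels and predicted probabilities for class 'High'.
actual = ["High", "High", "Low", "High", "Low", "Low"]
p_high = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
predicted = ["High" if p >= 0.5 else "Low" for p in p_high]

acc = accuracy(actual, predicted)                      # 5 of 6 rows correct
area = auc(roc_points(actual, p_high, positive="High"))
print(round(acc, 3), round(area, 3))  # 0.833 0.889
```

Accuracy depends on a single cut-off, while the ROC curve summarizes the trade-off between true and false positives over all cut-offs, which is why the exercise asks for both.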


Workflow canvas labels: Data Preparation; CSV Reader (Read AmesHousing.csv); Table Partitioner; Decision Tree Learner; Decision Tree Predictor; Extract Class Information; Scorer; ROC Curve.
