03_Random_Forest_exercise

Random Forest - exercise

Introduction to Machine Learning Algorithms course - Session 2
Exercise 3
- Train a Random Forest model
- Apply the model to the test set
- Evaluate the model performance with the Scorer node
- Perform parameter optimization

URL: Description of the Ames Iowa Housing Data https://rdrr.io/cran/AmesHousing/man/ames_raw.html
URL: Ames Housing Dataset on Kaggle https://www.kaggle.com/prevek18/ames-housing-dataset
URL: Random Forest https://www.youtube.com/watch?v=X4H7w6LDgYM
URL: Slides (Introduction to ML Algorithms course) https://www.knime.com/form/material-download-registration

Session 2 - Regression Models, Ensemble Models, & Logistic Regression

Exercise 03 Random Forest

Learning objective: In this exercise you'll learn how to predict the overall condition (high/low) of a house in Ames (Iowa, USA) from features such as size, neighborhood, and heating.


Workflow description: This workflow uses a dataset that describes the sale of individual residential properties in Ames, Iowa from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10.


You'll find the instructions for the exercises in the yellow annotations.

Step 1. Random Forest Learner

Train a Random Forest model to predict whether the overall condition of a house is high or low (Random Forest Learner node)

  • Select the rank column as the target column

  • Leave the other settings at their defaults


Step 2. Random Forest Predictor

Use the trained model to predict the rank of the houses in the test set (Random Forest Predictor node)
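
For readers who want to see what the Learner and Predictor nodes do conceptually, the following is a minimal sketch in plain Python, not the KNIME implementation. It assumes a heavily simplified forest: each "tree" is a single split (a stump) trained on a bootstrap sample and one randomly chosen feature, and the prediction is a majority vote. The function names (fit_stump, fit_forest, predict) and the toy rows are hypothetical and are not taken from the Ames data.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split tree on a single feature by minimizing training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # pick one feature at random
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Return the class that receives the most votes across all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two numeric features and a high/low rank column.
train_rows = [[1, 1], [2, 2], [3, 1], [10, 10], [11, 12], [12, 11]]
train_rank = ["low", "low", "low", "high", "high", "high"]
test_rows = [[0, 0], [20, 20]]

forest = fit_forest(train_rows, train_rank)
predictions = [predict(forest, row) for row in test_rows]
```

A real random forest grows deeper trees and samples a subset of features at every split; the sketch keeps only the two ideas that matter here, namely training on the training set and voting on unseen test rows.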


Step 3. Model evaluation

Evaluate the accuracy of the random forest model (Scorer node)

  • Select rank as the actual column and Prediction (rank) as the predicted column

  • What is the accuracy of the model?
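
The accuracy reported by the Scorer node is the share of rows where the predicted class equals the actual class. A minimal sketch of that calculation in plain Python follows; the function names and the five example labels are made up for illustration and are not results from this workflow.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual, predicted):
    """Count each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))


# Illustrative labels only, standing in for the rank and Prediction (rank) columns.
rank = ["high", "low", "high", "low", "low"]
prediction_rank = ["high", "low", "low", "low", "high"]

model_accuracy = accuracy(rank, prediction_rank)
matrix = confusion_matrix(rank, prediction_rank)
```

The confusion matrix shows where the errors fall, for example how many high houses were predicted as low, which a single accuracy value hides.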


Step 4. Parameter Optimization Loop Start (Optional)

Use the Parameter Optimization Loop Start node to define the possible values for the tree depth and the number of models

  • Connect the variable port to the Random Forest Learner node

  • Use the created flow variables to overwrite the corresponding settings in the Random Forest Learner node.
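
Conceptually, a brute-force parameter search enumerates every combination of the values defined for each parameter, and each combination is passed to the learner in one loop iteration. The sketch below builds such a grid in plain Python. The start, stop, and step values and the names tree_depth and n_models are assumptions chosen for illustration, not the settings of this exercise.

```python
from itertools import product


def value_range(start, stop, step):
    """Integer values from start to stop (inclusive) in increments of step."""
    return list(range(start, stop + 1, step))


# Hypothetical search ranges; choose your own in the loop start configuration.
tree_depths = value_range(2, 10, 2)     # 2, 4, 6, 8, 10
model_counts = value_range(50, 200, 50)  # 50, 100, 150, 200

# One dictionary per loop iteration, playing the role of the flow variables.
grid = [
    {"tree_depth": depth, "n_models": count}
    for depth, count in product(tree_depths, model_counts)
]
```

The number of iterations is the product of the number of values per parameter, so wide ranges with small steps quickly become expensive.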


Step 5. Parameter Optimization Loop End (Optional)

Use the Parameter Optimization Loop End node to define the accuracy as the objective function

  • Which settings lead to the model with the highest accuracy?
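
Closing the loop amounts to recording the objective value for every parameter combination and keeping the combination with the maximum. A minimal sketch under stated assumptions: optimize and toy_objective are hypothetical names, and toy_objective is a stand-in formula that replaces "train the forest and score it on the test set". Its values are not real accuracies.

```python
def optimize(candidates, objective):
    """Evaluate every candidate and return (best_settings, all_results)."""
    results = [(settings, objective(settings)) for settings in candidates]
    best_settings, _ = max(results, key=lambda item: item[1])
    return best_settings, results


def toy_objective(settings):
    """Stand-in for model accuracy; peaks at tree_depth=6 and n_models=100."""
    return (
        1
        - abs(settings["tree_depth"] - 6) / 10
        - abs(settings["n_models"] - 100) / 1000
    )


# Hypothetical candidate settings, one per loop iteration.
candidates = [
    {"tree_depth": depth, "n_models": count}
    for depth in (2, 6, 10)
    for count in (50, 100, 200)
]

best, all_results = optimize(candidates, toy_objective)
```

In the workflow, the objective would be the accuracy produced by the Scorer node in each iteration, so the answer to the question above is whichever combination the loop reports as best.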


Workflow nodes: CSV Reader (Read AmesHousing.csv), Preprocessing, Random Forest Learner, Random Forest Predictor, Scorer
