
4.1 - Random Forest exercise

03.01 Random Forest - exercise

[L4-ML] Machine Learning Algorithms - Specialization

03 Ensemble Models
- Train a Random Forest model
- Apply the model to the test set
- Evaluate the model performance with the Scorer node
- Perform parameter optimization

URL: Description of the Ames Iowa Housing Data https://rdrr.io/cran/AmesHousing/man/ames_raw.html
URL: Ames Housing Dataset on kaggle https://www.kaggle.com/prevek18/ames-housing-dataset

03 - Ensemble Models

03.01 Random Forest

Learning objective: In this exercise you'll learn how to predict the overall condition of a house in Ames (Iowa, USA) from a number of features such as size, neighborhood, and heating.


Workflow description: This workflow uses a dataset that describes the sale of individual residential properties in Ames, Iowa, from 2006 to 2010. One of the columns is the overall condition ranking, with values between 1 and 10. In this exercise the condition is predicted as a two-class target, the rank column (high/low).


You'll find the instructions to the exercises in the yellow annotations.

Step 1. Random Forest Learner

Train a Random Forest model to predict the overall condition of a house (high/low) (Random Forest Learner node)

  • Select the rank column as the target column

  • Leave other settings to their defaults
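The Random Forest Learner node hides the mechanics behind its dialog. To show what a random forest does, here is a toy sketch in plain Python. It is an assumption-laden illustration, not KNIME's implementation: each tree is grown on a bootstrap sample of the rows, each split considers a random subset of the features (the square root of the feature count, a common choice), splits are scored with Gini impurity (which may differ from the node's default criterion), and depth is capped by `max_depth`. The names `train_forest`, `predict_forest`, `n_models` and `max_depth` are hypothetical; the last two mirror the tree depth and number of models settings tuned in Step 4.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, max_depth, n_features, rng, depth=0):
    """Grow a small CART-style tree. Leaves are labels, inner nodes are tuples."""
    # Stop when the depth limit is reached or the node is already pure.
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted impurity, feature index, threshold)
    # Random feature subset per split: the "random" in random forest.
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    if best is None:  # no feature could split these rows
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li], max_depth, n_features, rng, depth + 1)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri], max_depth, n_features, rng, depth + 1)
    return (f, t, left_node, right_node)


def predict_tree(node, row):
    """Walk from the root to a leaf (labels are assumed to be strings)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_models=10, max_depth=5, seed=0):
    """Train n_models trees, each on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])


if __name__ == "__main__":
    X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
    y = ["low", "low", "high", "high"]
    forest = train_forest(X, y, n_models=5, max_depth=3)
    print([predict_forest(forest, row) for row in X])
```

The recursion uses plain lists to stay dependency-free; for real data a library implementation is far faster.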


Step 2. Random Forest Predictor

Use the trained model to predict the rank of the houses in the test set (Random Forest Predictor node)
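At prediction time, every tree classifies each test house and the forest combines the answers. The plain-Python sketch below shows hard majority voting over hypothetical per-tree votes and reports each class's share of the votes, one common way to express how confident the ensemble is. It is an illustration under that assumption, not the Predictor node's exact computation; some implementations average per-tree class probabilities (soft voting) instead.

```python
from collections import Counter


def aggregate_votes(votes):
    """Hard majority vote over per-tree predictions, plus each class's vote share."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    total = len(votes)
    shares = {label: n / total for label, n in counts.items()}
    prediction = counts.most_common(1)[0][0]  # ties go to the label seen first
    return prediction, shares


# Hypothetical votes from five trees for one test house.
votes = ["high", "low", "high", "high", "low"]
prediction, shares = aggregate_votes(votes)
print(prediction, shares)  # high {'high': 0.6, 'low': 0.4}
```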



Step 3. Model evaluation

Evaluate the accuracy of the random forest model (Scorer node)

  • Select rank as the actual column and Prediction (rank) as the predicted column

  • What is the accuracy of the model?
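Accuracy is the fraction of rows whose predicted class equals the actual class. The sketch below computes it in plain Python together with a confusion matrix (counts of each actual/predicted pair), the two things you read off the Scorer view. The function name `score` and the toy columns are hypothetical.

```python
from collections import Counter


def score(actual, predicted):
    """Return a confusion matrix (Counter of (actual, predicted) pairs) and the accuracy."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("columns must be non-empty and of equal length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return confusion, correct / len(actual)


# Hypothetical "rank" and "Prediction (rank)" columns for five test houses.
actual = ["high", "low", "low", "high", "low"]
predicted = ["high", "low", "high", "high", "low"]
confusion, accuracy = score(actual, predicted)
for (a, p), n in sorted(confusion.items()):
    print(f"actual={a:<4} predicted={p:<4} count={n}")
print("accuracy:", accuracy)  # accuracy: 0.8
```

Four of the five toy predictions match, so the accuracy is 4 / 5 = 0.8.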


Step 4. Parameter Optimization Loop Start (Optional)

Use the Parameter Optimization Loop Start node to define the possible values for the tree depth and the number of models

  • Connect the flow variable output port of the Parameter Optimization Loop Start node to the Random Forest Learner node

  • Use the created flow variables to overwrite the corresponding settings in the Random Forest Learner node.
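The loop start node produces one combination of parameter values per iteration. With a brute-force strategy, every combination of each parameter's start/stop/step values is tried. The sketch below enumerates such a grid in plain Python; the parameter names `max_depth` and `n_models` and their ranges are hypothetical, and treating the stop value as inclusive is an assumption of this sketch.

```python
from itertools import product


def param_range(start, stop, step):
    """Values from start to stop (inclusive here) in increments of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    v = start
    while v <= stop:  # integer steps avoid floating-point drift
        values.append(v)
        v += step
    return values


def brute_force_grid(spec):
    """Yield one dict per combination, like one loop iteration's flow variables."""
    names = list(spec)
    for combo in product(*(param_range(*spec[name]) for name in names)):
        yield dict(zip(names, combo))


# Hypothetical ranges: tree depth 5..15 step 5, number of models 50..150 step 50.
spec = {"max_depth": (5, 15, 5), "n_models": (50, 150, 50)}
grid = list(brute_force_grid(spec))
print(len(grid))  # 9
print(grid[0])    # {'max_depth': 5, 'n_models': 50}
```

Brute force is easy to reason about, but the number of iterations is the product of the range sizes, so wide ranges get expensive quickly.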


Step 5. Parameter Optimization Loop End (Optional)

Use the Parameter Optimization Loop End node to define the accuracy as the objective function

  • Which settings lead to the model with the highest accuracy?
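The loop end collects one objective value per iteration and reports the best parameter combination. Below is a plain-Python sketch of that bookkeeping, assuming the objective is to be maximized (as accuracy is). `optimize` and `stand_in_objective` are hypothetical, and the objective is a placeholder formula, not a model accuracy; in the workflow the value would come from the Scorer node for the model trained in that iteration.

```python
def optimize(candidates, evaluate):
    """Evaluate every candidate; return the best one, its value, and the full history."""
    best_params, best_value = None, float("-inf")
    history = []
    for params in candidates:
        value = evaluate(params)
        history.append((params, value))
        if value > best_value:  # strict ">" keeps the first of tied candidates
            best_params, best_value = params, value
    return best_params, best_value, history


candidates = [{"max_depth": d, "n_models": m} for d in (5, 10, 15) for m in (50, 100)]


def stand_in_objective(params):
    # Placeholder only: NOT a real accuracy.
    return -abs(params["max_depth"] - 10) + params["n_models"] / 1000


best, value, history = optimize(candidates, stand_in_objective)
print(best)  # {'max_depth': 10, 'n_models': 100}
```

Keeping the full history alongside the best result makes it easy to check whether several settings score almost equally well.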


Apply a Random Forest to the AI Resume Screening data from exercise 3.1 and compare its performance with the results you obtained there.

Workflow nodes (from the workflow canvas): CSV Reader ("Read AmesHousing.csv"), Preprocessing, Random Forest Learner, Random Forest Predictor (two instances), Scorer (two instances), Parameter Optimization Loop Start, Parameter Optimization Loop End. One annotation on the canvas is labeled "out of bag".
