
05_Model_Optimization

Hyper Parameter Optimization - Exercise (Solution)

This workflow optimizes the parameters of a machine learning model that predicts the residual of a time series (energy consumption). The residual is what remains after removing the trend and the first and second seasonalities. The optimized parameters are the number of trees and the tree depth of a Random Forest model.

URL: Parameter Optimization for Prediction Loops https://youtu.be/IlqepyIba6Y
URL: Slides on the KNIME Website https://www.knime.com/form/material-download-registration

Data Loading
Data Preparation

ACF Plot

Hyper Parameter Optimization

Time Series Analysis
05. Hyper Parameter Optimization

Summary:
In this exercise you will optimize some of the hyper parameters in our random forest model.

Instructions:
1) Run the workflow up to and including the Random Forest Predictor. You will start from there.

2) Attach a Numeric Scorer to the output of the Predictor and verify in its configuration that the reference and prediction columns are correct. Output the scores as flow variables; you will need them later to select the best parameters.

3) Next you will add the Parameter Optimization Loop Start node to your workflow. Its output is a flow variable port. Attach this to the Random Forest Learner.

4) To configure the Parameter Optimization Loop Start you will add new variables to the table in its configuration. These will represent the range of values we want to try when training.
Create one with the name: NumTrees, with min value 5, max value 100, step size 5
Create another with the name: TreeDepth with min value 1, max value 20, step size 1
Check the box to indicate both are integers
Execute this node so that your flow variables are available in the next step.

5) Next configure the Random Forest Learner to use these flow variables. Open the configuration window for the Learner and go to the Flow Variables tab.
In the drop-down box next to maxLevels, select your TreeDepth flow variable, and in the box next to nrModels, select NumTrees. This instructs KNIME to control those model parameters with your flow variables.

6) Finally, add the Parameter Optimization Loop End and connect the flow variable output of your Numeric Scorer node to it.
In the configuration window of the Loop End node you can select which metric to optimize. Minimize the mean absolute percentage error (MAPE).
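The loop built in steps 3 to 6 can be sketched in plain Python as an exhaustive search over the grid defined in step 4: 20 values of NumTrees × 20 values of TreeDepth = 400 combinations, assuming both bounds are inclusive. This is a minimal sketch assuming the brute-force search strategy; `score` is a hypothetical stand-in for one loop iteration (train the Random Forest Learner, predict, compute MAPE with the Numeric Scorer), not a KNIME API.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple


def int_range(lo: int, hi: int, step: int) -> List[int]:
    """Integer values from lo to hi inclusive, like a min/max/step parameter definition."""
    return list(range(lo, hi + 1, step))


def brute_force_search(
    score: Callable[[int, int], float],
) -> Tuple[Optional[Dict[str, int]], float, int]:
    """Evaluate every (NumTrees, TreeDepth) pair and keep the one with the lowest score."""
    grid = product(int_range(5, 100, 5), int_range(1, 20, 1))
    best_params: Optional[Dict[str, int]] = None
    best_score = float("inf")
    n_tried = 0
    for num_trees, tree_depth in grid:
        n_tried += 1
        # One loop iteration: train with these parameters, predict, and score (e.g. MAPE).
        current = score(num_trees, tree_depth)
        if current < best_score:  # minimize, as configured in the Loop End node
            best_params = {"NumTrees": num_trees, "TreeDepth": tree_depth}
            best_score = current
    return best_params, best_score, n_tried


# Usage with a toy, hypothetical score function whose minimum is at (40, 8).
best, best_mape, n = brute_force_search(lambda t, d: abs(t - 40) / 100 + abs(d - 8) / 20)
print(best, best_mape, n)  # prints {'NumTrees': 40, 'TreeDepth': 8} 0.0 400
```

In the workflow, this bookkeeping is done by the Parameter Optimization Loop End node, whose top output port holds the winning parameter combination used in step 7.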

Optional: Train a model with the optimized parameters from the loop

7) Convert the winning combination of parameters into flow variables with a Table Row to Variable node. The winning parameter combination is available at the top output port of the Parameter Optimization Loop End node.

8) Train a random forest model with the winning parameters from step 7, setting each parameter through the corresponding flow variable (hint: see step 5). Use the validation data (top output port of the 2nd Table Partitioner node) to train this final model.

9) Apply the trained model to the test partition (bottom output port of the 2nd Table Partitioner node) in a Random Forest Predictor (Regression) node.

10) Evaluate the model performance with a Numeric Scorer node.
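For reference, the error metrics that the Numeric Scorer is used for in steps 2 and 10 follow standard definitions such as the ones below. This is a plain-Python sketch of the textbook formulas, not KNIME's implementation; details such as whether MAPE is reported as a fraction or as a percentage should be checked in the node's own documentation.

```python
import math
from typing import Dict, List


def regression_scores(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """Standard regression error metrics for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    mse = sum(e * e for e in errors) / n  # mean squared error
    rmse = math.sqrt(mse)  # root mean squared error
    # MAPE as a fraction (multiply by 100 for percent); undefined if any actual value is 0.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - (mse * n) / ss_tot  # coefficient of determination
    return {"mae": mae, "mse": mse, "rmse": rmse, "mape": mape, "r2": r2}


# Usage with small hypothetical values.
scores = regression_scores([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(scores)
```

Because MAPE divides each error by the actual value, the same absolute error weighs more on low-consumption hours than on high-consumption ones; that is worth keeping in mind when choosing it as the objective in step 6.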

Date&Time Aligner (Labs)
Autocorrelation Plot (Labs)
Energy usage data
CSV Reader
Missing Value
convert date/time into Date&Time objects
String to Date&Time
Further partitioning of data: 50% validation, 50% test
Table Partitioner
Random Forest Learner (Regression)
Partition from top down for time series data: 50% training, 50% rest
Table Partitioner
Random Forest Predictor (Regression)
Column Filter
10 previous hours
Lag Column
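The data-preparation nodes above (Lag Column with the 10 previous hours, then two Table Partitioner nodes that split from the top down) can be sketched as follows. It is a minimal sketch under stated assumptions: the series is already sorted by time, rows without a full 10-hour history are simply dropped, and the split point is rounded down, which may differ from KNIME's rounding. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[List[float], float]  # (lagged inputs, oldest first; target value)


def lag_features(values: Sequence[float], n_lags: int = 10) -> List[Row]:
    """Pair each value x(t) with its n_lags predecessors x(t-n_lags) .. x(t-1).

    Assumption: the first n_lags rows, which lack a full history, are dropped.
    """
    return [(list(values[t - n_lags:t]), values[t]) for t in range(n_lags, len(values))]


def split_top_down(rows: List[Row], fraction: float = 0.5) -> Tuple[List[Row], List[Row]]:
    """Split without shuffling: the first `fraction` of rows (earliest in time) form part one."""
    cut = int(len(rows) * fraction)  # rounded down (assumption)
    return rows[:cut], rows[cut:]


series = [float(h) for h in range(100)]  # hypothetical hourly values
rows = lag_features(series, n_lags=10)  # 90 rows
train, rest = split_top_down(rows, 0.5)  # 45 training rows, 45 remaining
validation, test = split_top_down(rest, 0.5)  # 22 validation rows, 23 test rows
```

Splitting from the top down rather than randomly keeps the training rows strictly earlier in time than the validation and test rows, so the model is never trained on hours that come after the ones it is evaluated on.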
