
05_Model_Optimization

Hyper Parameter Optimization - Exercise (Solution)

This workflow optimizes the parameters of a machine learning model that predicts the residual of a time series (energy consumption). The residual is what remains after the trend and the first and second seasonality have been removed from the series. The optimized parameters are the number of trees and the tree depth of a Random Forest model.
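As a rough illustration of what a residual is, the sketch below removes one repeating pattern from a toy series by seasonal differencing. The description does not say which method the workflow uses to remove trend and seasonality, so differencing, the helper name `seasonal_difference`, the lag of 4, and the toy data are all assumptions made for the example.

```python
def seasonal_difference(values, lag):
    """Return values[t] - values[t - lag] for every t >= lag."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Toy series (illustrative only): a pattern that repeats every 4 steps.
pattern = [10.0, 12.0, 15.0, 11.0]
series = [pattern[t % 4] for t in range(12)]
series[9] += 2.0  # one irregular value the repeating pattern cannot explain

# After differencing at the seasonal lag, only the irregular part remains.
residual = seasonal_difference(series, lag=4)
print(residual)
```

A second seasonality could be handled the same way by differencing the result again at a different lag; the model in this workflow is then trained on what is left.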

URL: Parameter Optimization for Prediction Loops https://youtu.be/IlqepyIba6Y
URL: Slides on the KNIME Website https://www.knime.com/form/material-download-registration

Data Loading
Data Preparation
ACF Plot & seasonality removal
Hyper Parameter Optimization
Time Series Analysis 05. Hyper Parameter Optimization

Summary: In this exercise we'll optimize some of the hyper parameters in our Random Forest model.

Instructions:
1) Run the workflow up through the Random Forest Predictor. We'll start from here.
2) Attach a Numeric Scorer to the output of the Predictor and verify that the reference and prediction columns are correct in the configuration. Check the "Attach output scores as flow variables" option. We'll need these scores as flow variables later to select the best parameters.
3) Add the Parameter Optimization Loop Start node to the workflow. Its output is a flow variable port. Attach this port to the Random Forest Learner.
4) To configure the Parameter Optimization Loop Start, add new variables to the table in its configuration. These represent the range of values we want to try when training.
   - Create one with the name NumTrees, with min value 5, max value 100, step size 5.
   - Create another with the name TreeDepth, with min value 1, max value 20, step size 1.
   - Check the box to indicate that both are integers.
   Important: execute this node so that the flow variables are available in the next step.
5) Configure the Random Forest Learner to use these flow variables. Open the configuration window for the Learner and go to the Flow Variables tab. In the drop-down box next to maxLevels select your TreeDepth flow variable, and in the box next to nrModels select NumTrees. This instructs KNIME to control those model parameters with your flow variables.
6) Finally, add the Parameter Optimization Loop End to the end of your workflow and attach the flow variable output of your Numeric Scorer node to it. In the configuration window of the Loop End node you can select which metric to optimize for. We'll use Mean Absolute Percentage Error.
Optional) Train a model with the optimized parameters from the loop.
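The loop built in the steps above can be sketched in plain Python as an exhaustive search over the two parameter ranges. This is a minimal sketch, not KNIME's implementation: it assumes a brute-force search strategy and an inclusive upper bound, and `parameter_grid`, `optimize`, and `toy_score` are hypothetical names. In particular, `toy_score` is a placeholder objective that stands in for training the Random Forest and reading the Numeric Scorer output.

```python
from itertools import product


def mean_absolute_percentage_error(actual, predicted):
    """MAPE as a fraction; assumes no actual value is zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def parameter_grid(ranges):
    """Yield every parameter combination; ranges maps name -> (min, max, step)."""
    names = list(ranges)
    axes = [range(lo, hi + 1, step) for lo, hi, step in ranges.values()]
    for combo in product(*axes):
        yield dict(zip(names, combo))


def optimize(ranges, score):
    """Return (best_params, best_score) for the combination minimizing score."""
    best_params, best_score = None, float("inf")
    for params in parameter_grid(ranges):
        current = score(params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Ranges from step 4 of the instructions.
ranges = {"NumTrees": (5, 100, 5), "TreeDepth": (1, 20, 1)}


def toy_score(params):
    # Placeholder objective, NOT a real model: the error grows with the
    # distance from an arbitrary, made-up optimum at (60, 8).
    actual = [100.0, 200.0]
    offset = abs(params["NumTrees"] - 60) + abs(params["TreeDepth"] - 8)
    predicted = [a + offset for a in actual]
    return mean_absolute_percentage_error(actual, predicted)


best_params, best_score = optimize(ranges, toy_score)
print(best_params, best_score)
```

With these ranges the loop body runs 20 x 20 = 400 times, once per combination, so a coarser step size is the simplest way to shorten the search when each iteration trains a full model.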
Date&Time Aligner (Labs)
Autocorrelation Plot (Labs)
Energy usage data
CSV Reader
convert date/time into Date&Time objects
String to Date&Time
Numeric Scorer
Random Forest Predictor (Regression)
Parameter Optimization Loop End
Table Partitioner
Numeric Scorer
substituting missing values with average of previous and next values
Imputing Missing Values
Random Forest Learner (Regression)
Partition from top down for time series data
Table Partitioner
Random Forest Predictor (Regression)
Column Filter
10 previous hours
Lag Column
Random Forest Learner (Regression)
Parameter Optimization Loop Start
Table Row to Variable
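Three of the preparation steps named in the node labels above (filling a missing value with the average of the previous and next values, building lag columns, and partitioning from the top down) can be sketched as follows. This is an illustrative approximation under stated assumptions, not the behavior of the KNIME nodes: the function names are hypothetical, the data is made up, and 2 lags are used instead of the workflow's 10 previous hours to keep the example small.

```python
def impute_with_neighbor_average(values):
    """Replace each None with the mean of the nearest known values before and after.

    Falls back to the single known neighbor at the edges; assumes at least
    one value in the list is not None.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            prev = next((x for x in reversed(values[:i]) if x is not None), None)
            nxt = next((x for x in values[i + 1:] if x is not None), None)
            known = [x for x in (prev, nxt) if x is not None]
            filled[i] = sum(known) / len(known)
    return filled


def lag_rows(values, n_lags):
    """Return (features, target) pairs; features are the n_lags previous values."""
    return [(values[t - n_lags:t], values[t]) for t in range(n_lags, len(values))]


def top_down_split(rows, train_fraction):
    """Split without shuffling so the test rows come after the training rows in time."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


raw = [1.0, None, 3.0, 4.0, 5.0, 6.0, 7.0]  # made-up values
clean = impute_with_neighbor_average(raw)
rows = lag_rows(clean, n_lags=2)
train, test = top_down_split(rows, train_fraction=0.8)
print(len(train), test)
```

Splitting from the top keeps the chronological order intact, so the model is always evaluated on observations that come after the ones it was trained on.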
