
05_Model_Optimization

Hyper Parameter Optimization - Exercise (Solution)

This workflow optimizes the parameters of a machine learning model that predicts the residual of a time series (energy consumption). The residual is what is left of the series after removing the trend and the first and second seasonality. The optimized parameters are the number of trees and the tree depth of a Random Forest model.
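To make "residual" concrete, here is a minimal Python sketch of one way to strip a trend and two seasonal components from a series. It is not what the workflow's Decompose Signal step does internally, which this page does not describe. The linear trend fit, the per-position seasonal means, the default periods of 24 and 168 (daily and weekly cycles in hourly data), and all function names are assumptions made for illustration.

```python
from statistics import mean


def remove_linear_trend(values):
    """Subtract the least-squares straight line from the series (assumed trend model)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [v - (intercept + slope * i) for i, v in enumerate(values)]


def remove_seasonality(values, period):
    """Subtract the average value observed at each position within the period."""
    seasonal = [mean(values[p::period]) for p in range(period)]
    return [v - seasonal[i % period] for i, v in enumerate(values)]


def residual(values, first_period=24, second_period=168):
    """Trend removed first, then the first and second seasonality.

    The default periods are assumptions (daily and weekly cycles in hourly data).
    The series must be at least as long as the longer period.
    """
    detrended = remove_linear_trend(values)
    without_first = remove_seasonality(detrended, first_period)
    return remove_seasonality(without_first, second_period)
```

Whatever is left after these steps is the part of the signal the Random Forest model is asked to predict.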




Workflow sections: Data Loading, Data Preparation, ACF Plot & seasonality removal, Hyper Parameter Optimization

Time Series Analysis
05. Hyper Parameter Optimization

Summary:
In this exercise we'll optimize some of the hyper parameters of our Random Forest model.

Instructions:
1) Run the workflow up through the Random Forest Predictor. We'll start from here.
2) Attach a Numeric Scorer to the output of the Predictor and verify that the reference and prediction columns are correct in the configuration. Check the "Attach output scores as flow variables" option; we'll need these scores as flow variables later to select the best parameters.
3) Next we'll add the Parameter Optimization Loop Start node to our workflow. Its output is a flow variable port. Attach this to the Random Forest Learner.
4) To configure the Parameter Optimization Loop Start, add new variables to the table in its configuration. These represent the range of values we want to try when training.
   Create one with the name NumTrees, with min value 5, max value 100, step size 5.
   Create another with the name TreeDepth, with min value 1, max value 20, step size 1.
   Check the box to indicate both are integers.
   ** Execute this node so you see your Flow Variables in the next step.
5) Next configure the Random Forest Learner to use these flow variables. Open the configuration window for the Learner and go to the Flow Variables tab.
   In the drop-down box next to maxLevels select your TreeDepth flow variable, and in the box next to nrModels select NumTrees. This instructs KNIME to control those model parameters with your flow variables.
6) Finally add the Parameter Optimization Loop End to the end of your workflow. Attach the flow variable output of your Numeric Scorer node to it.
   In the configuration window for the Loop End node you can select which metric to optimize for. We'll use Mean Absolute Percentage Error.
Optional) Train a model with the optimized parameters from the loop.

Nodes in the workflow (with their annotations):
String to Date&Time (convert date/time into Date&Time objects)
Imputing Missing Values (substituting missing values with average of previous and next)
Lag Column (10 previous hours)
Column Filter
Timestamp Alignment (Introduce missing hours)
CSV Reader (Energy usage data)
Partitioning (Partition from top down for time series data)
Parameter Optimization Loop Start
Parameter Optimization Loop End
Numeric Scorer
Random Forest Learner (Regression)
Random Forest Predictor (Regression)
Random Forest Learner (Regression)
Partitioning
Random Forest Predictor (Regression)
Numeric Scorer
Table Row to Variable
Decompose Signal
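For readers who think in code, the loop built in steps 3 to 6 can be sketched in Python as an exhaustive search over the two parameter ranges. This is a rough sketch and not KNIME's implementation. It assumes the loop tries every combination (a brute-force strategy) and that the error is minimized. The mape function reports the error as a fraction, which is a choice made here. The toy_score function is a hypothetical stand-in for "train the Learner, run the Predictor, read the Numeric Scorer", with a made-up error surface so the example runs without any data.

```python
from itertools import product


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def brute_force_search(score_fn, num_trees=range(5, 101, 5), tree_depth=range(1, 21)):
    """Evaluate every (NumTrees, TreeDepth) pair and keep the lowest score."""
    best = None
    for n, d in product(num_trees, tree_depth):
        score = score_fn(n, d)
        if best is None or score < best[0]:
            best = (score, n, d)
    return {"score": best[0], "NumTrees": best[1], "TreeDepth": best[2]}


def toy_score(n_trees, depth):
    """Hypothetical scorer: pretends the error is smallest at 60 trees and depth 8."""
    actual = [100.0, 120.0, 80.0]
    error = abs(n_trees - 60) / 100 + abs(depth - 8) / 20  # made-up error surface
    predicted = [a * (1 + error) for a in actual]
    return mape(actual, predicted)


best = brute_force_search(toy_score)
print(best)
```

With the ranges from step 4 this evaluates 20 x 20 = 400 parameter combinations, so in a real workflow each one costs a full train-and-score pass. The returned dictionary plays the role of the best-parameters output that the optional step feeds into a final Learner.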
