parameter optimization demo

Advanced Data Mining - Solution

Solution to exercise 10 of the KNIME User Training
- Training a Random Forest model to predict a nominal target column
- Evaluating the performance of a classification model
- Optimizing parameters of the Random Forest model
- Performing the classification multiple times in a cross validation loop

Part I: Random Forest Model
- Read the CurrentDetailData.table data
- Partition the data 70/30 using stratified sampling on the "Target" column
- Train and apply a Random Forest model to predict the "Target" column
- Use a tree depth of 5 and 50 models

Part II: Parameter Optimization
- Add a parameter optimization loop to your model training process
- Use Brute Force (i.e., executing all settings) to determine the optimum number of models (min=10, max=200, step=10, int = yes)
- Use maximum accuracy as the objective value
- What is the best number of models? (Hint: don't forget to use the flow variable in the Random Forest Learner node)

Part III: More Parameter Optimization (Optional)
- In addition to the number of models, optimize the tree depth (min=4, max=10, step=1, int = yes)
- Use maximum accuracy as the objective value
- What is the best combination of number of models and tree depth? (Hint: don't forget to use the flow variable in the Random Forest Learner node)

Workflow nodes and labels: Table Reader, Partitioning, Random Forest Learner, Random Forest Predictor, Scorer, Parameter Optimization Loop Start, Parameter Optimization Loop End, Define Parameters, Collect Accuracy
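The brute-force search in Parts II and III can be sketched outside KNIME as a plain loop over every parameter combination. The following Python sketch is illustrative only and does not use any KNIME API: the names `int_range`, `brute_force`, and `toy_accuracy` are hypothetical, and `toy_accuracy` is a made-up stand-in for the Random Forest Learner, Predictor, and Scorer chain, so the "best" values it finds say nothing about the real data.

```python
from itertools import product


def int_range(minimum, maximum, step):
    """Inclusive integer range, mirroring the min/max/step loop settings."""
    return list(range(minimum, maximum + 1, step))


def brute_force(grid, objective):
    """Evaluate every combination in the grid and keep the highest score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(params):
    """Hypothetical objective (NOT a real model): peaks at 120 models, depth 7.

    In the workflow, this value would come from the Scorer node's accuracy.
    """
    return (1.0
            - abs(params["n_models"] - 120) / 1000
            - abs(params["max_depth"] - 7) / 100)


# Parameter ranges taken from the exercise text (Parts II and III).
grid = {
    "n_models": int_range(10, 200, 10),   # 20 settings
    "max_depth": int_range(4, 10, 1),     # 7 settings
}

best_params, best_score = brute_force(grid, toy_accuracy)
print(len(grid["n_models"]) * len(grid["max_depth"]), "combinations evaluated")
print("best:", best_params, "score:", best_score)
```

With these ranges, Part II alone evaluates 20 settings, while Part III evaluates 20 × 7 = 140 combinations, which is why brute force becomes expensive as more parameters are added.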
