
09. Advanced Machine Learning - Exercise

Advanced Data Mining - Exercise

Exercise 10 for KNIME User Training
- Training a Random Forest model to predict a nominal target column
- Evaluating the performance of a classification model
- Optimizing parameters of the Random Forest model
- Performing the classification multiple times in a cross validation loop

Activity I: Random Forest Model
- Read CurrentDetailData.table data
- Partition the data 50/50 using stratified sampling on the "Target" column
- Train and apply a Random Forest model to predict the "Target" column
- Use a tree depth of 5 and 50 models

Activity II: Parameter Optimization
- Add a parameter optimization loop to your model training process
- Use Hillclimbing to determine the optimum number of models (min=10, max=200, step=10, int = yes)
- Use maximum accuracy as the objective value
- What is the best number of models? (Hint: don't forget to use the flow variable in the Random Forest Learner node)
- (Optional): Train a model with the best parameter set (Table Row to Variable, Random Forest Learner, and Model Writer nodes)

Activity III: Cross Validation
- Create a 10-fold cross validation for your model
- Take a look at the error rates produced by the different iterations. Does the model seem stable?

Activity IV (optional): Model Evaluation
- Read CurrentDetailData.table data
- Partition the data 50/50 using stratified sampling on the "Target" column
- Train and apply a Random Forest model to predict the "Target" column
- Train and apply a Decision Tree model to predict the "Target" column
- Combine the performances of both models (Column Appender node)
- Evaluate the performances of the models (Binary Classification Inspector node). Which model performs better?
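To see what the Partitioning node's stratified sampling does under the hood, here is a minimal pure-Python sketch of a stratified 50/50 split. The rows and "Target" values below are invented placeholders, not the actual contents of CurrentDetailData.table:

```python
import random
from collections import defaultdict

def stratified_split(rows, target_key, train_fraction=0.5, seed=42):
    """Split rows in two so each Target class keeps roughly the same
    proportion in both halves (what 'stratified sampling' means)."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)
    rng = random.Random(seed)
    train, test = [], []
    for members in by_class.values():
        rng.shuffle(members)          # random order within each class
        cut = int(len(members) * train_fraction)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Placeholder data standing in for CurrentDetailData.table
rows = [{"id": i, "Target": "yes" if i % 4 == 0 else "no"} for i in range(100)]
train, test = stratified_split(rows, "Target")
```

Because the split happens per class, a rare "Target" value cannot end up entirely in one partition, which keeps accuracy estimates on the test half meaningful.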
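The Hillclimbing strategy in Activity II's parameter optimization loop can be sketched as a greedy search over the integer grid min=10..max=200 in steps of 10. In the exercise the objective would be the accuracy of a Random Forest trained with `n` models; the `mock_accuracy` curve below is a made-up stand-in that peaks at n = 120 purely for illustration:

```python
def hill_climb(objective, lo=10, hi=200, step=10, start=10):
    """Greedy hill climbing on an integer grid: move to the better
    neighbor until neither neighbor improves the objective."""
    current = current_value = None
    current = start
    while True:
        neighbors = [n for n in (current - step, current + step) if lo <= n <= hi]
        best = max(neighbors + [current], key=objective)
        if best == current:
            return current, objective(current)
        current = best

# Stand-in for "accuracy as a function of number of models" (invented curve)
def mock_accuracy(n):
    return 1.0 - ((n - 120) / 200.0) ** 2

best_n, best_acc = hill_climb(mock_accuracy)
```

Note that hill climbing only guarantees a local optimum; on a bumpy accuracy curve the loop can stop short of the global best, which is why the exercise asks you to inspect the result.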

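Activity III's 10-fold loop (X-Partitioner / X-Aggregator in KNIME) splits the rows into 10 disjoint test folds and trains on the other 9 each time. A stdlib sketch of the fold bookkeeping and the stability check; the per-fold error rates listed are hypothetical numbers, not results from the exercise data:

```python
import statistics

def kfold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs: each row appears in exactly
    one test fold, fold sizes differ by at most one row."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(kfold_indices(25, k=10))

# Hypothetical per-fold error rates such as the aggregator would report;
# a small spread across folds suggests the model is stable.
error_rates = [0.12, 0.10, 0.14, 0.11, 0.13, 0.12, 0.10, 0.15, 0.11, 0.12]
spread = statistics.stdev(error_rates)
stable = spread < 0.05
```

If one fold's error rate were far above the rest, that would hint at an unstable model or an unlucky partition, which is exactly what the "does the model seem stable?" question asks you to judge.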