
09. Advanced Data Mining - solution


Solution to Exercise 10 of the KNIME User Training:
- Training a Random Forest model to predict a nominal target column
- Evaluating the performance of a classification model
- Optimizing parameters of the Random Forest model
- Performing the classification multiple times in a cross validation loop

Activity I: Random Forest Model
- Read CurrentDetailData.table data
- Partition the data 50/50 using stratified sampling on the "Target" column
- Train and apply a Random Forest model to predict the "Target" column
- Use a tree depth of 5 and 50 models

Activity II: Parameter Optimization
- Add a parameter optimization loop to your model training process
- Use Hillclimbing to determine the optimum number of models (min = 10, max = 200, step = 10, integer = yes)
- Use maximum accuracy as the objective value
- What is the best number of models? (Hint: don't forget to use the flow variable in the Random Forest Learner node)
- (Optional) Train a model with the best parameter set (Table Row to Variable, Random Forest Learner, and Model Writer nodes)

Activity III: Cross Validation
- Create a 10-fold cross validation for your model
- Take a look at the error rates produced by the different iterations. Does the model seem stable?

Activity IV (optional): Model Evaluation
- Read CurrentDetailData.table data
- Partition the data 50/50 using stratified sampling on the "Target" column
- Train and apply a Random Forest model to predict the "Target" column
- Train and apply a Decision Tree model to predict the "Target" column
- Combine the performances of both models (Column Appender node)
- Evaluate the performances of the models (Binary Classification Inspector node). Which model performs better?
Nodes used in the workflow: Table Reader, Partitioning, Random Forest Learner, Random Forest Predictor, Scorer, Parameter Optimization Loop Start, Parameter Optimization Loop End, Table Row to Variable, Model Writer, X-Partitioner, X-Aggregator, Decision Tree Learner, Decision Tree Predictor, Column Appender, Binary Classification Inspector
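Activity IV's model comparison can likewise be sketched in Python. This is a hedged stand-in, not the KNIME solution: a synthetic dataset replaces CurrentDetailData.table, and accuracy plus ROC AUC replace the Binary Classification Inspector's interactive view.

```python
# Rough sketch of Activity IV: Random Forest vs. Decision Tree on the same
# stratified 50/50 split (synthetic data replaces CurrentDetailData.table).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, max_depth=5,
                                            random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # positive-class probabilities
    results[name] = {"accuracy": accuracy_score(y_te, model.predict(X_te)),
                     "AUC": roc_auc_score(y_te, proba)}
for name, metrics in results.items():
    print(name, metrics)
```

Comparing both accuracy and AUC side by side mirrors what the Column Appender and Binary Classification Inspector nodes do in the workflow: the model with the consistently higher scores is the better performer.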
