

KNIME_challenge25_solution
Parameter Optimization (Table) Component on a Generic Model with Range Sliders

This component can optimize a generic classification model over a generic set of numerical parameters. In this workflow, the component performs parameter optimization on a Gradient Boosted Trees model to tune the number of trees, the maximum tree depth, and the learning rate. Check the workflow description for a step-by-step guide to adapting the workflow to your own classification model.

Workflow annotations:
- Partition of the dataset meant for training the model
- Partition of the dataset for testing model accuracy
- Define Parameters: the parameters to be optimized must match in name and number across the two nodes, and the names must start with GBT_
- Captures the model training process
- Uses the best set of parameters for the model
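For readers who want a sense of what the optimization loop does outside of KNIME, here is a minimal Python sketch: a grid search over the number of trees, maximum tree depth, and learning rate of a gradient boosted trees classifier, scored on a held-out partition. The parameter ranges, the use of scikit-learn's GradientBoostingClassifier, and the one-hot encoding step are assumptions for illustration only, not part of the KNIME component itself.

```python
# Illustrative sketch of the parameter optimization the component performs.
# Assumes scikit-learn; column names, ranges, and encoding are assumptions.
from itertools import product

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn_problem_training_data.csv")
X = pd.get_dummies(df.drop(columns=["Churn"]))  # assumes some categorical attributes
y = df["Churn"]

# Partition of the dataset for training the model / testing model accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Parameter ranges, analogous to the GBT_-prefixed flow variables in the component.
param_grid = {
    "n_estimators": [50, 100, 200],    # GBT_number_of_trees
    "max_depth": [2, 4, 6],            # GBT_max_tree_depth
    "learning_rate": [0.05, 0.1, 0.2], # GBT_learning_rate
}

best_score, best_params = -1.0, None
for n, d, lr in product(*param_grid.values()):
    model = GradientBoostingClassifier(
        n_estimators=n, max_depth=d, learning_rate=lr
    ).fit(X_train, y_train)
    score = accuracy_score(y_test, model.predict(X_test))
    if score > best_score:
        best_score = score
        best_params = {"n_estimators": n, "max_depth": d, "learning_rate": lr}

print(best_params, best_score)  # best set of parameters for the model
```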
Challenge 25: Modeling Churn Predictions - Part 3
Level: Hard
Description: In this challenge series, the goal is to predict which customers of a certain telecom company are going to churn (that is, cancel their contracts) based on attributes of their accounts. Here, the target class to be predicted is Churn (value 0 corresponds to customers who do not churn, and 1 to those who do). After automatically picking a classification model for the task, you achieved an accuracy of about 95% on the test data, but the model does not perform uniformly for both classes. In fact, it is better at predicting when a customer will not churn (Churn = 0) than when they will (Churn = 1). This imbalance can be verified by looking at how precision and recall differ for these two classes, or by checking how Cohen's kappa is a bit lower than 80% despite the very high accuracy. How can you preprocess and re-sample the training data in order to make the classification a bit more powerful for class Churn = 1?
Note 1: Need more help to understand the problem? Check this blog post out.
Note 2: This problem is hard: do not expect a major performance increase for class Churn = 1. Also, verifying whether the performance increase is statistically significant will not be trivial. Still... give this challenge your best try!
Author: Aline Bessa

Node annotations: Read churn_problem_training_data.csv; Read churn_problem_test_data.csv; Provide initial parameters; Select range of parameters (click "Close & Apply" in the view after selection).

Nodes used: Parameter Optimization (Table), CSV Reader (x2), Capture Workflow Start, Capture Workflow End, Variable Creator, Gradient Boosted Trees Learner (x2), Gradient Boosted Trees Predictor (x2), Scorer, Table Creator (Parameter Ranges), Scorer (JavaScript).
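As a rough illustration of the re-sampling idea the challenge asks for (not the reference solution), the sketch below oversamples the minority class Churn = 1 in the training data and then reports per-class precision/recall and Cohen's kappa on the test data. The file names and the Churn column come from the challenge text; the model choice, encoding, and oversampling strategy are assumptions.

```python
# Sketch: simple minority-class oversampling for the churn data.
# Everything beyond the file names and the "Churn" column is an assumption.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, cohen_kappa_score
from sklearn.utils import resample

train = pd.read_csv("churn_problem_training_data.csv")
test = pd.read_csv("churn_problem_test_data.csv")

# Oversample the minority class (Churn = 1) to match the majority class size.
majority = train[train["Churn"] == 0]
minority = train[train["Churn"] == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_up])

X_train = pd.get_dummies(balanced.drop(columns=["Churn"]))
y_train = balanced["Churn"]
X_test = pd.get_dummies(test.drop(columns=["Churn"])).reindex(
    columns=X_train.columns, fill_value=0
)
y_test = test["Churn"]

model = GradientBoostingClassifier().fit(X_train, y_train)
pred = model.predict(X_test)

# Per-class precision/recall and Cohen's kappa, the metrics named in the challenge.
print(classification_report(y_test, pred))
print("Cohen's kappa:", cohen_kappa_score(y_test, pred))
```

Whether the improvement for Churn = 1 is statistically significant still needs a separate test, as Note 2 points out.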
