
Challenge 25

KNIME IT Challenge 25

Description: In this challenge series, the goal is to predict which customers of a certain telecom company are going to churn (that is, going to cancel their contracts) based on attributes of their accounts. Here, the target class to be predicted is Churn (value 0 corresponds to customers that do not churn, and 1 corresponds to those who do).

After automatically picking a classification model for the task, you achieved an accuracy of about 95% on the test data, but the model does not perform uniformly for both classes. In fact, it is better at predicting when a customer will not churn (Churn = 0) than when they will (Churn = 1). This imbalance can be verified by looking at how precision and recall differ for these two classes, or by checking how the Cohen's kappa metric is a bit lower than 80% despite the very high accuracy. How can you preprocess and re-sample the training data in order to make the classification a bit more powerful for class Churn = 1?

Note 1: Need more help to understand the problem? Check this blog post out.
Note 2: This problem is hard: do not expect to see a major performance increase for class Churn = 1. Also, verifying if the performance increase is statistically significant will not be trivial. Still... give this challenge your best try!

Workflow annotations: Define parameters; k_fold for parameters; Build end model; Test data; Train data.
Workflow nodes: CSV Reader, Parameter Optimization Loop Start, Parameter Optimization Loop End, X-Partitioner, X-Aggregator, SMOTE, Gradient Boosted Trees Learner, Gradient Boosted Trees Predictor, Table Row to Variable, Scorer, ROC Curve.
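For readers who want to try the same idea outside KNIME, here is a minimal Python sketch using scikit-learn and imbalanced-learn as stand-ins for the workflow's nodes (train/test split, SMOTE, parameter optimization with k-fold cross-validation, Gradient Boosted Trees, Scorer). The file name churn.csv, the column names, and the parameter grid are hypothetical placeholders, not part of the original workflow.

```python
# Minimal sketch, assuming a hypothetical churn.csv with a binary "Churn"
# column (0 = no churn, 1 = churn) and only numeric feature columns.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, cohen_kappa_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from imblearn.over_sampling import SMOTE

df = pd.read_csv("churn.csv")                       # hypothetical file name
X, y = df.drop(columns=["Churn"]), df["Churn"]

# Hold out a test set; the re-sampling must touch the training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# SMOTE oversamples the minority class (Churn = 1) with synthetic examples.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Small parameter search with stratified k-fold cross-validation, analogous
# to the Parameter Optimization Loop wrapped around the X-Partitioner.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    scoring="f1",   # focus the search on the minority (Churn = 1) class
)
search.fit(X_res, y_res)

# Score on the untouched test set: per-class precision/recall and Cohen's kappa.
y_pred = search.predict(X_test)
print(classification_report(y_test, y_pred, digits=3))
print("Cohen's kappa:", cohen_kappa_score(y_test, y_pred))
```

Note that SMOTE is fit on the training partition only, so synthetic minority examples never leak into the evaluation; this mirrors placing the SMOTE node after the train/test split in the workflow.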
