justknimeit-25
Challenge 25: Modeling Churn Predictions - Part 3

Level: Hard

Description: In this challenge series, the goal is to predict which customers of a certain telecom company are going to churn (that is, going to cancel their contracts) based on attributes of their accounts. Here, the target class to be predicted is Churn (value 0 corresponds to customers that do not churn, and 1 corresponds to those who do).

After automatically picking a classification model for the task, you achieved an accuracy of about 95% for the test data, but the model does not perform uniformly for both classes. In fact, it is better at predicting when a customer will not churn (Churn = 0) than when they will (Churn = 1). This imbalance can be verified by looking at how precision and recall differ for these two classes, or by checking how the metric Cohen's kappa is a bit lower than 80% despite a very high accuracy. How can you preprocess and re-sample the training data in order to make the classification a bit more powerful for class Churn = 1?

Note 1: Need more help to understand the problem? Check this blog post out.

Note 2: This problem is hard: do not expect to see a major performance increase for class Churn = 1. Also, verifying if the performance increase is statistically significant will not be trivial. Still... give this challenge your best try!

Workflow annotations

Preparing the data: Set Area Code & Churn to String via Transformation Tab
Model Training
Model Evaluation

Workflow nodes

CSV Reader: Read test data
CSV Reader: Read training data
Scorer (JavaScript): Match original vs. predicted Churn values
SMOTE: Oversample churn class at each training sample
Workflow Executor: predict churn
AutoML: learn and predict churn
X-Partitioner: 20-fold validation. Stratified sampling. 1st output: Training. 2nd output: Testing.
X-Aggregator: Collect results after each of the 20 iterations
Concatenate: Combine Tables
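For readers who think more easily in code than in nodes, the idea behind the validation loop can be sketched in plain Python. This is only an illustrative sketch, not the KNIME workflow itself and not how the X-Partitioner, SMOTE, or Scorer nodes are implemented. The function names (stratified_folds, smote, cohen_kappa), the toy data, the 5 folds used instead of 20, and the confusion counts are all assumptions made up for the example. It shows three things: folds that keep the class ratio, SMOTE-style oversampling applied to the training part only, and how Cohen's kappa can sit well below a high accuracy.

```python
import math
import random


def stratified_folds(labels, k, seed=0):
    """Split row indices into k folds with similar class ratios in each."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal rows out round-robin per class
    return folds


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a minority row
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    observed = (tp + tn) / total
    expected = ((tp + fn) * (tp + fp) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 100 rows, 2 numeric features, 15 churners.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(100)]
labels = [1] * 15 + [0] * 85

folds = stratified_folds(labels, k=5)
for fold_no, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(rows)) if i not in held_out]
    churn = [rows[i] for i in train_idx if labels[i] == 1]
    stay = [rows[i] for i in train_idx if labels[i] == 0]
    # Oversample only the training part; the test fold stays untouched.
    churn_balanced = churn + smote(churn, len(stay) - len(churn), seed=fold_no)
    print(fold_no, len(stay), len(churn), len(churn_balanced))

# Hypothetical confusion counts: accuracy is 0.95, yet kappa is clearly lower.
print(round(cohen_kappa(tp=30, fn=20, fp=5, tn=445), 3))
```

The point of oversampling inside the loop is that synthetic rows never leak into the fold used for testing, so the scores still describe performance on real customers only.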
