01_Creating_a_Churn_Predictor

Training a Churn Predictor

This workflow is an example of how to train a basic machine learning model for a churn prediction task. In this case we train a random forest after oversampling the minority class with the SMOTE algorithm.
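To make the oversampling step more concrete, here is a minimal sketch of the idea behind SMOTE: each synthetic point is placed at a random position on the segment between a minority-class sample and one of its k nearest minority-class neighbours. This is not the KNIME SMOTE node's implementation; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (the base itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical toy data: two numeric features per churning customer
churners = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote(churners, n_new=6, k=2)
```

Because every synthetic point lies between two existing minority samples, the new rows stay inside the region already occupied by the churn class.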

Note that the Learner-Predictor construct is common to all supervised algorithms. Here we also use a cross-validation procedure for a more reliable estimation of the random forest performance.
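The Learner-Predictor construct inside a cross-validation loop can be sketched as follows. This is only an illustration under stated assumptions: `stratified_folds`, `learn`, and `predict` are hypothetical names, and the "learner" is a trivial majority-class stand-in rather than a random forest, so that the example stays short and runs with the standard library alone.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign each row index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal rows of each class round-robin
    return folds

def learn(train_labels):
    """Stand-in learner: the 'model' is just the most frequent training class."""
    return Counter(train_labels).most_common(1)[0][0]

def predict(model, n_rows):
    """Stand-in predictor: apply the model to n_rows held-out rows."""
    return [model] * n_rows

# Hypothetical unbalanced target column: 40 non-churners, 10 churners
labels = ["no"] * 40 + ["yes"] * 10
folds = stratified_folds(labels, k=5)

predictions = {}
for test_idx in folds:  # each fold serves once as the test set
    held_out = set(test_idx)
    train_labels = [l for i, l in enumerate(labels) if i not in held_out]
    model = learn(train_labels)
    for i, p in zip(test_idx, predict(model, len(test_idx))):
        predictions[i] = p  # collect results across iterations

accuracy = sum(predictions[i] == labels[i] for i in range(len(labels))) / len(labels)
```

With this toy data the stand-in model always predicts "no", so the pooled accuracy equals the majority-class share of 0.8. That is a reminder of why accuracy alone is a weak measure on an unbalanced target.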

If you use this workflow, please cite:
F. Villaroel Ordenes & R. Silipo, “Machine learning for marketing on the KNIME Hub: The development of a live repository for marketing applications”, Journal of Business Research 137(1):393-410, DOI: 10.1016/j.jbusres.2021.08.036.

Creating a Customer Churn Predictor

An example is provided with a small Kaggle dataset previously used in marketing research: https://www.kaggle.com/becksddf/churn-in-telecoms-dataset.

1. Read datasets. Besides the nodes below, which read Excel and CSV files, KNIME offers a wide range of nodes to read different dataset types (e.g., Parquet, JSON, images, etc.).

2. Data Manipulation/Preparation. The process below is simplified. Most of the time, data manipulation/preparation involves the use of several nodes such as "Missing Value", "Row Filter", "GroupBy" (row aggregation), and "Column Aggregator".

3. Standard ML process with cross-validation. Train a model (learn) and test (predict) whether a customer will churn using any kind of nominal classifier. In this case we use Random Forest. The process includes a 5-fold cross-validation (80% training, 20% testing). At the end of the process, the model is written to a file so that it can be applied to unseen data.

4. Model Evaluation. Evaluation with the Scorer node and ROC curve. We use the "Numeric Scorer" node for scale predictions.

Workflow nodes and their annotations:

Excel Reader: Calls data
CSV Reader: Contract data
Joiner: Inner join of the 2 tables based on customer Phone
Number To String: Convert Churn column to String
Data Explorer: Inspect variables. "Churn" column is unbalanced
Color Manager: Color by churn
X-Partitioner: 5-fold validation. Stratified sampling. 1st output: Training. 2nd output: Testing
SMOTE: Oversample churn class at each training sample
Random Forest Learner
Random Forest Predictor
X-Aggregator: Collect results after each of the 5 iterations
Scorer: Accuracy, Precision, Recall, F-measure
ROC Curve: AuC
Writing current model
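The statistics listed for the evaluation step (accuracy, precision, recall, F-measure) all derive from the confusion matrix of actual versus predicted classes. The sketch below shows one way to compute them; the function name `score`, the class labels, and the toy predictions are assumptions for illustration, and it does not reproduce the full output of the KNIME Scorer node.

```python
def score(actual, predicted, positive="yes"):
    """Compute basic classification statistics for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)

    # Guard against empty denominators (e.g. no positive predictions at all)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical churn labels: 3 churners, 5 non-churners
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
predicted = ["yes", "yes", "no", "yes", "no", "no", "no", "no"]
stats = score(actual, predicted)
```

Here two of the three churners are found and one non-churner is flagged by mistake, so accuracy is 0.75 while precision, recall, and F-measure are each 2/3.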
