Building Churn Predictor

Training a Churn Predictor

This workflow is an example of how to train a basic machine learning model for a churn prediction task. In this case we train a random forest after oversampling the minority class with the SMOTE algorithm.

Note that the Learner-Predictor construct is common to all supervised algorithms. Here we also use a cross-validation procedure for a more reliable estimation of the random forest performance.
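To make the oversampling step mentioned above more concrete, here is a minimal sketch of the idea behind SMOTE: each synthetic row is placed at a random position on the line between a minority-class row and one of its nearest minority-class neighbours. This is not the KNIME SMOTE node's implementation; the function name `smote`, its parameters, and the sample data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class
    rows and their k nearest minority-class neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority rows
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical numeric features for the minority (churn) class.
churners = [[1.0, 10.0], [2.0, 12.0], [1.5, 11.0], [3.0, 15.0]]
new_rows = smote(churners, n_new=6, k=2)
```

Because every synthetic row is an interpolation between two existing churners, it always stays inside the range already covered by the minority class.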

If you use this workflow, please cite:
F. Villaroel Ordenes & R. Silipo, “Machine learning for marketing on the KNIME Hub: The development of a live repository for marketing applications”, Journal of Business Research 137(1):393-410, DOI: 10.1016/j.jbusres.2021.08.036.

URL: Churn Prediction https://www.knime.org/knime-applications/churn-prediction

2. Data Manipulation/Preparation.

The process below is simplified. Most of the time, data manipulation/preparation involves the use of several nodes such as Missing Value, Row Filter, GroupBy (row aggregation), and Column Aggregator.
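As a rough illustration of what the Row Filter, Missing Value, and GroupBy nodes named above do, the following plain-Python sketch applies the same three operations to a handful of records. The column names (`phone`, `day_mins`) and the values are invented for this example and are not taken from the workflow's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical call records; column names are invented for illustration.
calls = [
    {"phone": "555-0001", "day_mins": 120.0},
    {"phone": "555-0001", "day_mins": None},   # missing value
    {"phone": "555-0002", "day_mins": 45.5},
    {"phone": "555-0002", "day_mins": 60.5},
    {"phone": None, "day_mins": 30.0},         # row without a customer key
]

# Row Filter: drop rows that have no customer key.
rows = [r for r in calls if r["phone"] is not None]

# Missing Value: replace missing minutes with the mean of the observed values.
observed = [r["day_mins"] for r in rows if r["day_mins"] is not None]
fill = mean(observed)
rows = [
    {**r, "day_mins": fill if r["day_mins"] is None else r["day_mins"]}
    for r in rows
]

# GroupBy: total minutes per customer.
totals = defaultdict(float)
for r in rows:
    totals[r["phone"]] += r["day_mins"]
```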

3. Standard ML process with cross-validation.

Train a model (learn) and test it (predict) on whether a customer will churn, using any classifier for a nominal target. In this case we use a Random Forest. The process includes 5-fold cross-validation (in each fold, 80% of the rows are used for training and 20% for testing). At the end of the process, the model is written to a file so that it can be applied to unseen data.
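The partitioning behind a 5-fold cross-validation with stratified sampling can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the X-Partitioner node is implemented; the helper `stratified_folds` and the 80/20 class counts are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each row index to one of k folds, keeping class proportions
    roughly equal across folds (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal rows round-robin into the folds
    return folds


# Hypothetical unbalanced target: 80 non-churners ("0") and 20 churners ("1").
labels = ["0"] * 80 + ["1"] * 20
folds = stratified_folds(labels, k=5)

splits = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # Any oversampling would be applied to train_idx only at this point;
    # the learner is then fit on train_idx and the predictor scored on test_idx.
    splits.append((train_idx, test_idx))
```

Each of the five iterations holds out a different fold for testing, so every row is predicted exactly once by a model that did not see it during training.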

4. Model Evaluation.

Evaluation with the Scorer node and the ROC Curve node. For numeric (scale) predictions, the Numeric Scorer node is the one to use.
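The statistics used in this evaluation step (accuracy, precision, recall, F-measure, and the area under the ROC curve) can be computed by hand as in the sketch below. The function names, the class label `"1"` for churn, the 0.5 threshold, and the toy data are assumptions for illustration; the KNIME nodes report more than is shown here.

```python
def score(actual, predicted, positive="1"):
    """Confusion-matrix statistics for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def auc(actual, scores, positive="1"):
    """Probability that a random positive row is scored above a random
    negative row, counting ties as one half."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels and predicted churn probabilities.
actual = ["1", "0", "1", "0", "0", "1"]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
predicted = ["1" if s >= 0.5 else "0" for s in scores]

stats = score(actual, predicted)
area = auc(actual, scores)
```

On an unbalanced target such as churn, precision and recall for the churn class are usually more informative than accuracy alone, since a model that never predicts churn can still reach a high accuracy.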

Building a Churn Predictor

This workflow is an example of how to train a basic machine learning model for a churn prediction task. An example is provided with a small Kaggle dataset previously used in marketing research: https://www.kaggle.com/becksddf/churn-in-telecoms-dataset.

1. Read datasets.

Besides the nodes below, which read Excel and CSV files, KNIME offers a wide range of nodes to read different dataset types (e.g., Parquet, JSON, images, etc.).

Workflow nodes and their annotations:

X-Partitioner: 5-fold validation with stratified sampling. 1st output: training; 2nd output: testing.
ROC Curve: AUC
Joiner: inner join of the 2 tables based on customer Phone
X-Aggregator: collect results after each of the 5 iterations
Writing current model
Number to String: convert Churn column to String
Random Forest Predictor
Random Forest Learner
SMOTE: oversample churn class in each training sample
Color Manager: color by churn
Excel Reader: calls data
Data Explorer: inspect variables; the "Churn" column is unbalanced
Scorer: accuracy, precision, recall, F-measure
CSV Reader: contract data
