Churn Prediction

Customer churn is a critical challenge in the telecommunications industry, as retaining existing subscribers is far more cost-effective than acquiring new ones. This dataset captures customer-level information that can be used to analyze churn behavior. It includes demographic details, subscription and service usage patterns, billing information, and an indicator of whether the customer has churned.

Dataset Overview

  • Rows: 7,043

  • Columns: 21

Key Variables

  • CustomerID: Unique customer identifier

  • Gender: Male or Female

  • SeniorCitizen: 0 = No, 1 = Yes

  • Partner / Dependents: Whether the customer has a partner or dependents (Yes/No)

  • Tenure: Number of months with the company

  • PhoneService: Availability of phone service (Yes/No)

  • MultipleLines: Whether multiple lines are active (Yes/No/No phone service)

  • InternetService: Type of internet service (DSL, Fiber optic, None)

  • Value-Added Services: OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies (Yes/No/No internet service)

  • Contract Type: Month-to-month, One year, or Two year

  • PaperlessBilling: Yes/No

  • PaymentMethod: Electronic check, Mailed check, Bank transfer, or Credit card

  • MonthlyCharges: Monthly billing amount

  • TotalCharges: Total charges incurred to date

  • Churn: Customer status (Yes = churned, No = retained)

Key Questions for Exploration

  • Which features are the strongest predictors of churn?

  • Do customers with long-term contracts have lower churn rates?

  • How does the type of internet service influence churn?

  • Is it possible to build a classification model with accuracy above 80%?
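
The contract question above comes down to a grouped churn rate. A minimal sketch in plain Python, assuming rows are available as dictionaries keyed by the column names listed under Key Variables; the rows below are made up for illustration and are not taken from the dataset, and `churn_rate_by` is a hypothetical helper name:

```python
from collections import defaultdict

# Toy rows made up for illustration; they are NOT taken from the dataset.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "One year", "Churn": "Yes"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]


def churn_rate_by(rows, column):
    """Return {category: share of rows with Churn == 'Yes'} for one column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[column]] += 1
        if row["Churn"] == "Yes":
            churned[row[column]] += 1
    return {key: churned[key] / totals[key] for key in totals}


rates = churn_rate_by(rows, "Contract")
for contract, rate in rates.items():
    print(f"{contract}: {rate:.2f}")
```

The same helper can be pointed at another column, for example `churn_rate_by(rows, "InternetService")`, provided the rows carry that column.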

4. Apply xAI.

Here we preprocess the data and prepare the inputs needed to apply explainable AI (xAI) techniques.
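
For readers working outside KNIME, one common way to obtain a global feature importance is permutation importance on a held-out partition. The sketch below uses scikit-learn on synthetic stand-in data. It is an assumed substitute for illustration and does not claim to reproduce what the KNIME Global Feature Importance component computes internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 5 features, of which only the
# first 2 carry signal because shuffle=False keeps informative columns first.
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in the test partition and measure the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0
)

# Feature indices sorted from most to least important.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking:
    print(f"feature_{idx}: {result.importances_mean[idx]:.3f}")
```

Computing the importance on the test partition rather than the training data mirrors the "Data from Model Test Partition" input named in the workflow.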

3. Standard ML process.

Train a model (learn) and test it (predict) to determine whether a customer will churn, using any classifier that handles a nominal target. In this case, we use Random Forest with an 80/20 train/test split.
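
As a rough equivalent of this learner/predictor step outside KNIME, the following scikit-learn sketch trains a random forest on an 80/20 split. The data is synthetic, and the class imbalance, the number of trees, and the random seeds are arbitrary assumptions, not settings taken from the workflow.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn table (assumption): an unbalanced binary target.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.75, 0.25], random_state=42
)

# 80% of rows for training, 20% for testing; stratify keeps class ratios similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Learn on the training partition.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Predict on the held-out partition and score the result.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

With an unbalanced target, accuracy alone can look good while the minority class is predicted poorly, so a confusion matrix or per-class recall is worth checking alongside it.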

2. Data Manipulation/Preparation.

The process below is simplified. Most of the time, data manipulation/preparation involves the use of several nodes such as Missing Value, Row Filter, GroupBy (row aggregation), and Column Aggregator.
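
To make three of the operations named above concrete (Missing Value, Row Filter, GroupBy), here is a small pandas sketch. The table is made up for illustration and only borrows column names from the dataset description. The median fill and the tenure threshold are assumptions, not choices taken from the workflow.

```python
import pandas as pd

# Toy table made up for illustration; values are NOT from the dataset.
df = pd.DataFrame(
    {
        "CustomerID": ["A1", "A2", "A3", "A4", "A5"],
        "Contract": [
            "Month-to-month",
            "Two year",
            "Month-to-month",
            "One year",
            "Two year",
        ],
        "Tenure": [1, 40, 0, 12, 60],
        "MonthlyCharges": [70.0, 25.0, 55.0, None, 90.0],
    }
)

# Missing Value: fill gaps in a numeric column with the column median.
df["MonthlyCharges"] = df["MonthlyCharges"].fillna(df["MonthlyCharges"].median())

# Row Filter: keep customers with at least one month of tenure.
filtered = df[df["Tenure"] >= 1]

# GroupBy: aggregate rows per contract type.
summary = filtered.groupby("Contract")["MonthlyCharges"].agg(["count", "mean"])
print(summary)
```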

1. Read datasets.

Besides the nodes below, which read Excel and CSV files, KNIME offers a wide range of nodes for reading other dataset types (e.g., Parquet, JSON, images).
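
For comparison, reading a CSV file in plain Python might look like the sketch below. The CSV text is a made-up stand-in for a file on disk, and the type conversions are assumptions based on the column descriptions above.

```python
import csv
import io

# Made-up CSV text standing in for a real file (assumption).
# With a file on disk, use open(path, newline="") instead of io.StringIO.
csv_text = """CustomerID,Tenure,MonthlyCharges,Churn
A1,1,70.0,Yes
A2,40,25.0,No
"""

with io.StringIO(csv_text) as handle:
    reader = csv.DictReader(handle)
    # Convert each text field to a type that suits later analysis.
    records = [
        {
            "CustomerID": row["CustomerID"],
            "Tenure": int(row["Tenure"]),
            "MonthlyCharges": float(row["MonthlyCharges"]),
            "Churn": row["Churn"] == "Yes",
        }
        for row in reader
    ]

print(len(records), "rows read; columns:", list(records[0]))
```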

Workflow nodes and annotations (recovered from the workflow diagram)

  • CSV Reader

  • SMOTE: Oversample the churn class in each training sample

  • Data Preprocessing

  • Scorer (JavaScript)

  • Data Explorer: Inspect variables. The "Churn" column is unbalanced.

  • Global Feature Importance (appears twice in the workflow). Input 0: Model as a Workflow Object; Input 1: Data from Model Test Partition; Output 0: Global Feature Importance

  • Exploratory Data Analysis

  • Automated Visualization

  • Capture Workflow End

  • Capture Workflow Start

  • AutoML: Execute upstream before configuration

  • Preprocess: Sampling and selection

  • Local Explanation View. Input 0: Model as a Workflow Object; Input 1: Data from Model Test Partition; Input 2: Single Instance to Explain; Output 0: Counterfactual Instances; Output 1: Local Feature Importance

  • Color Manager

  • Table Partitioner: Train set 80%, test set 20%

  • Random Forest Learner

  • Random Forest Predictor
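
The SMOTE step in the workflow oversamples the churn class. The core idea, interpolating between a minority-class row and one of its nearest minority-class neighbours, can be sketched with NumPy as follows. This is a simplified illustration on synthetic points. `smote_sketch` is a hypothetical helper and is not the implementation used by the KNIME SMOTE node.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour

    # Indices of the k nearest minority neighbours of every minority row.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)  # random minority rows
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


# Synthetic points (assumption): 90 majority rows and 10 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(90, 2))
X_minor = rng.normal(3.0, 1.0, size=(10, 2))

X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced_minor = np.vstack([X_minor, X_new])
print(X_balanced_minor.shape)
```

Oversampling is normally applied to the training partition only, so that the test partition keeps the original class balance.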
