Customer churn is a critical challenge in the telecommunications industry, as retaining existing subscribers is far more cost-effective than acquiring new ones. This dataset captures customer-level information that can be used to analyze churn behavior. It includes demographic details, subscription and service usage patterns, billing information, and an indicator of whether the customer has churned.
Dataset Overview
Rows: 7,043
Columns: 21
Key Variables
CustomerID: Unique customer identifier
Gender: Male or Female
SeniorCitizen: 0 = No, 1 = Yes
Partner / Dependents: Whether the customer has a partner or dependents (Yes/No)
Tenure: Number of months with the company
PhoneService: Availability of phone service (Yes/No)
MultipleLines: Whether multiple lines are active (Yes/No/No phone service)
InternetService: Type of internet service (DSL, Fiber optic, None)
Value-Added Services: OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies (Yes/No/No internet service)
Contract Type: Month-to-month, One year, or Two year
PaperlessBilling: Yes/No
PaymentMethod: Electronic check, Mailed check, Bank transfer, or Credit card
MonthlyCharges: Monthly billing amount
TotalCharges: Total charges incurred to date
Churn: Customer status (Yes = churned, No = retained)
Key Questions for Exploration
Which features are the strongest predictors of churn?
Do customers with long-term contracts have lower churn rates?
How does the type of internet service influence churn?
Is it possible to build a classification model with accuracy above 80%?
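The contract and internet-service questions both reduce to comparing churn rates across the categories of one column. Below is a minimal standard-library Python sketch. It assumes each customer is a dict keyed by the column names listed above, with Churn stored as "Yes"/"No". The function name `churn_rate_by` and the sample rows are hypothetical. The computation is roughly what a GroupBy aggregation does.

```python
from collections import defaultdict


def churn_rate_by(rows, column):
    """Return {category: share of rows with Churn == "Yes"} for one column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row["Churn"] == "Yes":
            churned[key] += 1
    # Categories with no churners get 0 from the defaultdict.
    return {key: churned[key] / totals[key] for key in totals}


# Toy rows for illustration only (not taken from the dataset).
sample = [
    {"Contract": "Month-to-month", "InternetService": "Fiber optic", "Churn": "Yes"},
    {"Contract": "Month-to-month", "InternetService": "DSL", "Churn": "No"},
    {"Contract": "One year", "InternetService": "DSL", "Churn": "No"},
    {"Contract": "Two year", "InternetService": "Fiber optic", "Churn": "No"},
]
print(churn_rate_by(sample, "Contract"))
# → {'Month-to-month': 0.5, 'One year': 0.0, 'Two year': 0.0}
print(churn_rate_by(sample, "InternetService"))
# → {'Fiber optic': 0.5, 'DSL': 0.0}
```

On the real data, a markedly lower rate for "One year" and "Two year" than for "Month-to-month" would answer the contract question. The numbers above come from toy rows and say nothing about the actual dataset.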
4. Apply xAI (explainable AI).
Here we preprocess the data and prepare the input needed to apply xAI.
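The description does not name the xAI technique used. Permutation feature importance is one model-agnostic illustration and is not necessarily what this workflow applies. It shuffles one feature at a time and measures how much accuracy drops, which also bears on the "strongest predictors" question. The sketch uses only the standard library. `toy_model` is a hypothetical stand-in for the trained Random Forest, and the rows are made up.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where model(x) equals the true label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled.

    model: any callable mapping a feature dict to a predicted label.
    X: list of feature dicts (all with the same keys); y: true labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = {}
    for feature in X[0]:
        drops = []
        for _ in range(n_repeats):
            column = [x[feature] for x in X]
            rng.shuffle(column)  # break the link between this feature and y
            X_perm = [{**x, feature: v} for x, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances[feature] = sum(drops) / n_repeats
    return importances


# Hypothetical stand-in for a trained model: it only looks at Contract.
def toy_model(x):
    return "Yes" if x["Contract"] == "Month-to-month" else "No"


X = [
    {"Contract": "Month-to-month", "Gender": "Female"},
    {"Contract": "Two year", "Gender": "Male"},
    {"Contract": "Month-to-month", "Gender": "Male"},
    {"Contract": "One year", "Gender": "Female"},
]
y = ["Yes", "No", "Yes", "No"]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores Gender, its importance comes out as exactly 0.0, while shuffling Contract can only lower accuracy. In practice the importances should be computed on held-out test rows, not on the training data.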
3. Standard ML process.
Train a model (learn) and test it (predict) to determine whether a customer will churn, using any classifier that supports a nominal target. In this case we use a Random Forest with an 80/20 train/test split.
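A random forest is too long to reproduce in a short sketch. This standard-library Python snippet therefore shows only the surrounding steps: a seeded random 80/20 split and accuracy scoring on the test part. A majority-class baseline stands in for the classifier. All function names and the synthetic rows are hypothetical. The baseline is worth computing before asking whether 80% accuracy is good, because any model must beat it to be useful.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Share of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels):
    """Placeholder 'model': always predicts the most frequent training label."""
    label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _row: label


# Synthetic rows for illustration: every 4th customer churns.
rows = [{"id": i, "Churn": "Yes" if i % 4 == 0 else "No"} for i in range(100)]
train, test = train_test_split(rows)
predict = majority_baseline([r["Churn"] for r in train])
test_acc = accuracy([r["Churn"] for r in test], [predict(r) for r in test])
print(len(train), len(test), round(test_acc, 2))
```

With the 7,043 rows listed in the overview, this particular sketch would put round(7043 × 0.2) = 1,409 rows in the test set and 5,634 in the training set. KNIME's own partitioning may round differently.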
2. Data Manipulation/Preparation.
The process below is simplified. Most of the time, data manipulation/preparation involves the use of several nodes such as Missing Value, Row Filter, GroupBy (row aggregation), and Column Aggregator.
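The kind of work those nodes do can be sketched in plain Python: converting string columns to numbers, filling missing values, encoding the target, and filtering rows. Column names follow the Key Variables list; the actual CSV headers may be spelled differently (for example, in lower case). Treating a blank TotalCharges as 0.0 is an assumption. It is one reasonable imputation choice, and dropping such rows is another. `prepare_row`, `prepare`, and `min_tenure` are hypothetical names.

```python
def prepare_row(raw):
    """Turn one raw row (all strings, as read from CSV) into typed values."""
    row = dict(raw)
    row["Tenure"] = int(raw["Tenure"])
    row["MonthlyCharges"] = float(raw["MonthlyCharges"])
    # Missing-value handling (assumption): a blank TotalCharges becomes 0.0.
    total = raw["TotalCharges"].strip()
    row["TotalCharges"] = float(total) if total else 0.0
    # Encode the nominal target as 1 (churned) / 0 (retained).
    row["Churn"] = 1 if raw["Churn"] == "Yes" else 0
    return row


def prepare(rows, min_tenure=0):
    """Type-convert every row, then keep rows with Tenure >= min_tenure (a row filter)."""
    typed = [prepare_row(r) for r in rows]
    return [r for r in typed if r["Tenure"] >= min_tenure]


raw_rows = [
    {"Tenure": "0", "MonthlyCharges": "20.25", "TotalCharges": " ", "Churn": "No"},
    {"Tenure": "12", "MonthlyCharges": "70.00", "TotalCharges": "840.00", "Churn": "Yes"},
]
clean = prepare(raw_rows)
print(clean[0]["TotalCharges"], clean[1]["Churn"])  # 0.0 1
print(len(prepare(raw_rows, min_tenure=1)))  # 1
```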
1. Read datasets.
Besides the nodes below, which read Excel and CSV files, KNIME offers a wide range of nodes for reading other dataset types (e.g., Parquet, JSON, images).
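Outside KNIME, the CSV-reading step can be approximated with Python's standard `csv` module. The sketch below reads from any text stream. An in-memory sample stands in for the real file, and its customer IDs and values are made up. The filename in the comment is hypothetical.

```python
import csv
import io


def read_churn_csv(stream):
    """Read a CSV text stream; return (header names, list of row dicts)."""
    reader = csv.DictReader(stream)
    rows = list(reader)  # every value is still a string at this point
    return reader.fieldnames, rows


# With a real file (hypothetical name):
#   with open("telco_churn.csv", newline="") as f:
#       header, rows = read_churn_csv(f)
# Per the overview, expect len(header) == 21 and len(rows) == 7043.
sample = io.StringIO(
    "CustomerID,Gender,Tenure,Contract,Churn\n"
    "0001-A,Female,1,Month-to-month,Yes\n"
    "0002-B,Male,34,One year,No\n"
)
header, rows = read_churn_csv(sample)
print(len(header), len(rows), rows[0]["Contract"])  # 5 2 Month-to-month
```

Checking the header and row counts against the overview is a cheap way to catch a truncated or mis-delimited file before any preparation starts.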