Group_4

Group 4: tree-based ML models for churn prediction

This workflow implements a complete machine learning pipeline for customer churn risk prediction following the CRISP-DM methodology. Starting from raw customer data, the pipeline performs data cleaning, feature engineering, and one-hot encoding before partitioning the dataset into a 70% training set and 30% test set using stratified sampling. Three tree-based classification models are trained and evaluated: Decision Tree, Random Forest, and XGBoost. Each model incorporates a Parameter Optimization Loop to systematically identify the best hyperparameter configuration before final evaluation on the held-out test set. Model performance is assessed using Accuracy, Macro F1, and Weighted F1, with results consolidated into an Algorithm Score Table for direct comparison across all three models.
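The three metrics named above can be illustrated with a small sketch. It assumes the usual definitions (Macro F1 is the unweighted mean of per-class F1, Weighted F1 is the mean weighted by each class's number of true instances) and is not taken from the workflow itself. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def accuracy(y_true, y_pred):
    """Fraction of rows whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Toy labels (made up for illustration, not workflow data).
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

Because Macro F1 treats every class equally while Weighted F1 favors the larger classes, reporting both shows whether a model's score is carried mainly by the most frequent churn-risk classes.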

Splits data 70% train / 30% test. Stratified on churn_risk_score to preserve class distribution. Fixed seed = 0 for reproducibility.
Table Partitioner
Loads raw train.csv.
CSV Reader
Appends the model name.
Constant Value Column Appender
Decision Tree | Gain Ratio, MDL Pruning | Test Accuracy: 79.1% | Macro F1: 0.769 | Weighted F1: 0.773
Decision tree
XGBoost | Random Search, 6 Parameters | Test Accuracy: 78.2% | Macro F1: 0.772 | Weighted F1: 0.778
XG Boost
Filters out extra columns.
Column Filter
Cleans and prepares raw data: removes invalid rows, imputes missing values, encodes categoricals, and engineers features.
Data cleansing
Filters out extra columns.
Column Filter
Appends the model name.
Constant Value Column Appender
Filters out extra columns.
Column Filter
Random Forest | 100 Trees, Unlimited Depth | Test Accuracy: 77.6% | Macro F1: 0.762 | Weighted F1: 0.769
Random forest
Appends the model name.
Constant Value Column Appender
Checks for missing values and outliers, and explores the distributions of attributes and the relationships between them.
Data exploration
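The stratified 70/30 split described in the Table Partitioner annotation can be sketched outside KNIME as follows. This is a minimal illustration, not the node's implementation: the `stratified_split` helper and the synthetic rows (with made-up class values 1 to 5) are assumptions, and seed 0 here only makes this sketch repeatable. It will not reproduce the exact rows KNIME selects.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_of, train_fraction=0.7, seed=0):
    """Split rows into (train, test), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, test = [], []
    # Sorting the labels keeps the result independent of input order
    # (assumes the labels are mutually comparable, e.g. all integers).
    for label in sorted(by_class):
        group = by_class[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)  # rows of this class for training
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Synthetic rows: 100 customers, 20 per hypothetical class.
rows = [{"id": i, "churn_risk_score": 1 + i % 5} for i in range(100)]
train, test = stratified_split(rows, lambda r: r["churn_risk_score"])

print(len(train), len(test))
print(Counter(r["churn_risk_score"] for r in train))
```

Splitting within each class, rather than across the whole table, is what keeps the class proportions of `churn_risk_score` the same in the training and test sets.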
