
KNIME_project

Conclusion

After training and evaluating all three machine learning models on the banking customer dataset, the Random Forest model emerged as the best performer, with the highest accuracy (~88%) and the highest AUC score (~0.87).

Random Forest outperformed the other models because it is an ensemble method that combines many decision trees, which reduces overfitting and copes better with class imbalance. It can also capture the complex non-linear relationships between features that are common in banking customer data.
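The overfitting argument can be illustrated outside KNIME with a small scikit-learn sketch (an assumption on my part — the workflow itself uses KNIME nodes, and the synthetic imbalanced data only stands in for the banking table):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data standing in for the banking table (an assumption).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A single, fully grown tree vs. an ensemble of 100 trees on the same split.
tree_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
rf_acc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
```

Averaging over many trees typically smooths out the variance a single deep tree picks up from the training data, which is the effect the conclusion attributes the accuracy gap to.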

The churn prediction model can help banks identify customers who are at risk of leaving and take proactive retention measures such as offering special discounts, personalized services, or loyalty programs — ultimately reducing customer attrition and increasing profitability.

Results and Model Comparison

Model                 Accuracy   AUC Score
Logistic Regression   ~81%       ~0.77
Decision Tree         ~86%       ~0.83
Random Forest         ~88%       ~0.87
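Both metrics in the table come from the Scorer and ROC Curve nodes; they can be reproduced with scikit-learn as a sketch (the labels and probabilities below are made up, not the workflow's actual predictions):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1]              # actual Exited labels
y_pred  = [0, 0, 1, 0, 0, 1]              # hard predictions from a model
p_churn = [0.2, 0.1, 0.8, 0.4, 0.3, 0.9]  # predicted churn probabilities

acc = accuracy_score(y_true, y_pred)  # share of correct predictions
auc = roc_auc_score(y_true, p_churn)  # how well the probabilities rank churners
```

Accuracy scores the hard 0/1 predictions, while AUC scores the probability ranking, which is why the two columns can disagree on which model is "better".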

Load the banking dataset. 10,000 rows | 14 columns
CSV Reader
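A rough pandas equivalent of the CSV Reader node (the inline sample merely stands in for the real 10,000 × 14 file, whose path is not given; the column subset shown is an assumption):

```python
import io
import pandas as pd

# Inline sample standing in for the banking CSV (real file: 10,000 rows x 14 columns).
csv_text = io.StringIO(
    "RowNumber,CustomerId,Surname,CreditScore,Geography,Age,Exited\n"
    "1,15634602,Hargrave,619,France,42,1\n"
    "2,15647311,Hill,608,Spain,41,0\n"
)
df = pd.read_csv(csv_text)
```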
Statistical summary of all columns. Min, Max, Mean, Std Dev.
Statistics
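The same per-column summary the Statistics node produces can be sketched with `DataFrame.describe()` (toy values, not the real data):

```python
import pandas as pd

df = pd.DataFrame({"CreditScore": [619, 608, 502], "Age": [42, 41, 39]})
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
```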
DT Accuracy + Confusion Matrix. (~86%)
Scorer
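What the Scorer node reports — accuracy plus a confusion matrix — corresponds to these two scikit-learn calls (a sketch with made-up labels):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)  # rows = actual class, columns = predicted class
acc = accuracy_score(y_true, y_pred)
```

The off-diagonal cells of the matrix show which kind of mistake dominates — here, churners predicted as staying.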
Convert Exited column: Integer → String
Number to String
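The Number to String conversion (needed so downstream nodes treat Exited as a nominal class) maps to a simple dtype cast in pandas:

```python
import pandas as pd

df = pd.DataFrame({"Exited": [0, 1, 0]})
df["Exited"] = df["Exited"].astype(str)  # Integer -> String, like the Number to String node
```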
Color by Exited: 0=Blue | 1=Red
Color Manager
Visualize churn distribution. Not Churned (0): ~7,963 | Churned (1): ~2,037
Bar Chart
Visualize customer geography distribution. France: ~50% | Germany: ~25% | Spain: ~25%
Pie Chart
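The counts behind the two charts can be sketched with `value_counts` (the raw column values below are reconstructed from the annotation figures, not the actual data):

```python
import pandas as pd

# Counts taken from the annotations above; the raw values are reconstructed.
exited = pd.Series([0] * 7963 + [1] * 2037, name="Exited")
churn_counts = exited.value_counts()          # input to the bar chart

geo = pd.Series(["France"] * 5000 + ["Germany"] * 2500 + ["Spain"] * 2500)
geo_share = geo.value_counts(normalize=True)  # proportions shown in the pie chart
```

The ~80/20 churn split is the class imbalance the conclusion refers to.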
Remove irrelevant columns before modeling. Excluded: RowNumber, CustomerId, Surname. These columns do not contribute to prediction. Remaining: 11 feature columns + Exited
Column Filter
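The Column Filter step — dropping the three identifier columns — corresponds to a single `drop` in pandas (toy frame with a subset of the real columns):

```python
import pandas as pd

df = pd.DataFrame({
    "RowNumber": [1, 2], "CustomerId": [15634602, 15647311],
    "Surname": ["Hargrave", "Hill"],
    "CreditScore": [619, 608], "Exited": [1, 0],
})
model_df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])  # identifiers out
```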
RF Accuracy + Confusion Matrix. (~88%)
Scorer
Apply Logistic Regression model on Test data. Port 1: LR Model from LR Learner. Port 2: Test data (20%) from Partitioning Port 2. Output: Prediction (Exited) column added
Logistic Regression Predictor
Split dataset into Training and Test sets. Method: Relative | Ratio: 80% Train / 20% Test. Training set: ~8,000 rows | Test set: ~2,000 rows
Table Partitioner
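The 80/20 relative split maps onto scikit-learn's `train_test_split` (a sketch: `random_state` and `stratify` are assumptions, since the KNIME node's sampling mode is not stated in the annotation):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # 100 stand-in rows
y = np.array([0] * 80 + [1] * 20)  # imbalanced labels, like Exited

# random_state and stratify are assumptions about the KNIME node's settings.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
```

Stratifying keeps the ~20% churner share identical in both partitions, which matters for an imbalanced target like Exited.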
Apply Decision Tree model on Test data. Port 1: DT Model from DT Learner. Port 2: Test data (20%) from Partitioning Port 2. Output: Prediction (Exited) column added
Decision Tree Predictor
Train Logistic Regression model. Input: Training data (80%) from Partitioning Port 1. Algorithm: Iteratively Reweighted Least Squares
Logistic Regression Learner
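The Learner/Predictor pair corresponds roughly to `fit` and `predict` in scikit-learn (a sketch on synthetic data; note scikit-learn optimizes via lbfgs by default, not the IRLS algorithm the KNIME node uses):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the 80% training partition.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
lr = LogisticRegression(max_iter=1000).fit(X, y)

p_churn = lr.predict_proba(X)[:, 1]  # probability of Exited = 1
labels = lr.predict(X)               # hard 0/1 column, as the Predictor node appends
```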
Apply Random Forest model on Test data. Port 1: RF Model from RF Learner. Port 2: Test data (20%) from Partitioning Port 2. Output: Prediction (Exited) column added
Random Forest Predictor
Train Decision Tree model. Input: Training data (80%) from Partitioning Port 1. Quality measure: Gini Index | Max depth: 5
Decision Tree Learner
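The same two settings the node annotation names — Gini impurity and a depth cap of 5 — have direct scikit-learn counterparts (a sketch on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
dt = DecisionTreeClassifier(criterion="gini", max_depth=5, random_state=0).fit(X, y)
```

Capping the depth at 5 is the tree's main defence against overfitting here.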
Train Random Forest model. Input: Training data (80%) from Partitioning Port 1. Number of trees: 100 | Criterion: Gini
Random Forest Learner
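The two hyperparameters in the annotation (100 trees, Gini criterion) map one-to-one onto scikit-learn's estimator (a sketch on synthetic data, not the workflow's configuration file):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                            random_state=0).fit(X, y)
```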
ROC Curve
Evaluate Logistic Regression model. Expected Accuracy: ~81%
Scorer
ROC Curve
ROC Curve
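Each ROC Curve node plots true-positive rate against false-positive rate across thresholds; the underlying computation looks like this in scikit-learn (made-up scores, not the workflow's outputs):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # made-up churn probabilities

fpr, tpr, thresholds = roc_curve(y_true, scores)  # points on the ROC curve
auc = roc_auc_score(y_true, scores)               # area under that curve
```

The AUC column in the results table is exactly this area, one value per model.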
