FINAL1

1. DATA PREPROCESSING AND EXPLORATORY ANALYSIS

2. FEATURE ENGINEERING

MEA 2025/2026 — Group Project — NEW Super Markets International

Group: GROUP BW

Members:

3. Clustering

Product-mix

Customer Value

Channel Use

We reverse the scaling
Denormalizer
Segments profile
GroupBy
Percentage_Canned
Math Formula
Renames cluster_2
String Replacer
Percentage_Beverages
Math Formula
k-Means with k=5
k-Means
Percentage_Frozen
Math Formula
Renames cluster_0
String Replacer
k-Means with k=3
k-Means
Renames cluster_1
String Replacer
Percentage_Perishables
Math Formula
k-Means with k=4
k-Means
Column Filter
Silhouette Coefficient
k-Means with k=3
k-Means
Silhouette Coefficient
Bar Chart
Silhouette Coefficient
Bar Chart
Bar Chart
Column Filter
Silhouette Coefficient
Color Manager
k-Means with k=5
k-Means
Denormalizer
k-Means with k=4
k-Means
k-Means with k=3
k-Means
Silhouette Coefficient
Renames cluster_1
String Replacer
Silhouette Coefficient
Column Filter
GroupBy
k-Means with k=4
k-Means
Renames cluster_0
String Replacer
Silhouette Coefficient
confirms cluster separation
Distance Matrix Calculate
k-Means with k=5
k-Means
Renames cluster_2
String Replacer
Bar Chart
Numerical profile: mean, std, missing counts per variable
Statistics
Strong skewness detected in Recency
Histogram
We uploaded the dataset
Excel Reader
Table Manipulator
Monetary vs. Income
Scatter Plot
Replace the 0 in Education with the most frequent value
String Replacer
Column Filter
CUSTID column was corrected
RowID
Correlation Insights
Linear Correlation
Changes made to Income, Gender, and Marital Status
Missing Value
Histogram
Cap values greater than 100 in Internet
Rule Engine
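As a rough illustration of the capping rule above, here is a minimal Python sketch. It is not the workflow's Rule Engine expression: the function name `cap_percentage`, the sample values, and the choice to leave missing values untouched are all assumptions made for the example.

```python
def cap_percentage(values, upper=100.0):
    """Return a copy of `values` where anything above `upper` is set to `upper`.

    Missing entries (None) are passed through unchanged; handling them is
    assumed to be the job of a separate step.
    """
    return [None if v is None else min(v, upper) for v in values]


# Made-up Internet percentages, including one impossible value above 100.
internet = [12.0, 100.0, 137.5, None, 48.0]
capped = cap_percentage(internet)
print(capped)
```

Capping keeps the row in the dataset, which matters when the other columns of that customer are still valid.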
Bar Chart
Numeric Outliers
Table Manipulator
Shows us that Gender contains missing values
Bar Chart
Helps us detect a missing value in Marital Status
Bar Chart
Helps us detect a 0 in Education
Bar Chart
String Replacer
Table View
Monetary is strongly right skewed
Histogram
String Replacer
Color each cluster for further visualisation
Color Manager
Numeric Outliers
Internet values above 100 detected (not possible, since it is a percentage)
Histogram
String Replacer
Statistics
String Replacer
We reverse the scaling
Denormalizer
Drop the Income_Share column
Column Filter
Color each cluster for further visualisation
Color Manager
String Replacer
confirms cluster separation
Distance Matrix Calculate
Missing Value
Income_Share
Math Formula
Renames cluster_0
String Replacer
Statistics
Excel Writer
Avg_transaction
Math Formula
Segments profile
GroupBy
Statistics
Percentage_Others
Math Formula
Renames cluster_2
String Replacer
Normalizer
Renames cluster_1
String Replacer
Math Formula (Multi Column)
confirms cluster separation
Distance Matrix Calculate
Silhouette Coefficient
Silhouette Coefficient
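The workflow runs k-Means with k=3, 4 and 5 and compares the results with the Silhouette Coefficient node. The idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: the toy points are invented, the farthest-point initialisation is a simplification chosen to make the example deterministic, and the helper names (`dist`, `kmeans`, `mean_silhouette`) are hypothetical, not part of KNIME.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption: deterministic farthest-point initialisation.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented toy data: three well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (5.0, 5.0), (5.1, 5.2), (5.2, 5.1),
    (10.0, 0.0), (10.1, 0.2), (10.2, 0.1),
]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in (3, 4, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A higher mean silhouette indicates tighter, better-separated clusters, so the k with the highest score is the natural candidate. On real customer data the scores are usually much closer together than on this toy set, and the segment profiles should be checked as well.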
