Feature Engineering & Anomaly Flagging
[1] Derived Variables (6 new columns)
• AvgTicket = Monetary / Frequency
(basket size; set to 0 where Frequency = 0, 10 records)
• Perishables_Pct, Beverages_Pct, Frozen_Pct,
Canned_Pct, Others_Pct (= LoB / Monetary)
Purpose: distinguish "small basket frequent" vs
"large basket occasional" shoppers; capture
category preference independent of total spend.
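The derived variables above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: it assumes each customer is a dict, that the five base LoB columns are named Perishables, Beverages, Frozen, Canned and Others (inferred from the *_Pct names), and that a record with Monetary = 0 gets a share of 0, which the source does not specify.

```python
# Assumed base column names, inferred from the *_Pct names in the list above.
LOB_COLUMNS = ["Perishables", "Beverages", "Frozen", "Canned", "Others"]


def add_derived_variables(record):
    """Return a copy of `record` with AvgTicket and the five *_Pct columns added."""
    out = dict(record)
    frequency = record["Frequency"]
    monetary = record["Monetary"]

    # Frequency = 0 maps to AvgTicket = 0, as stated in the list above.
    out["AvgTicket"] = monetary / frequency if frequency != 0 else 0.0

    for col in LOB_COLUMNS:
        # Assumption: Monetary = 0 maps to a share of 0 (not stated in the source).
        out[f"{col}_Pct"] = record[col] / monetary if monetary != 0 else 0.0
    return out


# Illustrative record with made-up values.
customer = {
    "Monetary": 200.0, "Frequency": 4,
    "Perishables": 100.0, "Beverages": 50.0,
    "Frozen": 20.0, "Canned": 20.0, "Others": 10.0,
}
enriched = add_derived_variables(customer)
```

Two customers with the same Monetary but different Frequency end up with different AvgTicket values, which is what separates "small basket frequent" from "large basket occasional" shoppers.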
[2] Coherence Flags (~360 distinct records, retained)
• Flag_FreqZero: 10 records
• Flag_Recency_GT365: 26 records
• Flag_Young_DivWidow: 2 records
• Flag_Negative_LoB: 322 records (returns)
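One way the four flags could be computed per record is sketched below. Several details are assumptions, not taken from the source: the column names Recency, Age and Marital_Status, Recency being measured in days, the age cut-off YOUNG_AGE_LIMIT, and the marital status labels. Flagged records are kept, not dropped, matching "retained" above.

```python
# Assumed base column names, inferred from the *_Pct names.
LOB_COLUMNS = ["Perishables", "Beverages", "Frozen", "Canned", "Others"]

# Hypothetical values: the source gives neither the age cut-off nor the labels.
YOUNG_AGE_LIMIT = 25
DIV_WIDOW_LABELS = {"Divorced", "Widow"}


def coherence_flags(record):
    """Return the four coherence flags for one customer record (a dict)."""
    return {
        "Flag_FreqZero": record["Frequency"] == 0,
        # Assumes Recency is expressed in days.
        "Flag_Recency_GT365": record["Recency"] > 365,
        "Flag_Young_DivWidow": (
            record["Age"] < YOUNG_AGE_LIMIT
            and record["Marital_Status"] in DIV_WIDOW_LABELS
        ),
        # Negative spend in any LoB column (interpreted as returns in the list above).
        "Flag_Negative_LoB": any(record[col] < 0 for col in LOB_COLUMNS),
    }


def count_flagged(records):
    """Count distinct records that raise at least one flag."""
    return sum(1 for r in records if any(coherence_flags(r).values()))


# Illustrative records with made-up values.
records = [
    {"Frequency": 0, "Recency": 400, "Age": 40, "Marital_Status": "Married",
     "Perishables": 0.0, "Beverages": 0.0, "Frozen": 0.0, "Canned": 0.0, "Others": 0.0},
    {"Frequency": 5, "Recency": 30, "Age": 35, "Marital_Status": "Single",
     "Perishables": 80.0, "Beverages": -12.0, "Frozen": 10.0, "Canned": 5.0, "Others": 2.0},
    {"Frequency": 3, "Recency": 10, "Age": 50, "Marital_Status": "Married",
     "Perishables": 40.0, "Beverages": 10.0, "Frozen": 5.0, "Canned": 5.0, "Others": 1.0},
]
n_flagged = count_flagged(records)
```

A record that raises several flags is counted once by count_flagged, so the distinct total can be lower than the sum of the per-flag counts.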
Note on outliers
Statistical outliers in RFM and AvgTicket are not flagged separately. They are treated as valid high-value customer behavior (e.g. VIPs, heavy buyers) and are left uncorrected so that they can form distinct segments in clustering. Box plots in the EDA section support visual inspection of these outliers.