Fuzzy Category Cleaner - Preparing Categorical Data for Machine Learning

<p><strong>🧹 Cleaning Noisy Categories for ML</strong></p><p>This workflow demonstrates how to <strong>clean categorical labels</strong> before training a machine learning model.</p><p>Real-world datasets often contain <strong>inconsistent or misspelled category values</strong> (e.g., Logiystics, Eduzcation, Healthcar). If these values are used directly, each misspelling is treated as a separate category. The rows for one real category are then split across several rare labels, which fragments the data and can reduce model accuracy.</p><p>🔑 <strong>Steps in this workflow:</strong></p><ol><li><p>📂 <strong>Load Product Sales Data</strong> – a dataset with the features Units Sold, Purchase Probability, Sales Channel, and a noisy Category column.</p></li><li><p>🏷️ <strong>Reference Category Labels</strong> – define the set of valid, canonical categories (Electronics, Logistics, Education, Healthcare, Finance).</p></li><li><p>🔍 <strong>Approximate String Matcher</strong> – use Levenshtein distance to map each noisy category value to its closest valid label.</p></li></ol><p>✅ <strong>Result:</strong> A cleaned dataset in which the category labels are consistent and ready for machine learning.</p>
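<p>The matching step can be sketched in Python as follows. This is a minimal illustration of nearest-label matching by Levenshtein distance under stated assumptions, not the Approximate String Matcher node's actual implementation. The function names, the case and whitespace normalization, the <code>max_distance</code> cutoff of 3 edits, and the sample values in the demo are all hypothetical choices made for this example.</p>

```python
from collections import Counter
from typing import Optional

# Canonical labels taken from the "Reference Category Labels" step.
REFERENCE_LABELS = ["Electronics", "Logistics", "Education", "Healthcare", "Finance"]


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, and substitutions
    needed to turn `a` into `b`, computed with two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_label(value: str, labels=REFERENCE_LABELS, max_distance: int = 3) -> Optional[str]:
    """Return the reference label nearest to `value`, or None when even the
    best match needs more than `max_distance` edits (hypothetical cutoff)."""
    normalized = value.strip().lower()  # assumption: matching ignores case and padding
    scored = [(levenshtein(normalized, label.lower()), label) for label in labels]
    distance, best = min(scored)
    return best if distance <= max_distance else None


if __name__ == "__main__":
    # Illustrative sample values, not the workflow's real data.
    noisy = ["Logiystics", "Eduzcation", "Healthcar", "Electronics", "Logistics", "Finance"]
    cleaned = [closest_label(v) for v in noisy]
    print("before:", Counter(noisy))
    print("after: ", Counter(cleaned))
```

<p>Each misspelling in the sample is one edit away from its reference label, so the two Logistics variants collapse into a single category after cleaning. Values with no reference label within the cutoff come back as <code>None</code>, so they can be reviewed rather than forced into the nearest category.</p>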

URL: exorbyte GmbH https://www.exorbyte.com/en
