Fuzzy Category Cleaner - Preparing Categorical Data for Machine Learning

🧹 Cleaning Noisy Categories for ML

This workflow demonstrates how to clean categorical labels before training a machine learning model. Real-world datasets often contain inconsistent or misspelled category values (e.g., Logiystics, Eduzcation, Healthcar). If used directly, these noisy labels fragment the data and reduce model accuracy.

🔑 Steps in this workflow:

1. 📂 Load Product Sales Data – dataset with features: Units Sold, Purchase Probability, Sales Channel, and noisy Category.
2. 🏷️ Reference Category Labels – define the valid set of canonical categories (Electronics, Logistics, Education, Healthcare, Finance).
3. 🔍 Approximate String Matcher – apply Levenshtein distance to align noisy category values with their closest valid label.

✅ Result: A cleaned dataset where all category labels are consistent and ML-ready.
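For readers who want to see the matching idea outside KNIME, the following is a minimal Python sketch of Levenshtein-based label alignment. It is not the implementation used by the workflow's matcher node. The names `CANONICAL`, `clean_category`, and `max_distance`, the case-insensitive comparison, and the cutoff of 2 edits are all assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Canonical labels listed in the workflow description.
CANONICAL = ["Electronics", "Logistics", "Education", "Healthcare", "Finance"]


def clean_category(
    value: str,
    canonical: Sequence[str] = CANONICAL,
    max_distance: int = 2,  # assumed cutoff, not taken from the workflow
) -> Optional[str]:
    """Return the closest canonical label, or None if nothing is close enough."""
    normalized = value.strip().lower()
    best_label, best_distance = None, max_distance + 1
    for label in canonical:
        distance = levenshtein(normalized, label.lower())
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


if __name__ == "__main__":
    for noisy in ["Logiystics", "Eduzcation", "Healthcar"]:
        print(noisy, "->", clean_category(noisy))
```

Returning `None` for values beyond the cutoff keeps unrelated strings from being forced onto the nearest label; such rows can then be reviewed by hand.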

URL: exorbyte GmbH https://www.exorbyte.com/en

Download

To use this workflow, download it from the URL below and open it in KNIME:

Download Workflow

Created by: Ahmad.Varasteh

Created at: 2025-08-18

On NodePit since: 2025-08-22

Last update: 2026-03-17

Created with KNIME version: v5.8.2

Tags: data-cleaning, fuzzy-matching, approximate-matching, category-cleaning, machine-learning-preprocessing, data-preparation, exorbyte

📂 Dataset Overview

🏷️ Reference Category Labels

🔍 Term Matcher

🔐 How to Get Your License