
Frequency-Aware Anomaly Detection

<p>This use case demonstrates how the Approximate String Matcher node can detect potential errors and rare entries by <strong>matching the least frequent values against the most frequent ones</strong> in the same dataset.</p><p>Using approximate string matching (e.g., Levenshtein distance), we can distinguish three groups:</p><ul><li><p><strong>Likely typos</strong>: low-frequency entries that closely resemble a high-frequency one</p></li><li><p><strong>Rare but valid</strong> values: low-frequency entries that do not resemble any high-frequency one</p></li><li><p><strong>Presumed-correct entries</strong>: high-frequency values, which serve as the reference to match against</p></li></ul><p>This makes the approach well suited to:</p><ul><li><p>Detecting entry errors in location, product, or customer data</p></li><li><p>Auto-flagging suspicious or rare strings for review</p></li><li><p>Improving data quality in human-entered datasets</p></li></ul>
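<p>The classification above can be sketched outside the workflow in a few lines of Python. This is only an illustration of the idea, not the Approximate String Matcher node's implementation: the function names <code>levenshtein</code> and <code>classify_values</code>, the thresholds <code>min_frequent</code> and <code>max_distance</code>, and the sample city names are all assumptions made for this example.</p>

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # drop ch_a
                               current[j - 1] + 1,       # drop ch_b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def classify_values(values, min_frequent=3, max_distance=2):
    """Label each distinct value as 'frequent', 'likely_typo', or 'rare_valid'.

    min_frequent and max_distance are illustrative thresholds (assumptions),
    not settings taken from the workflow.
    """
    counts = Counter(values)
    frequent = [v for v, n in counts.items() if n >= min_frequent]
    result = {}
    for value, n in counts.items():
        if n >= min_frequent:
            result[value] = ("frequent", None)
            continue
        # Find the closest high-frequency value, if there is one.
        best = min(frequent, key=lambda f: levenshtein(value, f), default=None)
        if best is not None and levenshtein(value, best) <= max_distance:
            result[value] = ("likely_typo", best)
        else:
            result[value] = ("rare_valid", None)
    return result


# Hypothetical sample data: two common cities, two misspellings, one rare city.
cities = ["Berlin"] * 5 + ["Munich"] * 4 + ["Berlni", "Munic", "Oslo"]
labels = classify_values(cities)
for value, (label, match) in labels.items():
    print(value, label, match)
```

<p>Both thresholds need tuning per dataset: a larger <code>max_distance</code> catches more typos but risks flagging short, genuinely distinct values as errors.</p>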

URL: exorbyte GmbH https://www.exorbyte.com/en
