Customer Review Classification with Fuzzy Matching

<p><strong>🔎Customer Review Classification with Fuzzy Matching</strong></p><p>This workflow demonstrates how exorbyte's <strong>Approximate String Matcher</strong> node can be used for fuzzy classification of online store product reviews. This simulates a typical scenario where topics have to be extracted from reviews with spelling mistakes.</p><p></p><p>🛠️ Steps in the Workflow</p><p><strong>Step 1: Loading Data</strong></p><ul><li><p>Load a <strong>reference dictionary</strong>, containing keywords and their corresponding categories (e.g. "cheap", "expensive", "affordable" etc. → <em>price</em>).</p></li><li><p>Load a <strong>table of product reviews</strong>, assigning a <strong>unique identifier</strong> to each review.</p></li></ul><p></p><p><strong>Step 2: Preprocessing</strong></p><ul><li><p>Process the reviews by <strong>removing punctuation, tokenizing and ungrouping</strong>.</p></li><li><p>Create a table where <strong>each row represents one word</strong>, along with the full review text and its identifier.</p></li><li><p>This serves as the <strong>basis for matching</strong>.</p></li></ul><p></p><p><strong>Step 3: Matching</strong></p><ul><li><p>Use the <strong>Approximate String Matcher</strong> (Levenshtein distance) to compare each review word against the reference dictionary.</p></li><li><p>This step captures both <strong>exact matches</strong> (e.g. <em>"premium"</em>) and <strong>approximate matches</strong> (e.g. <em>"solidd", "Esy"</em>).</p></li><li><p>Only <strong>close matches</strong> (distance ≤ 1) are kept in the resulting table.</p></li></ul><p></p><p><strong>Step 4: Postprocessing</strong></p><ul><li><p><strong>Rejoin</strong> the matched results with the original review table so that all reviews appear, even those without any matched words.</p></li><li><p>Add the <strong>category labels</strong> from the dictionary.</p></li></ul><p></p><p><strong>Step 5: Results</strong></p><ul><li><p>Create a <strong>dashboard</strong> summarizing the fuzzy classification results and showing example reviews.</p></li><li><p>Build a <strong>summary table</strong> with one row per review and a list of all detected categories, ready for further analysis.</p></li></ul><p></p>

URL: exorbyte GmbH https://www.exorbyte.com/en

Step 1: Loading Data

  • The workflow starts with two input tables:

    • A reference dictionary with correct terms and their corresponding categories

    • A table containing Amazon product reviews

Step 2: Preprocessing

  • Remove punctuation

  • Tokenization

  • Ungrouping to produce multiple rows per review, each holding one word, to be matched by the Approximate String Matcher node.
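
The preprocessing steps above can be sketched outside KNIME in plain Python. This is only an illustration, not the workflow's own implementation: the function name `explode_reviews`, the review IDs, and the sample texts are made up, and punctuation is assumed to mean ASCII punctuation that is deleted outright.

```python
import string


def explode_reviews(reviews):
    """Turn {review_id: text} into one (review_id, word, text) row per word."""
    # Assumption: "punctuation" means ASCII punctuation, removed without replacement.
    strip_punct = str.maketrans("", "", string.punctuation)
    rows = []
    for review_id, text in reviews.items():
        # Whitespace tokenization after punctuation removal.
        for word in text.translate(strip_punct).split():
            # "Ungrouping": each word gets its own row, keeping the full text and ID.
            rows.append((review_id, word, text))
    return rows


# Hypothetical sample data, not taken from the workflow's input files.
sample = {"R1": "Solidd build, fair price!", "R2": "Esy to use."}
for row in explode_reviews(sample):
    print(row)
```

Keeping the full review text on every row mirrors the table described above, at the cost of repeating the text once per word.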

Step 3: Matching

  • Use the Approximate String Matcher with Levenshtein distance to detect exact and approximate matches, keeping only close matches (distance ≤ 1)
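
To make the matching step concrete, here is a minimal sketch of Levenshtein matching with a distance threshold of 1. It is a generic textbook implementation, not the Approximate String Matcher node's internals. The helper names are hypothetical, and the comparison is assumed to be case-insensitive, since "Esy" is only within distance 1 of "easy" once case is ignored.

```python
def levenshtein(a, b):
    """Edit distance between a and b using a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def close_matches(word, dictionary_terms, max_distance=1):
    """Return (term, distance) pairs within max_distance of word."""
    # Assumption: matching ignores case; the source does not state this.
    word = word.lower()
    hits = []
    for term in dictionary_terms:
        distance = levenshtein(word, term.lower())
        if distance <= max_distance:
            hits.append((term, distance))
    return hits


# Hypothetical dictionary terms for illustration.
terms = ["solid", "easy", "premium", "cheap"]
for token in ["solidd", "Esy", "premium", "table"]:
    print(token, close_matches(token, terms))
```

A threshold of 1 tolerates a single inserted, deleted, or substituted character, which covers the misspellings mentioned above while keeping unrelated words out.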

Step 4: Postprocessing

  • Join the filtered data with all reviews, ensuring at least one row per review

  • Add the category labels from the dictionary
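
The postprocessing join behaves like a left outer join from the reviews to the matches. The sketch below illustrates that idea under assumptions: the function `left_join_matches`, the sample reviews, and the "quality" category are invented for the example, and only the "cheap" to price mapping comes from the description.

```python
def left_join_matches(reviews, matches, categories):
    """Left-join reviews with matched terms and attach category labels.

    reviews:    {review_id: text}
    matches:    list of (review_id, dictionary_term) kept after filtering
    categories: {dictionary_term: category}
    """
    terms_by_review = {}
    for review_id, term in matches:
        terms_by_review.setdefault(review_id, []).append(term)

    rows = []
    for review_id, text in reviews.items():
        terms = terms_by_review.get(review_id)
        if not terms:
            # No match: keep the review with empty term and category.
            rows.append((review_id, text, None, None))
            continue
        for term in terms:
            rows.append((review_id, text, term, categories.get(term)))
    return rows


# Hypothetical sample data.
reviews = {"R1": "Cheep but solidd", "R2": "Arrived on Tuesday"}
matches = [("R1", "cheap"), ("R1", "solid")]
categories = {"cheap": "price", "solid": "quality"}
for row in left_join_matches(reviews, matches, categories):
    print(row)
```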

Step 5: Results

  • Create a dashboard with an overview of the number of reviews per category and a few examples for a selected category

  • Create a table with exactly one row per review, with its matching categories collected in a set.
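
As a rough illustration of the two result views, the following sketch groups joined rows into one set of categories per review and counts reviews per category. The function name `summarize` and the sample rows are hypothetical, and this is not how the GroupBy node or the dashboard is implemented.

```python
from collections import Counter


def summarize(rows):
    """rows: iterable of (review_id, category or None).

    Returns ({review_id: set of categories}, Counter of reviews per category).
    """
    per_review = {}
    for review_id, category in rows:
        found = per_review.setdefault(review_id, set())
        if category is not None:
            found.add(category)  # a set drops repeated categories
    # Each review counts once per category, however many words matched.
    per_category = Counter(
        category for found in per_review.values() for category in found
    )
    return per_review, per_category


# Hypothetical joined rows; R2 had no matched word.
rows = [("R1", "price"), ("R1", "quality"), ("R1", "price"),
        ("R2", None), ("R3", "price")]
per_review, per_category = summarize(rows)
print(per_review)
print(per_category)
```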

One row per review, appropriate categories in a list
GroupBy
Reference dictionary
CSV Reader
Customer reviews
CSV Reader
Results
Levenshtein matching, filter for close matches (distance ≤ 1)
Approximate String Matcher
Postprocessing
Generate a unique identifier for each review
RowID
Preprocessing
