05 Preprocessing II - solution

Text Mining Course: Preprocessing, Transformation, and Classification Models (solution)

- Create a bag of words.
- Filter out terms that occur in fewer than 5 documents.

URL: Slides KNIME Analytics Platform Text Mining Course https://www.knime.com/form/material-download-registration

Session 3 - Preprocessing, Transformation, and Classification Models

Solution 05 - Preprocessing II

Learning objective: In this exercise, you will extend the preprocessing steps from the previous exercise and create a bag of words.


Workflow description: This workflow further processes the texts by creating a bag of words and retaining only terms that occur in five or more documents.


You’ll find the instructions for the exercises in the yellow annotations.

Create a bag of words and filter terms

  1. Create a bag of words using the Bag Of Words Creator node.

  2. Filter out terms that occur in fewer than 5 documents.

    • Use the Term To String node to extract terms as strings from a bag of words.

    • Use the GroupBy node to group by term and count the number of pre-processed documents in which each term appears.

    • Keep only the terms that appear in 5 or more documents.

    • Use the Reference Row Filter node to filter the bag of words so that it keeps only terms that occur in at least 5 documents.
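The steps above can be sketched outside KNIME in a few lines of Python. This is only an illustration of the logic, not the workflow itself: the function names `create_bag_of_words` and `filter_by_document_frequency`, the toy documents, and the assumption that the bag of words holds one row per distinct term per document are all hypothetical choices made for this example.

```python
from collections import Counter


def create_bag_of_words(documents):
    """Return one (term, doc_id) row per distinct term in each document.

    Assumption: this mirrors a bag-of-words table with a term column
    and a document column.
    """
    rows = []
    for doc_id, terms in enumerate(documents):
        for term in sorted(set(terms)):
            rows.append((term, doc_id))
    return rows


def filter_by_document_frequency(bow_rows, min_docs=5):
    """Keep only rows whose term occurs in at least min_docs documents."""
    # "GroupBy" step: count the number of documents per term.
    doc_freq = Counter(term for term, _ in bow_rows)
    # "Row Filter" step: keep the terms that are frequent enough.
    frequent_terms = {t for t, n in doc_freq.items() if n >= min_docs}
    # "Reference Row Filter" step: keep matching rows of the bag of words.
    return [(t, d) for t, d in bow_rows if t in frequent_terms]


# Toy, already pre-processed documents (hypothetical data).
documents = [
    ["hotel", "clean", "room"],
    ["hotel", "great", "room"],
    ["hotel", "bad", "room"],
    ["hotel", "nice", "staff"],
    ["hotel", "clean", "staff"],
]

bow = create_bag_of_words(documents)
filtered = filter_by_document_frequency(bow, min_docs=5)
kept_terms = sorted({t for t, _ in filtered})
print(kept_terms)
```

In this toy data only "hotel" occurs in all five documents, so it is the only term that survives the filter. Counting distinct documents per term, rather than total occurrences, is what makes the threshold a document frequency.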


Workflow annotations: Reading Textual Data, Preprocessing, Enrichment, Your Solution

Workflow nodes (node label, where present, followed by node type):

- Filter by number of documents: Row Filter (deprecated)
- GroupBy term count documents: GroupBy
- Tag Filter
- Filter Bag of Words (Keep only terms that occur in at least 5 documents): Reference Row Filter
- Snowball Stemmer
- no missings: Row Filter (deprecated)
- Stop Word Filter
- Create Bag of Words: Bag Of Words Creator
- Case Converter
- only documents: Column Filter
- Assign POS tags: POS Tagger
- Read Tripadvisor data: Table Reader
- create documents: Strings to Document
- Term to String
- Punctuation Erasure
- Number Filter
