04 Preprocessing - solution

Text Mining Course: Enrichment and Preprocessing (solution)

- Filter out numbers, punctuation marks, and stop words.
- Convert texts to lower case.
- Perform stemming.
- Keep only tokens tagged as nouns, verbs, and adjectives.

URL: Slides KNIME Analytics Platform Text Mining Course https://www.knime.com/form/material-download-registration

Session 2 - Enrichment and Preprocessing

Solution 04 - Preprocessing

Learning objective: In this exercise, you will practice preprocessing text data.


Workflow description: This workflow performs a series of preprocessing operations to clean, filter and normalize texts.


You’ll find the instructions for the exercises in the yellow annotations.

Preprocess the texts by cleaning them

  1. Filter out numbers, punctuation marks, and stop words with the Number Filter, Punctuation Erasure, and Stop Word Filter nodes, respectively.

  2. Convert texts to lower case using the Case Converter node.

  3. Perform stemming with the Snowball Stemmer node.

  4. Keep only tokens tagged as nouns, verbs, and adjectives using the Tag Filter node.
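
For readers who want to see the same logic outside KNIME, the four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not what the KNIME nodes do internally: the input is assumed to be already tokenized and POS-tagged (as it is after the Enrichment part of the workflow), the tags are assumed to follow the Penn Treebank scheme, `STOP_WORDS` is a tiny hypothetical list, and `naive_stem` is a crude suffix stripper standing in for a real Snowball stemmer.

```python
import re
import string

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "in", "of", "to"}

# Assumed Penn Treebank tag prefixes: NN* nouns, VB* verbs, JJ* adjectives.
KEEP_TAG_PREFIXES = ("NN", "VB", "JJ")

# A term counts as a number if it is digits with optional sign and separators.
NUMBER_RE = re.compile(r"^[+-]?\d+([.,]\d+)*$")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(word):
    """Crude suffix stripper; a stand-in for a real Snowball stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(tagged_tokens):
    """Apply the four steps to a list of (term, pos_tag) pairs."""
    result = []
    for term, tag in tagged_tokens:
        # Step 1a: drop numeric terms.
        if NUMBER_RE.match(term):
            continue
        # Step 1b: erase punctuation characters; drop terms left empty.
        term = term.translate(PUNCT_TABLE)
        if not term:
            continue
        # Step 1c: drop stop words (compared case-insensitively here).
        if term.lower() in STOP_WORDS:
            continue
        # Step 2: convert to lower case.
        term = term.lower()
        # Step 3: stem.
        term = naive_stem(term)
        # Step 4: keep only nouns, verbs, and adjectives.
        if not tag.startswith(KEEP_TAG_PREFIXES):
            continue
        result.append((term, tag))
    return result


example = [
    ("The", "DT"), ("rooms", "NNS"), ("were", "VBD"), ("cleaned", "VBN"),
    ("daily", "RB"), (",", ","), ("2", "CD"), ("great", "JJ"),
    ("pools", "NNS"), ("!", "."),
]
print(preprocess(example))
# [('room', 'NNS'), ('clean', 'VBN'), ('great', 'JJ'), ('pool', 'NNS')]
```

The sketch keeps the tag alongside each term so that the tag filter can run last, mirroring the order of the steps listed above. The order of the steps can affect the outcome, for example when a stop-word list is matched case-sensitively against text that has not yet been lower-cased.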


Enrichment
Your Solution

Reading Textual Data
Assign POS tags
POS Tagger
Only documents
Column Filter
Stop Word Filter
No missings
Row Filter (deprecated)
Snowball Stemmer
Read Tripadvisor data
Table Reader
Case Converter
Create documents
Strings to Document
Punctuation Erasure
Tag Filter
Number Filter
