07 Classification - solution

Text Mining Course: Preprocessing, Transformation, and Classification Models (solution)

- Append color information based on class labels.
- Split the data into training and test sets.
- Train a Decision Tree classifier on the training set.
- Apply the trained model to the test set.
- Score the model.

URL: Slides KNIME Analytics Platform Text Mining Course https://www.knime.com/form/material-download-registration

Session 3 - Preprocessing, Transformation, and Classification Models

Solution 07 - Text Classification

Learning objective: In this exercise, you will practice classifying your transformed text data with a machine learning model.


Workflow description: This workflow assigns colors to class labels, partitions the transformed texts into training and test sets, and trains and scores a simple classifier.


You’ll find the instructions for the exercises in the yellow annotations.

Decision Trees to classify texts

  1. Append color information based on class labels using the Color Manager node.

  2. Filter out the Document column using the Column Filter node.

  3. Split the data into training (70%) and test (30%) sets with stratified sampling on the target column using the Partitioning node.

  4. Train a Decision Tree classifier on the training set with the Decision Tree Learner node.

  5. Apply the trained model to the test set with the Decision Tree Predictor node.

  6. Score the model with the Scorer node.
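
For readers who want to see the logic behind steps 3 to 6 outside of KNIME, the following is a minimal Python sketch using only the standard library. It is not how the KNIME nodes are implemented: the tiny Gini-based tree, the function names, and the toy document vectors (columns for the hypothetical terms "great", "hotel", and "room") are all assumptions made for illustration. Steps 1 and 2 are skipped because color information only affects visualization and the toy data has no Document column.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx) keeping each class's share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree on 0/1 features; a leaf is simply the majority class."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted impurity, feature index)
    for f in range(len(X[0])):
        left = [lab for row, lab in zip(X, y) if row[f] == 0]
        right = [lab for row, lab in zip(X, y) if row[f] == 1]
        if left and right:
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f)
    if best is None:
        return majority
    f = best[1]
    off = [(row, lab) for row, lab in zip(X, y) if row[f] == 0]
    on = [(row, lab) for row, lab in zip(X, y) if row[f] == 1]
    return (
        f,
        build_tree([r for r, _ in off], [l for _, l in off], depth + 1, max_depth),
        build_tree([r for r, _ in on], [l for _, l in on], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, absent, present = tree
        tree = present if row[f] == 1 else absent
    return tree


# Hypothetical toy document vectors: 1 means the term occurs in the document.
# Columns: [great, hotel, room]; only "great" separates the two classes.
X = [[1, i % 2, int(i % 3 == 0)] for i in range(10)]
X += [[0, i % 2, int(i % 3 == 0)] for i in range(10)]
y = ["pos"] * 10 + ["neg"] * 10

# Step 3: 70/30 split, stratified on the class label.
train_idx, test_idx = stratified_split(y)
# Step 4: train on the training rows only.
tree = build_tree([X[i] for i in train_idx], [y[i] for i in train_idx])
# Step 5: apply the trained model to the held-out rows.
predicted = [predict(tree, X[i]) for i in test_idx]
actual = [y[i] for i in test_idx]
# Step 6: score with a confusion matrix and overall accuracy.
confusion = Counter(zip(actual, predicted))
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(confusion, accuracy)
```

Because the toy classes are perfectly separated by one term, the sketch reaches an accuracy of 1.0 on its six test rows; real document vectors will not behave this cleanly. Stratifying the split matters most when the classes are imbalanced, since a purely random 30% sample could otherwise under-represent the smaller class in the test set.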


Workflow sections: Reading Textual Data, Enrichment, Preprocessing, Preprocessing II, Transformation, Your Solution

Workflow nodes (annotation: node name; nodes without an annotation are listed by name only):

- Create documents: Strings to Document
- Extract category for prediction (class label): Document Data Extractor
- No missings: Row Filter (deprecated)
- Append colors: Color Manager
- Assign POS tags: POS Tagger
- Create document vectors: Document Vector
- Create Bag of Words: Bag Of Words Creator
- Only documents: Column Filter
- Score model: Scorer
- Stop Word Filter
- Term to String
- Filter by number of documents: Row Filter (deprecated)
- GroupBy term count documents: GroupBy
- Filter Bag of Words (keep only terms that occur in at least 5 documents): Reference Row Filter
- Filter document column: Column Filter
- Split into training and test set: Table Partitioner
- Apply model: Decision Tree Predictor
- Snowball Stemmer
- Case Converter
- Punctuation Erasure
- Tag Filter
- Read Tripadvisor data: Table Reader
- Decision Tree Learner
- Compute relative term frequency: TF
- Number Filter