08 Classification II - solution

Text Mining Course: Preprocessing, Transformation, and Classification Models (bonus solution)

- Create document vectors for the second set of documents. Its feature space must be identical to the feature space of the first set.
- Extract the class label (category) for prediction and assign colors.
- Apply the trained model to the second set of documents.

URL: Slides KNIME Analytics Platform Text Mining Course https://www.knime.com/form/material-download-registration

Session 3 - Preprocessing, Transformation, and Classification Models

Solution 08 - Text Classification II (Bonus)

Learning objective: In this exercise, you will further practice performing classification on a different data set.


Workflow description: This workflow imports two different data sets, then enriches, preprocesses, and transforms them. A simple classifier is trained on the first set of documents and applied to the second set, so the feature space of the second set has to be identical to the feature space of the first set.
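The train-on-one-set, apply-to-the-other idea can be sketched outside KNIME in plain Python. This is only an illustration under stated assumptions, not what the Decision Tree Learner and Decision Tree Predictor nodes do internally: it trains a one-level decision tree (a "stump"), and the feature columns, vectors, and labels are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(vectors, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for feature in range(len(vectors[0])):
        for threshold in sorted({v[feature] for v in vectors}):
            left = [y for v, y in zip(vectors, labels) if v[feature] <= threshold]
            right = [y for v, y in zip(vectors, labels) if v[feature] > threshold]
            if not left or not right:
                continue  # this threshold does not split the data
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no split separates the training vectors")
    return best[1:]


def predict(stump, vector):
    """Apply the trained split to one document vector."""
    feature, threshold, left_label, right_label = stump
    return left_label if vector[feature] <= threshold else right_label


# Hypothetical document vectors over two assumed columns: ["great", "dirty"]
first_set_vectors = [[0.5, 0.0], [0.4, 0.1], [0.0, 0.5], [0.1, 0.3]]
first_set_labels = ["good", "good", "bad", "bad"]

# The second set must use the same columns in the same order.
second_set_vectors = [[0.3, 0.1], [0.0, 0.4]]

stump = train_stump(first_set_vectors, first_set_labels)
predictions = [predict(stump, v) for v in second_set_vectors]
```

The prediction step only makes sense because both sets share one column layout: the learned split refers to a column index, so a second set with different columns would be scored against the wrong terms.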


You’ll find the instructions for the exercises in the yellow annotations.

Perform a classification task on a different data set

  1. Create document vectors for the second set of documents. Its feature space must be identical to the feature space of the first set.

    • Create a bag of words, compute relative term frequency and create identical document vectors using the Document Vector Applier node.

  2. Extract class labels/category for prediction, and assign colors.

  3. Apply the trained model on the second set of documents.
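As a rough illustration of step 1, the following plain-Python sketch fixes the feature space on the first set and reuses it for the second set. It is an analogy to what the Bag Of Words Creator, TF, and Document Vector Applier nodes accomplish together, not their implementation; the token lists and function names are hypothetical, and relative term frequency is assumed to be the term count divided by the number of terms in the document.

```python
from collections import Counter


def relative_tf(tokens):
    """Relative term frequency: count of each term divided by document length."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: n / total for term, n in Counter(tokens).items()}


def fit_vocabulary(tokenized_docs):
    """Fix the feature space: one column per distinct term, in sorted order."""
    return sorted({term for doc in tokenized_docs for term in doc})


def apply_vocabulary(tokens, vocabulary):
    """Build a vector over a fixed vocabulary.

    Terms outside the vocabulary are dropped; vocabulary terms that do not
    occur in the document get 0.0.
    """
    tf = relative_tf(tokens)
    return [tf.get(term, 0.0) for term in vocabulary]


# Hypothetical, already preprocessed token lists
first_set = [["great", "hotel", "great", "view"], ["dirty", "room"]]
second_set = [["great", "room", "noisy", "street"]]

vocabulary = fit_vocabulary(first_set)  # learned from the first set only
first_vectors = [apply_vocabulary(doc, vocabulary) for doc in first_set]
second_vectors = [apply_vocabulary(doc, vocabulary) for doc in second_set]
```

Here "noisy" and "street" never appear in the first set, so they get no column in the second set's vector, while "dirty", "hotel", and "view" stay as zero-valued columns. Both sets end up with vectors of the same length and column order.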


Workflow sections: Reading Textual Data, Enrichment, Preprocessing, Transformation, Classification, Your Solution

Workflow nodes (node: annotation), in the order they appear on the workflow canvas:
- Preprocessing II: Filtering based on occurrences
- Preprocessing I: Filtering, Stemming, ...
- Document Creation: Convert strings to documents
- Document Vector Applier: Adapt feature set of bow input to reference document vectors
- Color Manager: Append colors
- Document Data Extractor: Extract category for prediction (class label)
- Column Filter: Filter document column
- Enrichment: Tagging (POS)
- TF: Compute relative term frequency
- Preprocessing I: Filtering, Stemming, ...
- Bag Of Words Creator: Create Bag of Words
- Table Reader: Read Tripadvisor data Boston Reviews
- Decision Tree Predictor: Apply model
- Table Reader: Read Tripadvisor data San Francisco Reviews
- Color Manager: Append colors
- Transformation: Creation of document vectors
- Decision Tree Learner
- Enrichment: Tagging (POS)
- Document Creation: Convert strings to documents
