
09 Classification III - solution

Text Mining Course: Preprocessing, Transformation, and Classification Models (bonus solution)

- Supervised learning: Build a sentiment predictor from scratch.
- Supervised learning: Build a sentiment predictor with transfer learning.

Slides: KNIME Analytics Platform Text Mining Course - https://www.knime.com/form/material-download-registration

Session 3 - Preprocessing, Transformation, and Classification Models

Solution 09 - Sentiment Predictor (Bonus)

Learning objective: In this exercise, you will further practice performing classification on an IMDB data set.


Workflow description: This workflow predicts the sentiment of movie reviews using two supervised approaches: training a machine learning model from scratch, and applying transfer learning.


You’ll find the instructions for the exercises in the yellow annotations.

Activity 1. Supervised learning: Build a sentiment predictor from scratch

  1. Create document objects out of movie reviews.

  2. Preprocess documents.

    • Remove numbers, special characters, punctuation marks, stop words, etc.

    • Convert text to lower case.

    • Perform stemming.

  3. Shuffle the data set with the Shuffle node and split it into a training set (70%) and a test set (30%) with stratified sampling on the target column.

  4. Create a bag of words and compute relative term frequency for the training and test sets separately.

  5. Create document vectors for the training and test set.

  6. Extract the sentiment class label using the Category To Class node.

  7. Train a Decision Tree model and score it using the Scorer and ROC Curve nodes (see the Python sketch after this list).
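
For a scripting analogue, here is a minimal Python sketch of the same from-scratch pipeline, with scikit-learn and NLTK standing in for the KNIME Text Processing nodes. The file name imdb_sample.csv and the column names "text" and "sentiment" are illustrative assumptions, not taken from the course data.

    import re

    import pandas as pd
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
    from sklearn.metrics import accuracy_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    stemmer = PorterStemmer()

    def preprocess(text):
        # Keep alphabetic tokens only (drops numbers and punctuation),
        # lowercase, remove stop words, and stem.
        tokens = re.findall(r"[a-z]+", text.lower())
        return " ".join(stemmer.stem(t) for t in tokens
                        if t not in ENGLISH_STOP_WORDS)

    df = pd.read_csv("imdb_sample.csv")    # assumed file name
    docs = df["text"].map(preprocess)      # assumed column names

    # Shuffled, stratified 70/30 split on the target column.
    X_train, X_test, y_train, y_test = train_test_split(
        docs, df["sentiment"], test_size=0.3,
        stratify=df["sentiment"], shuffle=True, random_state=42)

    # Bag of words with relative term frequency: raw counts L1-normalized
    # per document. The vocabulary is fitted on the training set only and
    # then applied to the test set, mirroring the Document Vector /
    # Document Vector Applier pair in KNIME.
    vectorizer = TfidfVectorizer(use_idf=False, norm="l1")
    tf_train = vectorizer.fit_transform(X_train)
    tf_test = vectorizer.transform(X_test)

    clf = DecisionTreeClassifier(random_state=42).fit(tf_train, y_train)

    pred = clf.predict(tf_test)
    pos = clf.classes_[1]                  # second class taken as "positive"
    proba = clf.predict_proba(tf_test)[:, 1]
    print("Accuracy:", accuracy_score(y_test, pred))
    print("ROC AUC :", roc_auc_score((y_test == pos).astype(int), proba))

The class probabilities from predict_proba drive the ROC computation, which corresponds to what the ROC Curve node plots in the workflow.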


Your Solution
Activity 2. Supervised learning: Build a sentiment predictor with transfer learning

  1. Retain only the movie review and target columns.

  2. Shuffle the data set and split it into a training set (70%) and a test set (30%) with stratified sampling on the target column.

  3. Lowercase text reviews.

  4. Use the BERT Model Selector node to download the pretrained bert_en_uncased_L-12_H-768_A-12 model from TensorFlow Hub.

  5. Fine-tune the pretrained model with the BERT Classification Learner node. Set the max sequence length to 128, the number of epochs to 2, the batch size to 128, the validation batch size to 20, the optimizer to Adam, and the learning rate to 0.00001.

  6. Apply the fine-tuned model with the BERT Predictor node and score it (see the Python sketch after this list).
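
As a hedged illustration of the same recipe in code, the sketch below fine-tunes BERT with the Hugging Face Transformers library and TensorFlow instead of the KNIME BERT nodes. Here bert-base-uncased stands in for the bert_en_uncased_L-12_H-768_A-12 TensorFlow Hub checkpoint (the same BERT-Base uncased architecture); the file name and column names are again illustrative assumptions, and Keras' validation_split (a 20% hold-out) stands in for the node's validation setting.

    import pandas as pd
    import tensorflow as tf
    from sklearn.model_selection import train_test_split
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

    df = pd.read_csv("imdb_sample.csv")              # assumed file name
    df["text"] = df["text"].str.lower()              # lowercase the reviews
    labels, classes = pd.factorize(df["sentiment"])  # assumed target column

    X_train, X_test, y_train, y_test = train_test_split(
        df["text"].tolist(), labels, test_size=0.3,
        stratify=labels, shuffle=True, random_state=42)

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    enc_train = dict(tok(X_train, max_length=128, truncation=True,
                         padding="max_length", return_tensors="np"))
    enc_test = dict(tok(X_test, max_length=128, truncation=True,
                        padding="max_length", return_tensors="np"))

    model = TFAutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(classes))

    # Mirror the exercise settings: max sequence length 128, 2 epochs,
    # batch size 128, Adam with learning rate 1e-5. validation_split
    # holds out 20% of the training data for validation.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    model.fit(enc_train, y_train, epochs=2, batch_size=128,
              validation_split=0.2)

    logits = model.predict(enc_test, batch_size=128).logits
    pred = logits.argmax(axis=-1)
    print("Test accuracy:", (pred == y_test).mean())

A batch size of 128 matches the exercise settings but is demanding for BERT fine-tuning; reduce it if you run out of GPU memory.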


Your Solution
The two solution workflows use the following nodes:

Activity 1 - sentiment predictor from scratch:

  • CSV Reader - read the IMDB sample data
  • Document Creation - transform strings to documents
  • Preprocessing - preprocess the documents
  • Shuffle - shuffle the data set
  • Table Partitioner - split into training and test sets
  • Preprocessing II (training and test set separately) - create the bag of words and compute relative term frequency
  • Document Vector - transform the training documents into vectors
  • Document Vector Applier - transform the test documents with the training vector space
  • Category To Class - extract the sentiment label (training and test set)
  • Decision Tree Learner - train the decision tree model
  • Decision Tree Predictor - apply the decision tree model
  • Scorer - score the decision tree model
  • ROC Curve - plot the ROC curve

Activity 2 - sentiment predictor with transfer learning:

  • CSV Reader - read the IMDB sample data
  • Column Filter - keep only the text and target columns
  • Shuffle - shuffle the data set
  • Table Partitioner - split into training and test sets
  • String Manipulation - convert the reviews to lowercase
  • BERT Model Selector - select and download the model from TensorFlow Hub
  • BERT Classification Learner - fine-tune the pretrained model
  • BERT Predictor - apply the fine-tuned model
  • Scorer - score the model
