Building Sentiment Predictor - Lexicon Based (2)

Building a Sentiment Analysis Predictive Model - Lexicon Based Approach

This workflow uses a Kaggle dataset (https://www.kaggle.com/crowdflower/twitter-airline-sentiment) containing thousands of customer social media posts directed at six US airlines. Contributors annotated the valence of each tweet as positive, negative, or neutral. In the lexicon-based approach, the words with a positive meaning and the words with a negative meaning are counted per post. Based on these counts, a sentiment score is calculated and used to classify the posts.

1. Read annotated dataset.

Besides the CSV Reader node used here, KNIME provides nodes for reading many other data formats (e.g., Parquet, JSON, images).
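Outside KNIME, the reading step can be sketched in Python with the standard `csv` module. This is only an illustration: the two sample rows are made up, and the column names (`airline_sentiment`, `airline`, `text`) are assumptions about the dataset layout rather than something stated in the workflow.

```python
import csv
import io

# Hypothetical two-row sample standing in for the Kaggle CSV file.
# The column names are assumptions, not taken from the workflow.
SAMPLE_CSV = """airline_sentiment,airline,text
positive,Virgin America,"@VirginAmerica thanks, great flight!"
negative,United,"@united my bag is lost, again"
"""


def read_posts(handle):
    """Return a list of dicts, one per annotated post."""
    return list(csv.DictReader(handle))


# For a real file, pass open(path, newline="", encoding="utf-8") instead.
posts = read_posts(io.StringIO(SAMPLE_CSV))
```

Each row becomes a dictionary keyed by column name, so the annotated label and the post text can be picked out by name in later steps.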

2. Data Manipulation/Preparation.

Here the most important node is Strings to Document, which combines several string columns (e.g., author, text, title) into a single document that can be text-mined in KNIME.

3. Use Text Mining to Tag Words with Positive and Negative Meaning based on a Dictionary.
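As a rough illustration of dictionary-based tagging, the sketch below marks each word of a post as positive or negative if it appears in a word list and drops all other words. The two word lists are toy examples invented for this sketch (the workflow reads the MPQA dictionary instead), and the function name `tag_words` and the simple regex tokenizer are likewise assumptions, not the behavior of the KNIME nodes.

```python
import re

# Toy lexicon for illustration only; the workflow uses the MPQA dictionary.
POSITIVE_WORDS = {"great", "good", "thanks", "love"}
NEGATIVE_WORDS = {"lost", "late", "bad", "cancelled"}


def tag_words(text):
    """Return (word, tag) pairs for the words found in either word list.

    Words that appear in neither list are dropped, so only tagged
    words remain in the result.
    """
    words = re.findall(r"[a-z']+", text.lower())
    tagged = []
    for word in words:
        if word in POSITIVE_WORDS:
            tagged.append((word, "POSITIVE"))
        elif word in NEGATIVE_WORDS:
            tagged.append((word, "NEGATIVE"))
    return tagged


example = tag_words("Great crew, but my bag is lost and the flight was late")
```

Matching is done on lowercased whole words, which is a simplification: it ignores inflected forms, negation ("not good"), and multi-word expressions.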

4. Count the Number of Positive and Negative Words per Document.

5. Calculate a Sentiment Score based on the Number of Positive and Negative Words and Classify Documents based on the Score.

The sentiment score is calculated as (number of positive words - number of negative words) divided by (number of positive words + number of negative words). It ranges from -1 (only negative words) to 1 (only positive words).

If the score is negative, the post is classified as negative; if the score is positive, it is classified as positive; and if it is equal to 0, the post is classified as neutral.

6. Evaluate the Prediction.
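The score, the classification rule, and the evaluation step can be sketched in Python as follows. The function names, the word counts, and the reference labels are hypothetical. The handling of posts with no tagged words (where the denominator is 0) is an assumption of this sketch, which treats them as neutral; the description above does not say what the workflow does in that case. Accuracy is used here as one simple evaluation measure.

```python
def sentiment_score(n_pos, n_neg):
    """(positive - negative) / (positive + negative), in the range [-1, 1]."""
    total = n_pos + n_neg
    if total == 0:
        # Assumption: a post without any tagged words gets score 0 (neutral).
        return 0.0
    return (n_pos - n_neg) / total


def classify(score):
    """Map a score to a class: below 0 negative, above 0 positive, else neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(predicted, actual):
    """Fraction of posts whose predicted class equals the annotated class."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical (positive, negative) word counts for four posts.
counts = [(3, 1), (0, 2), (1, 1), (0, 0)]
predicted = [classify(sentiment_score(p, n)) for p, n in counts]

# Hypothetical annotated labels for the same four posts.
actual = ["positive", "negative", "negative", "neutral"]
print(predicted, accuracy(predicted, actual))
```

The third post shows a limitation of the rule: one positive and one negative word cancel out, so the post is classified as neutral even though the annotators labeled it negative.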

Nodes and annotations on the workflow canvas:
Scorer
Strings to Document: Convert strings to documents
CSV Reader: Kaggle dataset. Social media posts from consumers to airlines. The posts are from Twitter, from 2015.
Excel Reader: MPQA Dictionary
Category to Class: Extract sentiment label
Number of Positive and Negative Words per Post
Duplicate Row Filter
Column Filter
Dictionary Tagger: negative
Dictionary Tagger: positive
Expression: Calculate Sentiment Score & Predict sentiment based on score
Tag Filter: keep only tagged words
