
20230509 Pikairos JustKNIMEIt Season 2 Challenge 6 Airline Reviews

You work for a Marketing agency that monitors the online presence of a few airline companies to understand how they are being reviewed. You were asked to identify whether a tweet mentioning an airline is positive, neutral, or negative, and decided to implement a simple sentiment analysis classifier for this task. What accuracy can you get when automating this process? Is the classifier likely to help company reviewers save their time?

Note: Given the size of the dataset, training the classifier may take a little while to execute on your machine (especially if you use more sophisticated methods). Feel free to use only a part of the dataset in this challenge if you want to speed up your solution.

Hint 1: Check our Textprocessing extension to learn more about how you can turn tweets' words into features that a classifier can explore.

Hint 2: Study, use, and/or adapt shared components Enrichment and Preprocessing and Document Vectorization (in this order!) if you want to get a part of the work done more quickly. They were created especially for this challenge.

Hint 3: Remember to partition the dataset into training and test set in order to create the decision tree model and then evaluate it. Feel free to use the partitioning strategy you prefer.

Challenge 06: Airline Reviews

What accuracy can you get when automating this process? Using a Logistic Regression Learner, it is possible to create a model with around 75% accuracy.

Is the classifier likely to help company reviewers save their time? When the task is not critical, a model with an accuracy of 75% is acceptable. However, accuracy alone does not describe the performance of the model. In this solution, specificity is high for all three classes and sensitivity is reasonable to high for the positive and negative classes, whereas the neutral class has a sensitivity of around 0.5, so roughly half of the neutral tweets are misclassified.
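To make the distinction between accuracy, sensitivity, and specificity concrete, here is a minimal Python sketch that computes all three from actual and predicted labels, treating each class one-vs-rest. It is not part of the KNIME workflow; the function name `per_class_metrics` and the toy labels are hypothetical and chosen only for illustration, not taken from the challenge dataset.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]


def per_class_metrics(actual, predicted, labels=LABELS):
    """Return overall accuracy and one-vs-rest sensitivity/specificity per class."""
    pairs = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    total = len(actual)
    accuracy = sum(pairs[(c, c)] for c in labels) / total

    metrics = {}
    for c in labels:
        tp = pairs[(c, c)]                                   # class c, predicted c
        fn = sum(pairs[(c, p)] for p in labels if p != c)    # class c, predicted other
        fp = sum(pairs[(a, c)] for a in labels if a != c)    # other class, predicted c
        tn = total - tp - fn - fp                            # everything else
        metrics[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        }
    return accuracy, metrics


# Hypothetical toy labels (assumption: for illustration only).
actual = ["negative"] * 4 + ["neutral"] * 2 + ["positive"] * 2
predicted = [
    "negative", "negative", "negative", "neutral",  # one negative missed
    "neutral", "negative",                          # one neutral missed
    "positive", "positive",
]

accuracy, metrics = per_class_metrics(actual, predicted)
print(f"accuracy: {accuracy:.2f}")
for label, values in metrics.items():
    print(label, values)
```

In this toy example the overall accuracy is 0.75 while the neutral class reaches a sensitivity of only 0.5, which shows how a reasonable overall figure can hide a weak class. For comparison, guessing uniformly at random among three classes would give each class a sensitivity of about 1/3.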
Workflow nodes (with their annotations, where present):
Table Reader: Tweets
Enrichment and Preprocessing
Document Vectorization
Column Rename: Rename Category Column to Class
X-Partitioner: Start of Cross Validation Loop, Number of Validations = 10, Random Sampling
Logistic Regression Learner
Logistic Regression Predictor
Loop End: Concatenate Results From Each Iteration
Scorer: Calculate Accuracy Statistics
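The cross-validation loop in this workflow (X-Partitioner with Number of Validations = 10 and Random Sampling, closed by a Loop End) can be illustrated outside KNIME with a short Python sketch. This is only an approximation of the idea, not KNIME's implementation: the function name `k_fold_indices`, the seed, and the row count of 25 are hypothetical.

```python
import random


def k_fold_indices(n_rows, k=10, seed=42):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Rows are shuffled once (random sampling), then dealt into k folds.
    Each fold serves as the test set exactly once.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)          # seed is an arbitrary assumption
    folds = [indices[i::k] for i in range(k)]     # k roughly equal-sized folds
    for i, test_idx in enumerate(folds):
        train_idx = [
            row for fold_no, fold in enumerate(folds) if fold_no != i for row in fold
        ]
        yield train_idx, test_idx


# Hypothetical dataset size, for illustration only.
splits = list(k_fold_indices(n_rows=25, k=10))
for fold_no, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold_no}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```

Because every row lands in exactly one test fold, concatenating the predictions from all iterations yields one prediction per row, and those pooled predictions are what a single accuracy computation can then score.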
