Building Sentiment Predictor --Exercise

Building a Sentiment Analysis Predictive Model - Supervised Machine Learning -- EXERCISE

This workflow uses a Kaggle dataset of about 14K customer tweets directed at six US airlines (https://www.kaggle.com/crowdflower/twitter-airline-sentiment). Contributors annotated the valence of each tweet as positive, negative, or neutral. Once users are satisfied with the model evaluation, they should export (1) the Vector Space and (2) the Trained Model for deployment on non-annotated data.

1. Read the annotated Twitter dataset. Besides the node used below to read CSV files, KNIME provides a wide range of nodes to read other dataset formats (e.g., Parquet, JSON, images).

2. Data Manipulation/Preparation. This is a simplified data manipulation process. Users working on text mining might want to add a "Spell Checker" node to handle spelling errors (e.g., "hapy" instead of "happy"). Here the most important node is "Strings To Document", which combines several string columns (e.g., author, text, title) into a single document that can be text-mined in KNIME.

3. Use of Text Mining to Transform Data into Numbers. Enrichment means using dictionaries (e.g., LIWC) to tag words with predefined categories. This can serve purposes such as making sure these words are not removed, or creating intensity measures (word category percentages) per document. Preprocessing simplifies the analysis by (1) removing punctuation and stopwords, (2) performing stemming, and (3) executing other preprocessing tasks. Based on the preprocessed documents, a document vector is created to represent each document in a vector space. This blog post describes different document encoding options: https://www.knime.com/blog/text-encoding-a-review

Your task here is to train different models and save them for deployment on non-annotated data.

Try and test different machine learning models: use the training and test sets from the Partitioning node to train at least three different classifiers. For each one, compute its accuracy and the time taken to train it.
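The exercise is meant to be solved with KNIME nodes. To illustrate what the Partitioning → Learner → Predictor → Scorer chain and the timing requirement compute, here is a minimal plain-Python sketch. Everything in it is an assumption for illustration:

- The toy tweets are invented and are not taken from the Kaggle dataset.
- The two learners, a majority-class baseline and a multinomial Naive Bayes, are stand-ins rather than KNIME nodes.
- `partition`, `accuracy` and `evaluate` are hypothetical helper names.

```python
import math
import random
import time
from collections import Counter, defaultdict

# Toy labeled examples, invented for illustration (NOT from the Kaggle dataset).
DATA = [
    ("great flight friendly crew", "positive"),
    ("love the service thanks", "positive"),
    ("flight delayed again terrible", "negative"),
    ("lost my bag awful service", "negative"),
    ("what time does boarding start", "neutral"),
    ("is there wifi on this flight", "neutral"),
] * 10  # repeated so the 80/20 split has enough rows


def partition(rows, train_fraction=0.8, seed=42):
    """Random 80/20 split, similar in spirit to the Partitioning node."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


class MajorityClass:
    """Baseline learner: always predicts the most frequent training label."""

    def fit(self, rows):
        self.label = Counter(label for _, label in rows).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over whitespace tokens."""

    def fit(self, rows):
        self.priors = Counter(label for _, label in rows)
        self.counts = defaultdict(Counter)
        for text, label in rows:
            self.counts[label].update(text.split())
        self.vocab = {w for c in self.counts.values() for w in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        self.n = len(rows)
        return self

    def predict(self, text):
        v = len(self.vocab)

        def log_score(label):
            s = math.log(self.priors[label] / self.n)
            for w in text.split():
                s += math.log((self.counts[label][w] + 1) / (self.totals[label] + v))
            return s

        return max(self.priors, key=log_score)


def accuracy(model, rows):
    """Fraction of correct predictions (the overall accuracy a Scorer reports)."""
    correct = sum(model.predict(text) == label for text, label in rows)
    return correct / len(rows)


def evaluate(models, rows):
    """Train each model on the same split, then record test accuracy and training time."""
    train, test = partition(rows)
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(train)
        train_seconds = time.perf_counter() - start
        results[name] = {
            "accuracy": accuracy(model, test),
            "train_hours": train_seconds / 3600,  # the exercise asks for hours
        }
    return results


if __name__ == "__main__":
    models = {"majority": MajorityClass(), "naive_bayes": NaiveBayes()}
    for name, res in evaluate(models, DATA).items():
        print(f"{name}: accuracy={res['accuracy']:.3f}, train time={res['train_hours']:.2e} h")
```

Training time is measured with `time.perf_counter()` and divided by 3600 to express it in hours. A third classifier would be one more entry in the `models` dictionary, trained and scored on the same split so the comparison stays fair.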
Feel free to also try hyperparameter optimization on the models if time and resources permit.

FIRST MODEL
- Drag a suitable Learner node and connect it to the first output port of the Partitioning node.
- Connect the second output port of the Partitioning node to the Predictor node. Connect the Predictor's model input port to the Learner's model output port.
- Train the model and test it with the corresponding Predictor node.
- Measure performance using the Scorer node.

SECOND MODEL
- Drag a suitable Learner node and connect it to the first output port of the Partitioning node.
- Connect the second output port of the Partitioning node to the Predictor node. Connect the Predictor's model input port to the Learner's model output port.
- Train the model and test it with the corresponding Predictor node.
- Measure performance using the Scorer node.

THIRD MODEL
- Drag a suitable Learner node and connect it to the first output port of the Partitioning node.
- Connect the second output port of the Partitioning node to the Predictor node. Connect the Predictor's model input port to the Learner's model output port.
- Train the model and test it with the corresponding Predictor node.
- Measure performance using the Scorer node.

EXECUTION TIME
Can you think of an appropriate node/process to measure the total execution time in hours? The solution will be shared one week after the webinar.

Node annotations in the workflow:
- CSV Reader: Drag and drop the Kaggle dataset (N=14640 tweets from consumers to airlines)
- Duplicate Row Filter
- Strings To Document: Convert strings to documents
- Enrichment and Preprocessing: Common steps (e.g., lowercase, stopwords, etc.)
- BoW and Vector Space: Transform words to term frequency or other measures
- Document Data Extractor: Extract the sentiment annotation from the metadata in the document column
- Column Filter: Exclude the document column
- Partitioning: 80/20
- Model Writer: Export the Vector Space for deployment
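To make the "Enrichment and Preprocessing" and "BoW and Vector Space" steps more concrete, here is a minimal plain-Python sketch of what they compute. It is not how the KNIME Text Processing nodes are implemented, and it makes several simplifying assumptions:

- The stopword list is made up and far shorter than a real one.
- `LEXICON` is a hypothetical stand-in for a category dictionary such as LIWC.
- Stemming is omitted.
- Term frequency is represented as raw counts.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real stopword lists are much longer).
STOPWORDS = {"a", "an", "and", "for", "is", "my", "of", "on", "the", "this", "to"}

# Hypothetical mini lexicon standing in for a category dictionary such as LIWC.
LEXICON = {
    "posemo": {"great", "love", "thanks"},
    "negemo": {"awful", "delayed", "terrible"},
}


def preprocess(text, remove_stopwords=True):
    """Lowercase, replace punctuation with spaces, optionally drop stopwords.

    Stemming is deliberately omitted to keep the sketch short.
    """
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def category_percentages(text, lexicon):
    """Enrichment-style intensity measure: percentage of words in each category.

    Computed before stopword removal so every word counts toward the total.
    """
    tokens = preprocess(text, remove_stopwords=False)
    if not tokens:
        return {category: 0.0 for category in lexicon}
    return {
        category: 100 * sum(t in words for t in tokens) / len(tokens)
        for category, words in lexicon.items()
    }


def build_vocabulary(training_docs):
    """Vector space dimensions: sorted set of terms seen in the training documents."""
    return sorted({t for doc in training_docs for t in preprocess(doc)})


def term_frequency_vector(text, vocabulary):
    """Bag-of-words vector of raw term counts; out-of-vocabulary terms are ignored."""
    counts = Counter(preprocess(text))
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    vocab = build_vocabulary(["Great flight, love the crew!", "Flight delayed... terrible."])
    print(vocab)  # ['crew', 'delayed', 'flight', 'great', 'love', 'terrible']
    print(term_frequency_vector("Terrible, terrible flight", vocab))  # [0, 0, 1, 0, 0, 2]
    print(term_frequency_vector("Love the wifi", vocab))  # [0, 0, 0, 0, 1, 0]
    print(category_percentages("Great flight, love the crew!", LEXICON))  # {'posemo': 40.0, 'negemo': 0.0}
```

The vocabulary is built from the training documents only and then reused unchanged. Words outside it, such as "wifi" in the last vector example, are ignored. As a result, vectors for new, non-annotated tweets line up column for column with the data the model was trained on, which is the point of exporting the Vector Space together with the trained model.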
