07_Sentiment_Classification_with_NGrams

Sentiment Analysis (Classification) of Documents with NGram Features
The workflow reads textual data (IMDb reviews) from a CSV file and converts the strings into documents. The documents are then preprocessed, i.e. filtered and stemmed; this takes place in the Preprocessing meta node. In the Feature Creation meta node, two kinds of feature sets and document vectors are created: the top set of vectors contains only single-word (1-gram) features, while the bottom set contains both single-word and 2-gram features. After the document vectors have been created, the sentiment class is extracted and two predictive models are built and scored: one based only on single-word features and one based on single-word and 2-gram features. Both models are compared in the ROC Curve node.

Workflow nodes: File Reader (read IMDb reviews from CSV file), Document Creation (transformation of strings to documents), Preprocessing (meta node) and Feature Creation (meta node; document vectors of frequent 1-grams and 2-grams). Each of the two branches, labeled "1-gram features" and "1- and 2-gram features", contains Category To Class (extract sentiment label), Color Manager (color by sentiment label), Partitioning (training / test set), Decision Tree Learner, Decision Tree Predictor (apply decision tree model) and Scorer (score decision tree model). A Joiner joins the class probabilities of both models, and the ROC Curve node scores the two decision tree models.
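The preprocessing and feature-creation steps can be sketched in plain Python to make the two feature sets concrete. This is a minimal, hypothetical sketch, not the workflow's implementation: the tiny `STOP_WORDS` list, the crude suffix-stripping `toy_stem` (a stand-in for a real stemmer such as Porter's), the `min_df` cut-off used to keep only "frequent" n-grams, and the binary (0/1) vector encoding are all assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not the workflow's filter).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to"}


def toy_stem(word: str) -> str:
    """Crude suffix stripping; an illustrative stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def ngrams(terms: list[str], max_n: int) -> list[str]:
    """All n-grams for n = 1..max_n, each joined with a single space."""
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(terms[i : i + n]) for i in range(len(terms) - n + 1))
    return grams


def document_vectors(docs: list[list[str]], max_n: int, min_df: int = 2):
    """Keep n-grams that occur in at least min_df documents; build 0/1 vectors."""
    doc_grams = [set(ngrams(terms, max_n)) for terms in docs]
    df = Counter(g for grams in doc_grams for g in grams)
    vocab = sorted(g for g, count in df.items() if count >= min_df)
    vectors = [[1 if g in grams else 0 for g in vocab] for grams in doc_grams]
    return vocab, vectors


# Made-up example reviews, for illustration only.
reviews = [
    "This movie was not good.",
    "Not good, and the acting was boring.",
    "A really good movie.",
    "Really good acting!",
]
docs = [preprocess(r) for r in reviews]
vocab1, vec1 = document_vectors(docs, max_n=1)    # 1-gram features only
vocab12, vec12 = document_vectors(docs, max_n=2)  # 1- and 2-gram features
print(vocab1)   # → ['act', 'good', 'movie', 'not', 'real']
print(vocab12)  # → ['act', 'good', 'good act', 'movie', 'not', 'not good', 'real', 'real good']
```

In this toy example the 1- and 2-gram vocabulary keeps the bigram `not good`, whereas the 1-gram vectors only record `not` and `good` separately; preserving such local word order is one common motivation for adding 2-gram features.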

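The comparison in the ROC Curve node can be illustrated with a small plain-Python sketch that computes ROC points and the area under the curve (AUC) from each model's predicted probability of the positive class on a shared test set. The AUC uses the rank-based (Mann-Whitney) formulation, in which ties count as half a win; the labels and the two score lists (`scores_a`, `scores_b`, standing in for the 1-gram model and the 1- and 2-gram model) are made-up values, not results of this workflow.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC = probability that a random positive scores higher than a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def roc_points(labels: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold down over the distinct scores.

    A row counts as predicted positive when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical test-set labels (1 = positive review) and P(positive) from two models,
# aligned row by row as they would be after joining both predictors' outputs.
labels = [1, 1, 1, 0, 0, 0]
scores_a = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
scores_b = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]

print(round(roc_auc(labels, scores_a), 3))  # → 0.889
print(roc_auc(labels, scores_b))            # → 1.0
print(roc_points(labels, scores_a))
```

Joining the two models' class probabilities into one table, as the Joiner node does in the workflow, puts both score columns next to the same labels so that the two curves can be compared on one ROC plot.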