Solution_Dictionary_Based

Rule-based Prediction with Dictionary Tagger

This workflow reads the IMDb dataset as well as a list of positive words and a list of negative words. After tagging the positive and negative words in the reviews, the workflow uses the Tag Filter node to delete all words that have neither a positive nor a negative sentiment. In the next step, it counts the number of positive and negative words per document (Pivoting node) and calculates the sentiment score, (#positive - #negative) / (#all words), before it makes the predictions based on that score. In the last step, it extracts the correct labels to evaluate the model.

Reading and Parsing
- Read the following datasets: IMDb-sample.csv, MPQA-OpinionCorpus-PositiveList.csv, MPQA-OpinionCorpus-NegativeList.csv
  (Tip: Drag and drop the dataset from the KNIME Explorer to the Workflow Editor)
  (Tip 2: Change the data type of the column Index to string in the configuration window)
- Use the Strings to Document node to create the document objects
  (Hint: Use the following settings: Title Column = Index, Full Text = Text. Activate "Use categories from column" and set Document category column = Sentiment)
- Use the Column Filter node to delete all columns except the document column
Optional: Use the Document Viewer node to take a look at the documents

Enrichment
- Use a Dictionary Tagger to assign the tag type SENTIMENT with the tag value POSITIVE to all words on the positive list
- Use a Dictionary Tagger to assign the tag type SENTIMENT with the tag value NEGATIVE to all words on the negative list
Remember: the upper input is the document and the lower input is the dictionary
Optional: Use the Document Viewer node to visualize the tags

Preprocessing
- Use the Tag Filter node to remove all words which don't have a positive or negative tag

Transformation and Frequencies
- Use the Bag of Words Creator node to create a bag of words
- Use the TF node to calculate the absolute term frequency
- Use the Tags to String node to transform the terms into strings
Remember: a term is a tuple of a word and the assigned tag

Classification
- Use the Pivoting node to count the number of positive and negative tagged words per document
  (Hint: Group by document, pivot by sentiment and choose sum of the column "TF abs" as manual aggregation)
- Use the Missing Value node to set all missing values in a numerical column to 0
- Use the Math Formula node to calculate the sentiment score, which is defined as (number of positive words - number of negative words) / all words. Because the Tag Filter has already removed every untagged word, "all words" is the number of positive words plus the number of negative words.
  (Hint: In KNIME ($POSITIVE+Sum(TF abs)$ - $NEGATIVE+Sum(TF abs)$)/($POSITIVE+Sum(TF abs)$ + $NEGATIVE+Sum(TF abs)$) )
- Use the Rule Engine node to make the predictions based on the sentiment score:
  $Sentiment Score$ >= 0 => "POS"
  TRUE => "NEG"

Evaluation
- Use the Category to Class node to extract class information
- Use the Scorer node to evaluate the rule-based approach with dictionary tagger

Nodes in the workflow: File Reader (3x: IMDb reviews, positive list, negative list), Strings To Document, Column Filter, Dictionary Tagger (2x: assign positive tags, assign negative tags), Tag Filter, Bag Of Words Creator, TF, Tags To String, Pivoting, Missing Value, Math Formula, Rule Engine, Category To Class, Scorer
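
The scoring and prediction logic of the workflow can be sketched outside KNIME in a few lines of Python. This is only an illustration, not the workflow's implementation: the word lists are hypothetical stand-ins for the MPQA files, the regex tokenizer is a simplification of what the Dictionary Tagger does, and returning a score of 0.0 for a review with no tagged words is an assumption of this sketch rather than documented node behavior.

```python
import re
from collections import Counter

# Hypothetical mini dictionaries; the workflow reads the MPQA lists from CSV.
POSITIVE_WORDS = {"good", "great", "wonderful"}
NEGATIVE_WORDS = {"bad", "boring", "awful"}


def count_tags(text):
    """Count dictionary hits per tag value, ignoring all untagged words."""
    counts = Counter(POSITIVE=0, NEGATIVE=0)
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE_WORDS:
            counts["POSITIVE"] += 1
        elif word in NEGATIVE_WORDS:
            counts["NEGATIVE"] += 1
    return counts


def sentiment_score(counts):
    """(#positive - #negative) / (#positive + #negative)."""
    pos, neg = counts["POSITIVE"], counts["NEGATIVE"]
    total = pos + neg
    if total == 0:
        # Assumption of this sketch: no tagged words means a neutral score.
        return 0.0
    return (pos - neg) / total


def predict(score):
    """Mirror the two rules: score >= 0 gives "POS", everything else "NEG"."""
    return "POS" if score >= 0 else "NEG"


review = "A great cast, but a boring and awful plot."
counts = count_tags(review)
print(counts["POSITIVE"], counts["NEGATIVE"], predict(sentiment_score(counts)))
```

For the sample review, one positive and two negative words are found, so the score is negative and the prediction is "NEG". Note that a score of exactly 0 is classified as "POS", which follows directly from the `>= 0` rule.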
