Group_1_Dictionary_Based

Rule-based Prediction with Dictionary Tagger

This workflow reads the IMDb dataset together with a list of positive words and a list of negative words. After tagging the positive and negative words in the reviews, the workflow uses the Tag Filter node to delete all words that have neither a positive nor a negative sentiment. Next, it counts the number of positive and negative words per document (Pivoting node) and calculates the sentiment score, (#positive - #negative) / (#all words), where "all words" are the words left after tag filtering, i.e. #positive + #negative. It then makes the predictions based on the sentiment score. In the last step, it extracts the correct labels to evaluate the model.

Reading and Parsing
- Read the following datasets:
  - IMDb-sample.csv
  - MPQA-OpinionCorpus-PositiveList.csv
  - MPQA-OpinionCorpus-NegativeList.csv
  (Tip: Drag and drop the datasets from the KNIME Explorer into the Workflow Editor.)
  (Tip 2: In the configuration window, change the data type of the Index column to string.)
- Use the Strings to Document node to create the document objects.
  (Hint: Use the following settings: Title column = Index, Full text = Text; activate "Use categories from column" and set Document category column = Sentiment.)
- Use the Column Filter node to delete all columns except the document column.
- Optional: Use the Document Viewer node to take a look at the documents.

Enrichment
- Use a Dictionary Tagger to assign the tag type SENTIMENT with the tag value POSITIVE to all words on the positive list.
- Use a second Dictionary Tagger to assign the tag type SENTIMENT with the tag value NEGATIVE to all words on the negative list.
  Remember: the upper input is the document and the lower input is the dictionary.
- Optional: Use the Document Viewer node to visualize the tags.

Preprocessing
- Use the Tag Filter node to remove all words that don't have a positive or negative tag.

Transformation and Frequencies
- Use the Bag of Words Creator node to create a bag of words.
- Use the TF node to calculate the absolute term frequency.
- Use the Tags to String node to transform the terms into strings.
  Remember: a term is a tuple of a word and its assigned tag.

Classification
- Use the Pivoting node to count the number of positive and negative tagged words per document.
  (Hint: Group by document, pivot by sentiment, and choose the sum of the column "TF abs" as manual aggregation.)
- Use the Missing Value node to set all missing values in numerical columns to 0.
- Use the Math Formula node to calculate the sentiment score, defined as (number of positive words - number of negative words) / (number of all words), and append it as a new column named Sentiment Score.
  (Hint: In KNIME: ($POSITIVE+Sum(TF abs)$ - $NEGATIVE+Sum(TF abs)$)/($POSITIVE+Sum(TF abs)$ + $NEGATIVE+Sum(TF abs)$))
- Use the Rule Engine node to make the predictions based on the sentiment score:
  $Sentiment Score$ >= 0 => "POS"
  TRUE => "NEG"

Evaluation
- Use the Category to Class node to extract the class information.
- Use the Scorer node to evaluate the rule-based approach with the dictionary tagger.
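Outside KNIME, the Reading and Parsing steps could be approximated in Python roughly as follows. This is a minimal sketch, not part of the workflow: it assumes comma-separated files, that IMDb-sample.csv has the columns Index, Text and Sentiment named in the hints, and that each MPQA list file has a header row followed by one word per row in its first column (the MPQA file layout is an assumption). The helper names read_documents and read_word_list are hypothetical.

```python
import csv
import io


def read_documents(csv_file):
    """Rough stand-in for CSV Reader + Strings to Document + Column Filter:
    one dict per review holding title, text and category."""
    documents = []
    for row in csv.DictReader(csv_file):
        documents.append({
            # The csv module yields strings, so Index stays a string (cf. Tip 2)
            "title": row["Index"],
            "text": row["Text"],
            "category": row["Sentiment"],
        })
    return documents


def read_word_list(csv_file):
    """Assumption: a header row, then one word per row in the first column.
    Words are lowercased (also an assumption) to ease case-insensitive matching."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the (assumed) header row
    return {row[0].strip().lower() for row in reader if row}


if __name__ == "__main__":
    sample = io.StringIO("Index,Text,Sentiment\n1,A great film,POS\n2,Dull plot,NEG\n")
    for doc in read_documents(sample):
        print(doc["title"], doc["category"], doc["text"])
```

With real files, pass an object opened with open(path, newline="", encoding="utf-8") instead of the in-memory io.StringIO sample.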
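The Enrichment, Preprocessing and Transformation steps boil down to counting, per document, how many words appear on each list. The hypothetical helper count_sentiment_words below sketches that logic in plain Python; it is not how KNIME's Dictionary Tagger works internally. It assumes a simple lowercase regex tokenizer (KNIME's tokenizer and the tagger's case-sensitivity setting may differ) and, as its own assumption, counts a word that appears on both lists as positive. Summing the absolute term frequencies of all POSITIVE (or NEGATIVE) terms, as the Pivoting step does, gives the same totals as counting matching tokens here.

```python
import re
from collections import Counter


def count_sentiment_words(text, positive_words, negative_words):
    """Count POSITIVE and NEGATIVE tokens in one document.

    Approximates two Dictionary Taggers + Tag Filter + Bag of Words + TF abs,
    summed per tag as in the Pivoting step."""
    # Assumption: lowercase regex tokenization; KNIME's tokenizer differs.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in positive_words:  # assumption: positive wins if on both lists
            counts["POSITIVE"] += 1
        elif token in negative_words:
            counts["NEGATIVE"] += 1
        # Tokens on neither list are dropped, as the Tag Filter does.
    return counts


if __name__ == "__main__":
    positive = {"good", "great", "fun"}
    negative = {"bad", "boring"}
    review = "A good cast, but a boring and bad script. Still good fun."
    counts = count_sentiment_words(review, positive, negative)
    print(counts["POSITIVE"], counts["NEGATIVE"])  # 3 2
```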
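The Missing Value, Math Formula and Rule Engine steps can be mirrored with two small functions. This sketch assumes the per-document counts arrive as a dict keyed by "POSITIVE" and "NEGATIVE", as the Pivoting step would produce; a missing key plays the role of a missing value replaced by 0. For a document with no sentiment words at all the denominator is zero; the sketch returns 0.0 in that case, which is an assumption and not necessarily what KNIME does. For example, 3 positive and 1 negative word give (3 - 1) / (3 + 1) = 0.5, labeled "POS"; a score of exactly 0 is also labeled "POS" because the rule uses >=.

```python
def sentiment_score(counts):
    """Math Formula step: (#positive - #negative) / (#positive + #negative)."""
    # Missing Value step: an absent count is treated as 0.
    pos = counts.get("POSITIVE", 0)
    neg = counts.get("NEGATIVE", 0)
    total = pos + neg
    if total == 0:
        # Assumption: no sentiment words -> neutral score of 0.0
        return 0.0
    return (pos - neg) / total


def predict(score):
    """Rule Engine step: $Sentiment Score$ >= 0 => "POS", TRUE => "NEG"."""
    return "POS" if score >= 0 else "NEG"


if __name__ == "__main__":
    examples = ({"POSITIVE": 3, "NEGATIVE": 1}, {"NEGATIVE": 2}, {"POSITIVE": 1, "NEGATIVE": 1})
    for counts in examples:
        score = sentiment_score(counts)
        print(counts, score, predict(score))
```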
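The Scorer node compares the true class (extracted by Category to Class) with the prediction and reports, among other statistics, a confusion matrix and the accuracy. A minimal Python equivalent of those two numbers might look like this, assuming the true labels use the same strings ("POS"/"NEG") as the Rule Engine output; the function name score_predictions is hypothetical.

```python
from collections import Counter


def score_predictions(actual, predicted):
    """Confusion matrix and accuracy, two of the statistics the Scorer node reports."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Keys are (true class, predicted class) pairs, values are row counts.
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    accuracy = correct / len(actual) if actual else 0.0
    return confusion, accuracy


if __name__ == "__main__":
    actual = ["POS", "POS", "NEG", "NEG"]
    predicted = ["POS", "NEG", "NEG", "NEG"]
    confusion, accuracy = score_predictions(actual, predicted)
    print(dict(confusion), accuracy)
```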