
01.Capturing_Segments_-_Sentiment_Predictor_-_Lexicon_Based - Exercise

Capturing Segments of Sentiment Predictor

In this exercise you'll use integrated deployment to capture segments of a workflow that builds a lexicon-based sentiment analysis predictor. These segments can then be used by a deployable workflow later on.

Session 1 - Preparing to Deploy a Workflow

Exercise 01.Capturing_Segments_-_Sentiment_Predictor_-_Lexicon_Based

Here we read an annotated Twitter dataset containing sentiments of over 14K airline reviews left by users on Twitter.

Step 1. Capturing the data preprocessing segment of the workflow. Here the most important node is Strings to Document, which formats several string columns into a single document that can be text-mined in KNIME.

1. Drag the Capture Workflow Start node to this part of the workflow. Add an input port of type table by clicking on the three dots and selecting this option. Connect the output of the CSV Reader node to its input port, and connect its output to the input port of the Duplicate Row Filter node.
2. Drag the Capture Workflow End node to this part of the workflow. Add a port to this node by clicking on the three dots and selecting this option. Connect the output of the Strings to Document node to its input port. In the node configuration, set "Custom Workflow Name" as "Data Preparation (Captured Segment)".
3. Drag the Workflow Writer node to this part of the workflow. Connect the second output of the Capture Workflow End node to its input port. In its configuration, set the "Write to" option as "Relative to" and "Current Workflow"; set "Folder" as "../Workflow_Segments"; check the option "Use custom workflow name"; and set "Custom Workflow Name" as "01.Captured_Segment_1__Data_Preparation". Click "OK" and execute this node to create a workflow segment that captures this part of the workflow.
4. Connect the first output port of the Capture Workflow End node to the input of the following instance of the Capture Workflow Start node, already present in this workflow.

Join the original data with the data for which sentiments were predicted. The goal of this join is to guarantee that, in the end, we still have all the documents we had in the beginning, even if sentiments were not predicted for all of them. Documents with no sentiment words, for example, get filtered out by the predictor, but this last join makes sure that they are still part of the output of this workflow (with 'neutral' as their associated sentiment class).

Calculate a Sentiment Score based on the Number of Positive and Negative Words and Classify Documents based on the Score. The sentiment score is calculated as (number of positive words - number of negative words) divided by (number of positive words + number of negative words). If the score is negative, the document is classified as negative; if the score is positive, it is classified as positive; and if it is equal to 0, it is classified as neutral.

Here we capture a metanode that tags words based on their sentiment. Non-tagged words get filtered out in the end.

Scorer. Here we use the "Scorer" node to check how well our lexicon-based predictions match the annotated data. The performance is not very good because this approach is a bit too simplistic.

Step 2. Creating a shared component that counts the number of positive and negative words per document. This component encapsulates the counting of sentiment words per document (e.g., tweet), separated by class.

1. Holding 'Shift' and left-clicking with your mouse, select the nodes from Bag of Words Creator all the way to the Missing Value node.
2. Right-click and select the option "Create Component...", then name it Numbers of Positive and Negative Words per Tweet.
3. Right-click the newly created component, go to the option "Component" and then click the option "Share...". Navigate to the folder "Components" under "Session 1 - Exercises" and choose it as the location for your component. Select the option "Include input data with component" and press 'OK'.
4. Again, right-click the newly created component, go to the option "Component" and then click the option "Change Link Type...". Select the option "Create workflow-relative link" and press 'OK'.

Node labels in the workflow: By documents via sentiments; TF absolute; Filtered words; Convert strings to documents; Kaggle Dataset, N=14640, tweets from consumers to airlines; Start capturing word tagging segment into a workflow; End capturing workflow segment; Write captured workflow; Create id column; Start capturing joining segment into a workflow; End capturing workflow segment; Write captured workflow.

Nodes in the workflow: Calculate Scores, Tags To String, Pivoting, TF, Bag Of Words Creator, Strings To Document, Missing Value, Duplicate Row Filter, Scorer, CSV Reader, Column Filter, Tag Words as Positive or Negative, Capture Workflow Start (two instances), Capture Workflow End (two instances), Workflow Writer (two instances), Join Sentiment Predictions and Original Data, RowID, Table Validator.
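The sentiment score and the classification rule described in this exercise can be sketched outside KNIME as follows. This is a minimal illustration in plain Python, not the workflow's actual implementation: the function names (`sentiment_score`, `classify`, `label_documents`) and the input layout are hypothetical, and treating a document without any sentiment words as neutral mirrors what the final join is described as doing.

```python
def sentiment_score(n_pos, n_neg):
    """(positive - negative) / (positive + negative), or None without sentiment words."""
    total = n_pos + n_neg
    if total == 0:
        # No sentiment words: the score is undefined (assumption: report it as None).
        return None
    return (n_pos - n_neg) / total


def classify(score):
    """Map a score to 'negative', 'positive', or 'neutral'."""
    if score is None or score == 0:
        return "neutral"
    return "positive" if score > 0 else "negative"


def label_documents(doc_ids, counts):
    """Label every document, including those that have no entry in counts.

    counts maps a document id to (n_pos, n_neg). Documents missing from
    counts are kept and labelled 'neutral', so no document is lost.
    """
    labels = {}
    for doc_id in doc_ids:
        if doc_id in counts:
            n_pos, n_neg = counts[doc_id]
            labels[doc_id] = classify(sentiment_score(n_pos, n_neg))
        else:
            labels[doc_id] = "neutral"
    return labels


if __name__ == "__main__":
    # Hypothetical word counts for three of four documents.
    example_counts = {"doc1": (3, 1), "doc2": (1, 3), "doc3": (2, 2)}
    print(label_documents(["doc1", "doc2", "doc3", "doc4"], example_counts))
```

Iterating over all document ids rather than over the counted documents plays the role of the join in the workflow: every input document appears in the output, whether or not it contained any sentiment words.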
