
01.Capturing_Segments_-_Sentiment_Predictor_-_Lexicon_Based - Exercise

Capturing Segments of Sentiment Predictor - Exercise

In this exercise you'll use integrated deployment to capture segments of a workflow that builds a lexicon-based sentiment analysis predictor. These segments can then be used by a deployable workflow later on.

Session 1 - Preparing to Deploy a Workflow

Here we read an annotated Twitter dataset from Kaggle (N = 14,640) containing the sentiments of over 14K airline reviews that users left on Twitter.

Step 1. Capturing the data preprocessing segment of the workflow. Here the most important node is Strings to Document, which combines several string columns into a single document that can be text-mined in KNIME.
1. Drag the Capture Workflow Start node to this part of the workflow. Add an input port of type table by clicking on the three dots and selecting this option. Connect the output of the CSV Reader node to its input port, and connect its output to the input port of the Duplicate Row Filter node.
2. Drag the Capture Workflow End node to this part of the workflow. Add a port to this node by clicking on the three dots and selecting this option. Connect the output of the Strings to Document node to its input port. In the node configuration, set "Custom Workflow Name" to "Data Preparation (Captured Segment)".
3. Drag the Workflow Writer node to this part of the workflow. Connect the second output of the Capture Workflow End node to its input port. In its configuration, set the "Write to" option to "Relative to" and "Current Workflow"; set "Folder" to "../Workflow_Segments"; check the option "Use custom workflow name"; and set "Custom Workflow Name" to "01.Captured_Segment_1__Data_Preparation". Click "OK" and execute this node to create a workflow segment that captures this part of the workflow.
4. Connect the first output port of the Capture Workflow End node to the input of the next Capture Workflow Start node, which is already present in this workflow.

Join the original data with the data for which sentiments were predicted.
This join ensures that, in the end, we still have all the documents we had at the beginning, even if sentiments were not predicted for all of them. Documents with no sentiment words, for example, are filtered out by the predictor, but this last join keeps them in the output of the workflow (with 'neutral' as their sentiment class).

Calculate a sentiment score based on the number of positive and negative words, and classify documents based on the score. The sentiment score is calculated as (number of positive words - number of negative words) divided by (number of positive words + number of negative words). If the score is negative, the document is classified as negative; if it is positive, the document is classified as positive; and if it is equal to 0, the document is classified as neutral.

Here we capture a metanode that tags words based on their sentiment. Untagged words are filtered out at the end.

Scorer. Here we use the Scorer node to check how well our lexicon-based predictions match the annotated data. The performance is not very good because this approach is rather simplistic.

Step 2. Creating a shared component that counts the number of positive and negative words per document. This component encapsulates the counting of sentiment words per document (e.g., tweet), separated by class.
1. Hold 'Shift' and left-click to select the nodes from Bag of Words Creator through the Missing Value node.
2. Right-click and select "Create Component...", then name it "Numbers of Positive and Negative Words per Tweet".
3. Right-click the newly created component, go to "Component" and click "Share...". Navigate to the folder "Components" under "Session 1 - Exercises" and choose it as the location for your component. Select the option "Include input data with component" and press 'OK'.
4. Right-click the component again, go to "Component" and click "Change Link Type...". Select "Create workflow-relative link" and press 'OK'.

Nodes in this workflow: CSV Reader, Table Validator, Duplicate Row Filter, RowID, Column Filter, Strings To Document, Bag Of Words Creator, TF, Tags To String, Pivoting, Missing Value, Scorer, Capture Workflow Start, Capture Workflow End, Workflow Writer.
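The lexicon-based logic described above (counting positive and negative words per document, computing the score, classifying by its sign, and joining back so that documents without sentiment words end up as 'neutral') can be sketched in plain Python. This is a minimal illustration, not how the KNIME nodes are implemented: it assumes simple whitespace tokenization, and the word lists POSITIVE and NEGATIVE and the functions count_sentiment_words, classify and predict are hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical toy lexicons; the workflow uses its own sentiment word lists.
POSITIVE = {"great", "love", "thanks", "good"}
NEGATIVE = {"delayed", "bad", "lost", "rude"}


def count_sentiment_words(text: str) -> Counter:
    """Count positive and negative lexicon words in one document (whitespace tokens)."""
    counts = Counter()
    for token in text.lower().split():
        word = token.strip(".,!?@#")  # crude punctuation stripping (assumption)
        if word in POSITIVE:
            counts["positive"] += 1
        elif word in NEGATIVE:
            counts["negative"] += 1
    return counts


def classify(pos: int, neg: int) -> str:
    """Score = (pos - neg) / (pos + neg); its sign decides the class."""
    score = (pos - neg) / (pos + neg)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def predict(docs: dict[str, str]) -> dict[str, str]:
    predictions = {}
    for doc_id, text in docs.items():
        c = count_sentiment_words(text)
        # Documents without any sentiment word are dropped here,
        # mirroring the filtering of untagged words in the workflow.
        if c["positive"] + c["negative"] > 0:
            predictions[doc_id] = classify(c["positive"], c["negative"])
    # "Left join" back onto all original documents; missing ones become neutral.
    return {doc_id: predictions.get(doc_id, "neutral") for doc_id in docs}


tweets = {
    "t1": "Great crew, love this airline!",
    "t2": "Flight delayed again and bag lost.",
    "t3": "Good food but rude staff.",
    "t4": "Boarding at gate 12.",
}
print(predict(tweets))
# {'t1': 'positive', 't2': 'negative', 't3': 'neutral', 't4': 'neutral'}
```

Note that "t3" is neutral because its score is exactly 0, while "t4" is neutral only because the final join fills in the class for a document the predictor dropped.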
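To make the Scorer step concrete, the comparison between annotated and predicted classes boils down to an accuracy value and a confusion count. The sketch below is a hypothetical helper (evaluate), not the Scorer node's implementation, and the labels are made up rather than results from this exercise; the Scorer node itself also reports further statistics.

```python
from collections import Counter


def evaluate(actual: list[str], predicted: list[str]) -> tuple[float, Counter]:
    """Return accuracy and a confusion count keyed by (actual, predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of equal length")
    confusion = Counter(zip(actual, predicted))
    # Diagonal of the confusion matrix = correctly classified documents.
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Hypothetical labels, not results from the exercise.
actual = ["negative", "negative", "positive", "neutral"]
predicted = ["negative", "neutral", "positive", "neutral"]
accuracy, confusion = evaluate(actual, predicted)
print(f"accuracy = {accuracy:.2f}")  # prints "accuracy = 0.75"
print(confusion[("negative", "neutral")])  # prints 1
```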
