This workflow shows how to import text from a CSV file, convert it to documents, pre-process the documents, and visualize a tag cloud based on positive and negative terms.
Topic Detection Analysis - Movie Reviews
Topic detection extracts relevant terms from unstructured text documents and groups them into topics. This workflow illustrates how to perform a topic detection analysis on movie reviews.
Task. Perform topic detection on IMDb reviews.
1 - Data Reading
Read IMDb reviews from a CSV file.
The file is located in TheData/SocialMedia.
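Outside KNIME, the reading step can be sketched in Python. This is only an illustration, not the workflow's implementation: the column names `review` and `sentiment` and the two inline rows are hypothetical stand-ins, since the file name and column layout are not given here.

```python
import csv
import io

# Hypothetical stand-in for the IMDb CSV file; the real file name and
# column layout are not specified in this description.
SAMPLE_CSV = io.StringIO(
    "review,sentiment\n"
    '"A moving story, with great acting.",positive\n'
    '"Two hours of my life I will never get back.",negative\n'
)

def read_reviews(handle, text_column="review"):
    """Return the text of every row in the given column, skipping blank cells."""
    reader = csv.DictReader(handle)
    return [row[text_column].strip() for row in reader if row[text_column].strip()]

reviews = read_reviews(SAMPLE_CSV)
print(len(reviews), "reviews loaded")
```

To read from disk instead, pass an open file handle (opened with `newline=""`, as the `csv` module recommends) in place of `SAMPLE_CSV`.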
2 - Pre-processing
- Classic pre-processing of documents: Punctuation Erasure, Number Filter, N Chars Filter, Stop Word Filter, Case Converter
Double-click the metanode to see the sub-workflow.
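The five pre-processing steps can be approximated in plain Python as follows. This is a rough sketch of what each step does, not the KNIME nodes themselves: the stop-word list, the minimum length of three characters, and the function name `preprocess` are assumptions chosen for illustration.

```python
import string

# Tiny illustrative stop-word list (an assumption; KNIME ships its own lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "it", "in", "was", "this"}
MIN_CHARS = 3  # assumed threshold for the N Chars Filter

def preprocess(text, stop_words=STOP_WORDS, min_chars=MIN_CHARS):
    """Apply the five steps in the order listed above and return the tokens."""
    # Punctuation Erasure: delete every ASCII punctuation character.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    # Number Filter: drop tokens made up only of digits.
    tokens = [t for t in tokens if not t.isdigit()]
    # N Chars Filter: drop tokens shorter than min_chars.
    tokens = [t for t in tokens if len(t) >= min_chars]
    # Stop Word Filter: case-insensitive match, since lowercasing comes last.
    tokens = [t for t in tokens if t.lower() not in stop_words]
    # Case Converter: lowercase what remains.
    return [t.lower() for t in tokens]

print(preprocess("The 2 leads were GREAT, and the plot was a mess in 1999!"))
# prints ['leads', 'were', 'great', 'plot', 'mess']
```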
3 - Topic Detection
Build a list of topics from the pre-processed documents using the Topic Extractor (Parallel LDA) node. Configure it to extract eight topics with four words per topic.
Try this:
1) Open the configuration window of the Topic Extractor (Parallel LDA) node.
2) Change the number of topics and the number of words per topic, then re-execute the node to see how the detected topics change.
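To get a feel for what the two settings control, here is a small, single-threaded collapsed Gibbs sampler for LDA in plain Python. It is swapped in purely for illustration and is not the node's parallel implementation. The function name `lda_gibbs`, the values of `alpha` and `beta`, the iteration count, and the toy corpus are all assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics=8, words_per_topic=4, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the top words of each topic."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Keep the most frequent words of each topic, ignoring zero counts.
    return [
        [w for w, c in topic_word[t].most_common(words_per_topic) if c > 0]
        for t in range(num_topics)
    ]

# Hypothetical pre-processed documents, one token list per review.
toy_docs = [
    ["plot", "story", "ending", "twist"],
    ["acting", "cast", "performance", "actor"],
    ["story", "plot", "twist", "script"],
    ["actor", "cast", "acting", "role"],
]
topics = lda_gibbs(toy_docs, num_topics=2, words_per_topic=4, iterations=100)
for topic_id, words in enumerate(topics):
    print(f"topic_{topic_id}:", words)
```

Changing `num_topics` and `words_per_topic` here mirrors the two settings in the node's configuration window. On a corpus this small the topics are unstable, so treat the output as a demonstration of the mechanics only.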
4 - Grouping
The GroupBy node concatenates the keywords for the identified topics.
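The grouping step amounts to collecting all keywords that share a topic and joining them into one string. A minimal Python sketch, assuming the topic extraction produces one row per (topic, term) pair; the row values, the `", "` separator, and the function name `concatenate_terms` are hypothetical:

```python
from collections import defaultdict

# Hypothetical (topic, term) rows, one keyword per row.
rows = [
    ("topic_0", "plot"), ("topic_0", "story"),
    ("topic_1", "cast"), ("topic_0", "twist"),
    ("topic_1", "acting"),
]

def concatenate_terms(rows, separator=", "):
    """Group rows by topic and join each topic's terms in row order."""
    grouped = defaultdict(list)
    for topic, term in rows:
        grouped[topic].append(term)
    return {topic: separator.join(terms) for topic, terms in grouped.items()}

print(concatenate_terms(rows))
# prints {'topic_0': 'plot, story, twist', 'topic_1': 'cast, acting'}
```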
To use this workflow in KNIME, download it from the URL below and open it in KNIME:
Download Workflow