
Topic Detection Based on Movie Reviews

This workflow shows how to import text from a CSV file, convert it to documents, pre-process the documents and assign topics based on the LDA (Latent Dirichlet Allocation) algorithm, which performs unsupervised topic detection.

URL: Topic Detection (LDA) https://youtu.be/rNxBpJXe_Qw

Topic Detection Analysis - Movie Reviews

Topic detection extracts relevant information elements from unstructured text documents and groups them to define a number of topics. This workflow illustrates how to perform a topic detection analysis on movie reviews.

Task

Perform topic detection on IMDb reviews.

Data Reading

Read IMDb reviews from a CSV file.

The file (IMDb-sample.csv) is located in TheData/SocialMedia.
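Outside KNIME, the reading step can be sketched in Python with the standard csv module. This is only an illustration: the review column name "Text" is hypothetical (the workflow only names an "Index" column), and the two rows are made-up stand-ins for the real file.

```python
import csv
import io


def read_reviews(handle, id_column="Index", text_column="Text"):
    """Read rows from an open CSV handle and return (id, text) pairs.

    The id is kept as a string, mirroring the workflow's Number to String
    step. The "Text" column name is an assumption, not taken from the file.
    """
    reader = csv.DictReader(handle)
    return [(str(row[id_column]), row[text_column]) for row in reader]


# Tiny in-memory stand-in for the real CSV file (made-up rows).
sample = io.StringIO(
    'Index,Text\n'
    '1,"A great, moving film."\n'
    '2,Too long and dull.\n'
)
reviews = read_reviews(sample)
print(reviews)
```

To read an actual file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`.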

Preprocessing

Apply classic document pre-processing: Punctuation Erasure, Number Filter, N Chars Filter, Stop Word Filter, Case Converter.

Double-click the metanode to see the sub-workflow.
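The five pre-processing steps can be approximated in plain Python as follows. This is a simplified sketch, not the behavior of the KNIME nodes themselves: the stop-word list is a small hypothetical sample, and the minimum length of 3 characters is an assumed setting.

```python
import string

# Small illustrative stop-word list (hypothetical, far from complete).
STOP_WORDS = {"the", "and", "was", "but", "this", "that", "with", "for"}


def preprocess(text, min_chars=3, stop_words=STOP_WORDS):
    """Return the list of cleaned, lower-cased terms for one review."""
    # Punctuation Erasure: remove all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    # Number Filter: drop tokens made up of digits only.
    tokens = [t for t in tokens if not t.isdigit()]
    # N Chars Filter: drop tokens shorter than min_chars characters.
    tokens = [t for t in tokens if len(t) >= min_chars]
    # Stop Word Filter: drop common words, ignoring case.
    tokens = [t for t in tokens if t.lower() not in stop_words]
    # Case Converter: convert the remaining terms to lower case.
    return [t.lower() for t in tokens]


terms = preprocess("The plot was thin, but the 2 leads saved this film in 1999!")
print(terms)  # ['plot', 'thin', 'leads', 'saved', 'film']
```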

Topic Detection

Build a list of topics from the pre-processed documents using the Topic Extractor (Parallel LDA) node. Use 8 topics and 4 words for each topic.
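To give a feel for what LDA does, here is a minimal collapsed Gibbs sampler in pure Python. It is a single-threaded teaching sketch and not the implementation behind the Topic Extractor (Parallel LDA) node; the function name, the hyperparameters alpha and beta, the iteration count, and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics, n_words, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (topics, doc_topic): the top n_words terms per topic and the
    per-document topic counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(n_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    topics = []
    for k in range(n_topics):
        ranked = sorted(range(n_vocab), key=lambda v: topic_word[k][v], reverse=True)
        topics.append([vocab[v] for v in ranked[:n_words]])
    return topics, doc_topic


# Toy pre-processed documents (made up); 2 topics suit such a tiny corpus.
docs = [
    ["plot", "story", "script", "plot", "ending"],
    ["actor", "cast", "acting", "actor", "role"],
    ["story", "ending", "script", "plot"],
    ["cast", "role", "acting", "actor"],
]
topics, doc_topic = lda_gibbs(docs, n_topics=2, n_words=4)
for k, words in enumerate(topics):
    print(f"topic_{k}: {words}")
```

On the real review corpus, the workflow's settings would correspond to `n_topics=8` and `n_words=4`. Results depend on the random seed, so the topics found can differ between runs.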

Grouping

The GroupBy node concatenates the keywords for the identified topics.
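The grouping step amounts to collecting the terms that share a topic and joining them into one string. A minimal Python sketch, assuming the topic extraction produces (topic, term) rows; the topic labels and terms below are made up:

```python
from collections import defaultdict


def concatenate_terms(rows, sep=", "):
    """Group (topic, term) rows by topic and join each topic's terms."""
    grouped = defaultdict(list)
    for topic, term in rows:
        grouped[topic].append(term)
    return {topic: sep.join(terms) for topic, terms in grouped.items()}


# Made-up (topic, term) rows standing in for the topic-term table.
rows = [
    ("topic_0", "film"),
    ("topic_0", "story"),
    ("topic_1", "actor"),
    ("topic_0", "plot"),
    ("topic_1", "cast"),
]
keywords = concatenate_terms(rows)
print(keywords)  # {'topic_0': 'film, story, plot', 'topic_1': 'actor, cast'}
```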

Try this:

  1. Go to the configuration window of the Topic Extractor (Parallel LDA) node.

  2. Change the number of topics and the number of words per topic that you would like to detect in the documents.

Workflow node annotations:

CSV Reader: Reading TheData/SocialMedia/IMDb-sample.csv
Number to String: "Index" column as string type
Document Creation and document pre-processing: Transformation of strings to documents
Topic Extractor (Parallel LDA): 4 words for 8 topics
GroupBy: Concatenate terms for topics
