Topic Models from Reviews

This workflow extracts and models topics from review texts.

Block 1 prepares the review texts. Block 2 optimizes the parameters of the LDA algorithm. Block 3 applies LDA with the optimized parameters and displays the topic probabilities together with the average number of stars per topic. Block 4 estimates the importance of each topic via linear regression (KNIME) and polynomial regression (R).
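The "average number of stars by topic" step in Block 3 can be pictured with a small sketch. This is an illustration outside KNIME, not the workflow's actual implementation: the review data, the topic names, and the choice to assign each review to its highest-probability topic are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical input: (star rating, topic probabilities) per review.
# In the workflow these values would come from the LDA output table.
reviews = [
    (5, {"topic_0": 0.7, "topic_1": 0.2, "topic_2": 0.1}),
    (4, {"topic_0": 0.6, "topic_1": 0.3, "topic_2": 0.1}),
    (2, {"topic_0": 0.1, "topic_1": 0.8, "topic_2": 0.1}),
    (1, {"topic_0": 0.2, "topic_1": 0.5, "topic_2": 0.3}),
    (3, {"topic_0": 0.1, "topic_1": 0.2, "topic_2": 0.7}),
]


def dominant_topic(probs):
    """Return the topic with the highest probability (assumed assignment rule)."""
    return max(probs, key=probs.get)


def average_stars_by_topic(rows):
    """Group reviews by dominant topic and average their star ratings."""
    totals = defaultdict(lambda: [0, 0])  # topic -> [sum of stars, review count]
    for stars, probs in rows:
        topic = dominant_topic(probs)
        totals[topic][0] += stars
        totals[topic][1] += 1
    return {topic: total / count for topic, (total, count) in totals.items()}


avg_stars = average_stars_by_topic(reviews)
for topic, avg in sorted(avg_stars.items()):
    print(f"{topic}: {avg:.2f}")
```

Topics whose reviews carry noticeably lower average ratings are natural candidates for a closer look in the regression step of Block 4.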

If you use this workflow, please cite:
F. Villarroel Ordenes & R. Silipo, “Machine learning for marketing on the KNIME Hub: The development of a live repository for marketing applications”, Journal of Business Research 137(1):393-410, DOI: 10.1016/j.jbusres.2021.08.036.

Block 3 - Obtain topic solution:

Users can test more than one solution for topic extraction and choose the best one based on interpretability. The "Topic Analysis" component needs to be manually edited to rename topics if changes are made at any earlier stage of the process.
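The reason for the manual renaming is that LDA topic indices have no fixed meaning: rerunning the extraction after a change in preprocessing or parameters can reorder the topics or alter their content. The sketch below shows one way to keep a label mapping honest outside KNIME. The function name, the topic identifiers, and the labels are hypothetical and are not taken from the "Topic Analysis" component.

```python
# Hypothetical, hand-written labels chosen after inspecting the top words per topic.
TOPIC_LABELS = {
    "topic_0": "Room quality",
    "topic_1": "Staff and service",
    "topic_2": "Location",
}


def label_topics(topic_ids, labels):
    """Map topic ids to readable names and fail loudly if the two sets differ."""
    missing = sorted(set(topic_ids) - set(labels))  # topics without a label
    stale = sorted(set(labels) - set(topic_ids))    # labels without a topic
    if missing or stale:
        raise ValueError(f"unlabeled topics: {missing}; stale labels: {stale}")
    return [labels[t] for t in topic_ids]


named = label_topics(["topic_1", "topic_0", "topic_2"], TOPIC_LABELS)
print(named)
```

A check like this only catches a changed number of topics. If the topic count stays the same but the content shifts, the labels still have to be reviewed by hand against the summary words per topic.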

Workflow nodes and annotations: TripAdvisor 2 hotels; Excel Reader; Filter Reviews with Less than 10 words; Doc Creation; Preprocessing; N-grams; bi-grams (tri-grams can be added as well); Step 3: Select a topic; Topic Extractor (Parallel LDA); Topic Extractor (BERTopic); Summary words per topic; GroupBy; Topic Analysis; avg # stars by topic per hotel; Topic Comparison Per Hotel; Topic View; Bubble Chart UMAP; Hierarchical View.
