Topic Models from Reviews

This workflow addresses the problem of extracting and modeling topics from reviews.

Block 1 performs the data preparation on review texts. Block 2 optimizes the number of topics k for the LDA algorithm. Block 3 applies the LDA algorithm with the optimized k and displays the LDA topic probabilities along with the average number of stars by topic. Block 4 estimates the importance of topics via linear regression (KNIME) and ordinal logit regression (R).
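As a rough illustration of the "average number of stars by topic" display, the Python sketch below assigns each review to its most probable LDA topic and averages the star ratings per hotel and topic. The dictionary keys (`hotel`, `stars`, `topic_probs`) and the dominant-topic rule are assumptions made for this example; the workflow's GroupBy node may aggregate differently.

```python
from collections import defaultdict


def dominant_topic(probs):
    """Index of the most probable topic (ties resolve to the lowest index)."""
    return max(range(len(probs)), key=lambda t: probs[t])


def avg_stars_by_topic(reviews):
    """Mean star rating for each (hotel, dominant topic) pair, sorted by key.

    Each review is a dict with the hypothetical keys 'hotel', 'stars' and
    'topic_probs' (one LDA probability per topic, Topic 0 first).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for review in reviews:
        key = (review["hotel"], dominant_topic(review["topic_probs"]))
        totals[key] += review["stars"]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


if __name__ == "__main__":
    # Toy input, not data from the workflow: 2 hotels, 3 topics.
    reviews = [
        {"hotel": "A", "stars": 5, "topic_probs": [0.1, 0.7, 0.2]},
        {"hotel": "A", "stars": 3, "topic_probs": [0.2, 0.6, 0.2]},
        {"hotel": "A", "stars": 2, "topic_probs": [0.8, 0.1, 0.1]},
        {"hotel": "B", "stars": 4, "topic_probs": [0.1, 0.2, 0.7]},
    ]
    for (hotel, topic), mean in avg_stars_by_topic(reviews).items():
        print(f"hotel {hotel}, topic {topic}: {mean:.2f} stars")
```

Averaging over the dominant topic is the simplest choice. A probability-weighted mean would instead let every review contribute to every topic in proportion to its LDA weight.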

If you use this workflow, please cite:
F. Villarroel Ordenes & R. Silipo, "Machine learning for marketing on the KNIME Hub: The development of a live repository for marketing applications", Journal of Business Research 137(1):393-410, DOI: 10.1016/j.jbusres.2021.08.036.

Analysis of customer experience feedback with topic models

The study of customer experience management (CXM) with big data analytics (BDA) has been one of the most relevant marketing analytics topics in recent years. The present workflow shows how managers can identify the service aspects with the greatest impact on customers' overall evaluation (star rating).

Block 1 - Data Preparation for Topic Models: This block performs preprocessing, extraction of n-grams, and exclusion of reviews with a small number of terms. It can be adjusted as desired.

Block 2 - Find optimal k for topic models: This block finds the optimal k (number of topics) for the LDA topic modeling algorithm. Other methods for topic extraction and modeling can be implemented in KNIME (see https://hub.knime.com/angusveitch/spaces/Public/latest/TopicKR~HRMp6v9Ip_ODMIob). Other topic modeling algorithms that can be used in R or Python are structural topic models (STM) and correlated topic models (CTM).

Block 3 - Obtain topic solution: Users can test more than one topic extraction solution and choose the best one based on interpretability. The "Topic Analysis" component needs to be manually edited to rename topics if changes are made at any earlier stage of the process.

Block 4 - Analysis to inspect the impact of topics on customer star rating: The analysis can be improved by including topic sentiment, interaction terms, and different modeling alternatives (e.g., ordinal logit regression in R).
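For the ordinal logit alternative mentioned in Block 4, the proportional odds model fitted by `MASS::polr` in R (logistic link) can be written as below. The covariates mirror the linear model noted on the workflow canvas (topic probabilities with Topic 0 as baseline, plus a hotel control); treating stars as J ordered levels (J = 5 for a 1 to 5 scale) and coding the hotel as a single dummy h_i are assumptions of this sketch.

```latex
\operatorname{logit} P(Y_i \le j \mid \mathbf{x}_i) = \zeta_j - \eta_i,
\qquad j = 1, \dots, J-1,
\qquad
\eta_i = \sum_{t=1}^{K-1} \beta_t \, p_{it} + \gamma \, h_i,
\qquad \zeta_1 < \zeta_2 < \dots < \zeta_{J-1}
```

Here Y_i is the star rating of review i, p_it its probability for topic t, and ζ_j the cut points. Because η_i enters with a minus sign, a positive β_t lowers P(Y_i ≤ j) for every j and so shifts probability toward higher ratings; there is no separate intercept because the cut points absorb it.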
The workflow also shows how to integrate R for statistical analysis within KNIME.

Workflow annotations:
- Data: TripAdvisor reviews for 2 hotels (Excel Reader, Number To String).
- Data preparation: Preprocessing, Doc Creation, N-grams (bi-grams; tri-grams can be added as well), and Filter Reviews with Less than 10 words.
- Step 1: Optimal k in [2,80]. Assess model fit (perplexity) over a wide range of topics (2 to 80) and identify the elbow range in the Visualize Perplexity image.
- Step 2: Optimal k in [2,20]. Narrow down the search to a smaller range based on step 1 and identify the elbow point (15).
- Step 3: Select a topic. Topic Extractor (Parallel LDA), Topic Analysis (summary words per topic), and Topic Comparison Per Hotel (GroupBy: average number of stars by topic per hotel).
- Linear Regression Learner: DV: Star Rating; IVs: topic probabilities (baseline: Topic 0); control: Hotel.
- R integration (Table to R): ordinal logit regression (MASS package).
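Steps 1 and 2 locate the elbow visually in the perplexity plots. One common numeric heuristic for the same idea, sketched here as an assumption rather than what the Visualize Perplexity component does, picks the k whose point lies farthest from the straight line joining the first and last points of the curve after both axes are scaled to [0, 1]. Computing the perplexities themselves (one LDA fit per k) is not shown, and the values below are made up.

```python
def elbow_point(ks, perplexities):
    """Return the k at the 'elbow' of a perplexity-versus-k curve.

    Heuristic: after scaling both axes to [0, 1], pick the point with the largest
    perpendicular distance from the chord joining the first and last points.
    """
    if len(ks) != len(perplexities) or len(ks) < 3:
        raise ValueError("need at least 3 (k, perplexity) pairs of equal length")
    pairs = sorted(zip(ks, perplexities))  # order the curve by k
    k_vals = [k for k, _ in pairs]
    p_vals = [p for _, p in pairs]
    k_lo, k_hi = k_vals[0], k_vals[-1]
    p_lo, p_hi = min(p_vals), max(p_vals)
    if k_hi == k_lo or p_hi == p_lo:
        raise ValueError("curve is flat in k or perplexity; no elbow to find")
    # Normalize so that k and perplexity units are comparable.
    xs = [(k - k_lo) / (k_hi - k_lo) for k in k_vals]
    ys = [(p - p_lo) / (p_hi - p_lo) for p in p_vals]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    dx, dy = x1 - x0, y1 - y0
    length = (dx * dx + dy * dy) ** 0.5
    # Point-to-line distance for every point on the curve.
    distances = [abs(dy * x - dx * y + x1 * y0 - y1 * x0) / length for x, y in zip(xs, ys)]
    best = max(range(len(k_vals)), key=lambda i: distances[i])
    return k_vals[best]


if __name__ == "__main__":
    # Made-up perplexities for illustration only (not results from the workflow).
    ks = [2, 4, 6, 8, 10, 12, 14]
    perplexities = [1000, 600, 400, 350, 330, 320, 315]
    print("elbow at k =", elbow_point(ks, perplexities))  # prints: elbow at k = 6
```

The result depends on the k range scanned, so a coarse scan followed by a narrower one, as in Steps 1 and 2, suits this heuristic.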
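The Linear Regression Learner annotation uses Topic 0 as the baseline because the K topic probabilities of each review sum to 1: together with an intercept, all K columns would be perfectly collinear, so one topic is dropped and the remaining coefficients are read relative to it. The plain-Python least-squares sketch below builds that design matrix (intercept, topics 1 to K-1, hotel dummies) and solves the normal equations. The function names and toy data are hypothetical, and this is not the KNIME node's implementation.

```python
def design_matrix(topic_probs, hotels, baseline_hotel):
    """Rows of [1, p_1, ..., p_{K-1}, hotel dummies].

    Topic 0 is omitted (baseline) because p_0 = 1 - (p_1 + ... + p_{K-1}),
    which would be perfectly collinear with the intercept column.
    """
    other_hotels = sorted(set(hotels) - {baseline_hotel})
    rows = []
    for probs, hotel in zip(topic_probs, hotels):
        dummies = [1.0 if hotel == h else 0.0 for h in other_hotels]
        rows.append([1.0] + [float(p) for p in probs[1:]] + dummies)
    return rows


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check for collinear columns")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def ols(x_rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(row[i] * row[j] for row in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x_rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)


if __name__ == "__main__":
    # Toy data (not from the workflow): 3 topics, 2 hotels, stars generated exactly
    # as 2 + 3*p_1 + 1*p_2 + 0.5*[hotel == "B"], so OLS should recover these values.
    probs = [
        (1.0, 0.0, 0.0),
        (0.0, 1.0, 0.0),
        (0.0, 0.0, 1.0),
        (0.5, 0.5, 0.0),
        (0.2, 0.2, 0.6),
    ]
    hotels = ["A", "A", "A", "B", "B"]
    stars = [2.0, 5.0, 3.0, 4.0, 3.7]
    coefs = ols(design_matrix(probs, hotels, baseline_hotel="A"), stars)
    for name, value in zip(["intercept", "topic_1", "topic_2", "hotel_B"], coefs):
        print(f"{name}: {value:.3f}")
```

With this coding, the coefficient of topic t estimates the change in expected stars when probability mass moves from Topic 0 to topic t for the same hotel. Solving the normal equations directly is fine for a handful of columns; for real data a dedicated least-squares solver such as `numpy.linalg.lstsq` is numerically safer.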
