
Topic Modeling - Master Thesis

This workflow demonstrates the LDA part of the topic modeling process for Anil Özer's Master's Thesis. It employs the ABC News Headlines Dataset and presents a comprehensive approach including preprocessing, LDA application, and visualization.

So that topics and keywords are presented in random order in the human evaluation, the workflow also includes randomization nodes that shuffle the topics and the keywords within each topic.
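The randomization step can be sketched outside KNIME as follows. This is a minimal illustration, not the workflow's actual Shuffle configuration: the function name `shuffle_for_evaluation`, the example keywords, and the seed value are all assumptions made for the sketch. The topic ids only mirror the node labels, where 0-9 are LDA topics and 10-19 are BERTopic topics.

```python
import random


def shuffle_for_evaluation(topics, seed=None):
    """Return (topic_id, keywords) pairs with topics and keywords in random order.

    `topics` maps a topic id to its list of keywords. The input is not modified.
    """
    rng = random.Random(seed)

    # Shuffle the order in which topics are presented.
    topic_ids = list(topics)
    rng.shuffle(topic_ids)

    shuffled = []
    for topic_id in topic_ids:
        # Shuffle a copy of the keywords so their original ranking is hidden.
        keywords = list(topics[topic_id])
        rng.shuffle(keywords)
        shuffled.append((topic_id, keywords))
    return shuffled


# Hypothetical input: two LDA topics (ids below 10) and one BERTopic topic.
example_topics = {
    0: ["police", "court", "charged"],
    1: ["rain", "flood", "storm"],
    10: ["election", "vote", "minister"],
}
presented = shuffle_for_evaluation(example_topics, seed=42)
```

Passing a fixed seed makes the presentation order reproducible. Omitting it gives a different order on each run.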

I started by reading the CSV file. Then I filtered the data, created documents from it, and preprocessed them for LDA. For the LDA configuration, I selected 10 topics, 10 keywords per topic, α = 0.1 and β = 0.01 with a random seed. Finally, I visualized the topics with Color Manager, Tag Cloud, Bar Chart and Bubble Chart.

Node annotations: Allocating 10 topics and 10 words for each topic; Strings to documents; POS Tagging; ABC News; Only rows from 2013; Assigning colors to topics; Counting documents by topic; Excluding the first 5 rows; Keywords in random order; Randomizing the order of topics; 0-9 LDA topics; 10-19 BERTopics; Creating a table for BERTopics.

Nodes used: Topic Extractor (Parallel LDA), Document Creation, Enrichment, Preprocessing, CSV Reader, Row Filter, Color Manager, Tag Cloud, GroupBy, Sorter, Bar Chart, Bar Chart (JFreeChart), Shuffle, Random Numbers Generator, Table Creator, Bubble Chart (JFreeChart).
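To make the LDA settings concrete, the sketch below runs a small collapsed Gibbs sampler in plain Python with the same hyperparameters (10 topics, 10 keywords per topic, α = 0.1, β = 0.01). It does not reproduce the Topic Extractor (Parallel LDA) node. The sampler, the function names `lda_gibbs` and `top_keywords`, the iteration count, the seed value, and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, n_topics=10, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (vocab, topic_word, doc_topic), where topic_word[k][v] counts how
    often word v is assigned to topic k and doc_topic[d][k] counts the tokens
    of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word, doc_topic


def top_keywords(vocab, topic_word, n_words=10):
    """Return the n_words most frequently assigned words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n_words]]
        for row in topic_word
    ]


# Hypothetical, already tokenized headlines (not taken from the dataset).
example_docs = [
    ["police", "charge", "man", "court"],
    ["flood", "warning", "storm", "rain"],
    ["court", "hears", "police", "evidence"],
    ["storm", "damage", "flood", "cleanup"],
]
vocab, topic_word, doc_topic = lda_gibbs(example_docs, n_iter=50, seed=0)
keywords = top_keywords(vocab, topic_word)
```

With only four toy headlines the resulting topics carry no meaning. The point is to show where the topic count, the keyword count, α, β, and the seed enter the computation.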
