Topic Modeling - Optimizing Alpha and Beta

This workflow demonstrates how to compute chi-square statistics to assess goodness of fit in topic modeling. It tests the multinomial assumptions behind the LDA model and examines whether the observed and estimated word vectors are statistically indistinguishable.
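As a rough illustration of the goodness-of-fit idea, the sketch below computes a Pearson chi-square statistic between observed word counts and the counts expected under an estimated multinomial word distribution. It is a minimal sketch, not the workflow's or the paper's exact procedure: the function name `chi_square_statistic` and the toy counts and probabilities are assumptions made for illustration only.

```python
def chi_square_statistic(observed_counts, estimated_probs):
    """Pearson chi-square statistic for observed counts vs. a multinomial estimate.

    observed_counts: word counts observed in the corpus (hypothetical input)
    estimated_probs: word probabilities estimated by the model (hypothetical input)
    Returns the statistic and the degrees of freedom (number of words minus one).
    """
    if len(observed_counts) != len(estimated_probs):
        raise ValueError("observed_counts and estimated_probs must have equal length")

    total = sum(observed_counts)
    statistic = 0.0
    for observed, prob in zip(observed_counts, estimated_probs):
        expected = total * prob
        # Words with zero expected count are skipped to avoid division by zero.
        if expected > 0:
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = len(observed_counts) - 1
    return statistic, degrees_of_freedom


# Toy example: three words, 100 observed tokens (illustrative values only).
observed = [60, 25, 15]
estimated = [0.5, 0.3, 0.2]
statistic, dof = chi_square_statistic(observed, estimated)
print(f"chi-square = {statistic:.4f}, degrees of freedom = {dof}")
```

A small statistic relative to the chi-square distribution with the given degrees of freedom suggests that the observed and estimated word vectors are hard to tell apart; a large one suggests a poor fit.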

In this workflow, the parameters we aim to optimize are alpha and beta, the hyperparameters of the LDA model.
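One simple way to search for suitable alpha and beta values is a grid search: fit a model for each candidate pair, score it, and keep the best pair. The sketch below assumes a lower score is better and uses a placeholder `toy_score` function in place of fitting LDA and computing a goodness-of-fit statistic. The names `grid_search` and `toy_score` and the candidate values are hypothetical and are not taken from the workflow.

```python
from itertools import product


def grid_search(alphas, betas, score):
    """Return the (alpha, beta, score) triple with the lowest score on the grid.

    score: a callable taking (alpha, beta) and returning a number, where
    lower is assumed to be better (for example a goodness-of-fit statistic).
    """
    best = None
    for alpha, beta in product(alphas, betas):
        current = score(alpha, beta)
        if best is None or current < best[2]:
            best = (alpha, beta, current)
    return best


def toy_score(alpha, beta):
    # Placeholder only: a real score would fit a topic model with these
    # hyperparameters and evaluate its goodness of fit.
    return (alpha - 0.1) ** 2 + (beta - 0.01) ** 2


best_alpha, best_beta, best_score = grid_search(
    alphas=[0.01, 0.1, 1.0],
    betas=[0.001, 0.01, 0.1],
    score=toy_score,
)
print(f"best alpha = {best_alpha}, best beta = {best_beta}")
```

Passing the scoring function as an argument keeps the search loop independent of how the model is fitted and evaluated.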

Lewis, C. M., & Grossetti, F. "A Statistical Approach for Optimal Topic Model Identification" (Journal of Machine Learning Research, 23(58), 1–20, 2022).

URL: A Statistical Approach for Optimal Topic Model Identification https://www.jmlr.org/papers/volume23/19-297/19-297.pdf
URL: Say hi to Chi-square (χ²) for Optimal Topic Modeling https://medium.com/low-code-for-advanced-data-science/say-hi-to-chi-square-%CF%87%C2%B2-for-optimal-topic-modeling-1cf1d55dc7fa

Nodes

Math Formula: 30×
Column Filter: 28×
Column Renamer: 25×
RowID: 24×
GroupBy: 19×
Transpose: 12×
Table Creator: 11×
Loop End: 10×
Pivoting: 10×
String Manipulation: 10×
Table Row to Variable: 10×
Missing Value: 8×
Sorter: 8×
Value Lookup: 8×
Term To String: 7×
Column Appender: 6×
Concatenate: 6×
Empty Table Switch: 6×
End IF: 6×
Table Row to Variable Loop Start: 6×
Extract Column Header: 4×
Column Resorter: 4×
Constant Value Column: 4×
Joiner: 4×
Moving Aggregation: 4×
Rule-based Row Splitter: 4×
Bag Of Words Creator: 3×
Case Converter: 3×
Component Input: 3×
Component Output: 3×
Cross Joiner: 3×
Number Filter: 3×
Breakpoint: 2×
CSV Reader: 2×
Cell Splitter: 2×
Chunk Loop Start: 2×
Column Selection Configuration: 2×
Counter Generation: 2×
DF: 2×
Dictionary Filter: 2×
Dictionary Tagger: 2×
Group Loop Start: 2×
IDF: 2×
N Chars Filter: 2×
Punctuation Erasure: 2×
Python View: 2×
Reference Column Filter: 2×
Reference Row Filter: 2×
Rule-based Row Filter: 2×
Stop Word Filter: 2×
Strings To Document: 2×
TF: 2×
Tag Filter: 2×
Topic Extractor (Parallel LDA): 2×
Document Data Extractor: 1×
Kuhlen Stemmer: 1×
Markup Tag Filter: 1×
Row Sampling: 1×
Row Splitter: 1×
Stanford Lemmatizer: 1×
Stanford Tagger: 1×

Extensions

KNIME Base nodes
KNIME Data Generation
KNIME Javasnippet
KNIME Math Expression (JEP)
KNIME Python Integration
