This component visualizes and analyzes the output of a topic model. It works with any topic model that produces a topic-term matrix and a topic-document matrix. We recommend using it downstream of the Topic Extractor (Parallel LDA) node [kni.me/n/w7Vr1wY8Bu8Gfpv7] or the Topic Extractor (STM) component [kni.me/c/DFANPa0NHnZb9tSV]. For more details, see the port documentation below.
The component's interactive view helps validate a chosen topic model solution and offers insight into how similar the extracted topics are to each other.
The Topic Explorer View offers two modes:
- Explore by Topic: explore the topics (second input) in a similarity bubble chart, select topics, and view their coherence and exclusivity scores from the Topic Scorer component (kni.me/c/5_W2h2g6hBY_M0Bc) together with the associated tag cloud. Additionally, you can scroll through the topics represented as small bar charts.
- Explore by Document: explore the documents (first input) in a similarity bubble chart, select topics and visualize the preview or the full length of documents where the terms inside the topics are highlighted.
Both modes provide a similarity bubble chart, where topics or documents with higher semantic similarity are positioned closer to each other on the graph in 2-dimensional space. This is achieved through a combination of distinct analytics techniques:
1) For the “Explore by Topic” mode, we utilize a Word2Vec model (kni.me/n/QPMbC4vyfvPkfV8F) to calculate the distances between all words within the documents. These distances are then used to construct a distance matrix, representing the similarity among all topics by averaging the distances of the words associated with each specific topic.
2) The distance matrix generated by Word2Vec is further processed using Multidimensional Scaling (MDS) (kni.me/n/SCgPuzvfM-9t325D), which decomposes it into two dimensions. These two dimensions serve as the coordinates of each topic in a 2-dimensional space. Additionally, the size of the points representing topics directly corresponds to their frequency among the documents.
3) The size of the bubble represents the mean probability of input documents to belong to that topic.
4) In the “Explore by Document” mode, each bubble represents a different document. We follow a similar approach, using the documents' bag of words instead of the topic model's output terms.
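The steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the component's actual implementation: the toy word vectors stand in for a trained Word2Vec model, cosine distance is assumed as the word distance, the MDS shown is the classical (eigenvector-based) variant, and all function and variable names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def topic_distance_matrix(topic_terms, word_vectors):
    """Distance between two topics = mean distance over all cross-topic word pairs."""
    names = list(topic_terms)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]  # diagonal is fixed to 0 by construction
    for i in range(n):
        for j in range(i + 1, n):
            pairs = [cosine_distance(word_vectors[a], word_vectors[b])
                     for a in topic_terms[names[i]]
                     for b in topic_terms[names[j]]]
            dist[i][j] = dist[j][i] = sum(pairs) / len(pairs)
    return names, dist

def classical_mds(dist, dims=2, iterations=500):
    """Classical MDS via power iteration; assumes a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(n)))
        v = [(i + 1) / start_norm for i in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        scale = math.sqrt(max(eigenvalue, 0.0))  # non-positive eigenvalue -> zero axis
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords

def bubble_sizes(doc_topic_probs):
    """Mean probability of the documents (rows) belonging to each topic (columns)."""
    n_docs = len(doc_topic_probs)
    return [sum(col) / n_docs for col in zip(*doc_topic_probs)]

# Toy data (made up for illustration only)
word_vectors = {
    "goal": [1.0, 0.1], "team": [0.9, 0.2],
    "bank": [0.1, 1.0], "stock": [0.2, 0.9],
    "match": [0.95, 0.15], "league": [0.85, 0.25],
}
topic_terms = {
    "sports": ["goal", "team"],
    "finance": ["bank", "stock"],
    "football": ["match", "league"],
}
names, dist = topic_distance_matrix(topic_terms, word_vectors)
coords = classical_mds(dist)  # one (x, y) pair per topic
sizes = bubble_sizes([[0.7, 0.1, 0.2], [0.1, 0.8, 0.1]])
```

In this toy example the "sports" and "football" topics end up much closer to each other than either is to "finance", which is the behavior the bubble chart is meant to convey. For real data, a library implementation of MDS would normally be preferable to the hand-written power iteration shown here.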
DISCLAIMER: This data app slows down when dealing with a large number of documents. By default, only the top 250 rows from the first (top) input and the top 10 terms per topic from the second input are considered. You can increase these numbers in the component dialog. To avoid performance issues, we advise applying stratified sampling to the first input, using the assigned topic column, in a Row Sampling node (kni.me/n/3o-UY2qMENf5piCd) before the component.
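To illustrate what stratified sampling on the assigned topic column achieves, here is a minimal Python sketch. It is not how the Row Sampling node is implemented; the function name, the "topic" column name, and the proportional allocation rule are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, topic_key, sample_size, seed=0):
    """Sample rows so that each topic keeps roughly its original share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[topic_key]].append(row)
    total = len(rows)
    sample = []
    for members in groups.values():
        # Proportional allocation, with at least one row per topic.
        # Rounding means the final size can differ slightly from sample_size.
        k = max(1, round(sample_size * len(members) / total))
        k = min(k, len(members))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical input: 1000 documents, 70% assigned to topic_0, 30% to topic_1
rows = [{"doc_id": i, "topic": "topic_0" if i < 700 else "topic_1"}
        for i in range(1000)]
sample = stratified_sample(rows, "topic", 250)
```

Compared with simply taking the first 250 rows, this keeps the topic proportions of the full document set, so small topics are not dropped from the view.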
This component can be utilized as a data app, running either on a local environment or on KNIME Server and KNIME Business Hub.