This component can compute different metrics of topics created by the Topic Extractor (Parallel LDA) node and Topic Extractor (STM) component. We list below the metrics it can score provided a table or pre-processed documents and a table of weighted terms for each topic. Provide the topics of a single model or of multiple models.
Take a look at the example workflows at the bottom of this page to learn how to concatenate topics from different models trained on the same corpus of documents or add a ‘model ID’ to the output of the Topic Extractor (Parallel LDA) node.
DISCLAIMER: this verified component is currently marked as part of KNIME Labs (knime.com/knime-labs). Provide feedback at upskilling@knime.com
Topic Semantic Coherence score:
This component calculates semantic coherence scores for each topic. Semantic coherence measures how coherent topics are by checking if the topics top terms appear together in the same documents more often than not. This experimental implementation is based on the paper by Mimno et al (2011) [dl.acm.org/doi/10.5555/2145432.2145462].
Topic Exclusivity score:
This component calculates the exclusivity of topics. Exclusivity is computed using an experimental implementation of the FREX function by Bischof and Airoldi (2012) [dl.acm.org/doi/10.5555/3042573.3042578]. FREX does not take in consideration only how exclusive/unique terms are between different topics (top terms table), but also how rare those topics are in documents of the same topic (documents table).
When comparing multiple models, documents can be assigned by different models to different topics and therefore exclusivity can be computed only using how unique terms are in the topics top terms table. Read more in the setting “Ignore Assigned Topic Column” description.
Topic Neighbor Distance score:
This component computes an experimental distance between topics within the same model or between several models. To do this, topics are represented by a normalized vector by pivoting the top terms by topic table. A cosine distance between topic vectors is computed. For each topic the distance is used to show the closest and farthest topic within one or between more models.
To use this component in KNIME, download it from the below URL and open it in KNIME:
Download ComponentDeploy, schedule, execute, and monitor your KNIME workflows locally, in the cloud or on-premises – with our brand new NodePit Runner.
Try NodePit Runner!Do you have feedback, questions, comments about NodePit, want to support this platform, or want your own nodes or workflows listed here as well? Do you think, the search results could be improved or something is missing? Then please get in touch! Alternatively, you can send us an email to mail@nodepit.com.
Please note that this is only about NodePit. We do not provide general support for KNIME — please use the KNIME forums instead.