This component extracts the most relevant English keywords in a corpus (a collection of documents) using three specific techniques:
- Topic Extraction using LDA (Latent Dirichlet Allocation): this technique groups the documents into topics and collects a set of keywords for each topic.
- Term Co-Occurrence: this technique finds pairs of keywords that frequently appear together across documents.
- Max(TF-IDF) measure: this technique ranks terms by their importance throughout the corpus, based on the maximum TF-IDF score of each term.
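The last two techniques can be illustrated with a minimal pure-Python sketch. It is not the component's implementation: the toy corpus, the function names `cooccurrence_counts` and `max_tfidf`, and the particular TF-IDF variant (relative term frequency times `log(N / df)`) are assumptions chosen for illustration, and the component may use different formulas.

```python
import math
from collections import Counter
from itertools import combinations

# Toy corpus (hypothetical): each document is already tokenised and lower-cased.
docs = [
    ["data", "mining", "text", "mining"],
    ["text", "mining", "keywords"],
    ["data", "science", "keywords"],
]

def cooccurrence_counts(documents):
    """Count, for every unordered term pair, the documents containing both terms."""
    counts = Counter()
    for tokens in documents:
        # Each pair is counted at most once per document.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts

def max_tfidf(documents):
    """Return each term's highest TF-IDF score over all documents."""
    n_docs = len(documents)
    doc_freq = Counter(term for tokens in documents for term in set(tokens))
    best = {}
    for tokens in documents:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)                   # relative term frequency
            idf = math.log(n_docs / doc_freq[term])    # assumed IDF variant
            best[term] = max(best.get(term, 0.0), tf * idf)
    return best

pairs = cooccurrence_counts(docs)
scores = max_tfidf(docs)
ranking = sorted(scores, key=scores.get, reverse=True)

print(pairs.most_common(3))
print(ranking)
```

In this sketch a pair is counted per document rather than per sentence or per sliding window; that granularity is also an assumption.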
This component takes as input a column of Document type (for example, one created with the Strings To Document node) and identifies keywords in the corpus according to the hyper-parameters defined in the configuration dialogue. The collected keywords are provided in three output tables, one for each of the three techniques above.
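To give a feel for what hyper-parameters mean for the topic extraction step, here is a small sketch of LDA fitted by collapsed Gibbs sampling in plain Python. The parameter names (`n_topics`, `n_keywords`, `alpha`, `beta`, `n_iterations`, `seed`), the function `lda_keywords`, and the toy corpus are all hypothetical; the component's actual settings and its LDA implementation may differ.

```python
import random
from collections import Counter

def lda_keywords(documents, n_topics=2, n_keywords=3, alpha=0.1, beta=0.01,
                 n_iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words of each topic.

    All parameter names are illustrative assumptions, not the component's settings.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in documents for word in doc})
    doc_topic = [[0] * n_topics for _ in documents]    # tokens per (document, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(documents):
        doc_assign = []
        for word in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(n_iterations):
        for d, doc in enumerate(documents):
            for i, word in enumerate(doc):
                # Take the token out of the counts, then resample its topic.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Keep only words that still have a positive count in the topic.
    return [
        [word for word, count in topic_word[t].most_common(n_keywords) if count > 0]
        for t in range(n_topics)
    ]

# Hypothetical toy corpus with two loosely separated themes.
docs = [
    ["cat", "dog", "pet", "cat"],
    ["dog", "pet", "vet"],
    ["stock", "market", "trade"],
    ["market", "trade", "stock", "bank"],
]
topics = lda_keywords(docs, n_topics=2, n_keywords=3)
print(topics)
```

Raising `n_topics` splits the corpus into finer groups, while `alpha` and `beta` control how strongly documents and topics are smoothed toward uniform mixtures. The result depends on the random seed, so no particular output is shown here.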
By default, the component applies basic text pre-processing (e.g. stop-word and symbol removal) for the English language. This pre-processing can be deactivated in the dialogue and performed outside of the component when working with other or multiple languages.
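As a rough idea of what such basic pre-processing involves, the following sketch lower-cases a text, replaces symbols with spaces, and drops English stop words. The stop-word list is a tiny hypothetical sample and the function `preprocess` is an illustration only; the component's own pre-processing steps are not specified here and may differ.

```python
import re

# A tiny illustrative stop-word list (an assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def preprocess(text, stopwords=STOPWORDS):
    """Lower-case the text, replace non-letter symbols with spaces, drop stop words."""
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    return [token for token in cleaned.split() if token not in stopwords]

tokens = preprocess("The keywords of a corpus: TF-IDF, LDA & co-occurrence!")
print(tokens)
```

Because the regular expression keeps only the letters a to z, this sketch is suitable for English only, which mirrors why the pre-processing would be replaced when working with other languages.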
To use this component, download it and open it in KNIME.