This node analyses documents and extracts relevant keywords
using co-occurrence statistics, as described in
"Keyword extraction from a single document using word co-occurrence
statistical information" by Y. Matsuo and M. Ishizuka.
First, the most frequent terms (see node settings) are extracted and
then clustered using pointwise mutual information and
a normalized version of the L1 norm as measures of the distance between
their co-occurrence probability distributions.
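
As a rough illustration of the two measures, the following Python sketch computes pointwise mutual information and a normalized L1 distance for a pair of terms. It is not the node's implementation: the function names are hypothetical, co-occurrence is assumed to be counted per sentence, and the L1 distance is assumed to be normalized by halving it so that it falls in [0, 1].

```python
import math
from collections import Counter
from itertools import combinations


def frequent_terms(sentences, top_n):
    """Return the top_n most frequent terms across all sentences."""
    counts = Counter(term for sentence in sentences for term in sentence)
    return [term for term, _ in counts.most_common(top_n)]


def cooccurrence_counts(sentences):
    """Count, for each unordered term pair, the sentences containing both."""
    pair_counts = Counter()
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cooc(pair_counts, a, b):
    """Look up the co-occurrence count of a and b regardless of order."""
    return pair_counts[(a, b) if a < b else (b, a)]


def pmi(sentences, pair_counts, a, b):
    """Pointwise mutual information, using sentence-level probabilities."""
    n = len(sentences)
    p_a = sum(a in s for s in sentences) / n
    p_b = sum(b in s for s in sentences) / n
    p_ab = cooc(pair_counts, a, b) / n
    if p_ab == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2(p_ab / (p_a * p_b))


def normalized_l1(pair_counts, vocab, a, b):
    """L1 distance between the co-occurrence distributions of a and b.

    Assumption: the distance is halved so that it lies in [0, 1].
    """
    others = [t for t in vocab if t not in (a, b)]
    row_a = [cooc(pair_counts, a, t) for t in others]
    row_b = [cooc(pair_counts, b, t) for t in others]
    sum_a, sum_b = sum(row_a), sum(row_b)
    if sum_a == 0 or sum_b == 0:
        return 1.0  # no shared context to compare: treat as maximally distant
    return 0.5 * sum(abs(x / sum_a - y / sum_b) for x, y in zip(row_a, row_b))
```

A high PMI or a low normalized L1 distance would both indicate that two terms behave similarly; the thresholds the node applies are not stated here.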
A term is considered a member of a cluster if it is similar to
all the terms inside it according to at least one of the similarity
measures. If more than one cluster meets this condition, the
one with the highest average score is used. If no cluster
is similar, a new one is created.
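
The assignment rule can be sketched as a simple greedy loop. This is a minimal sketch under stated assumptions, not the node's code: `is_similar` and `score` are hypothetical callables supplied by the caller, where `is_similar` stands in for "similar according to at least one measure" and `score` for whatever pairwise score is averaged.

```python
def assign_clusters(terms, is_similar, score):
    """Greedily assign each term to a cluster.

    A term may join a cluster only if it is similar to every member.
    Among all qualifying clusters, the one with the highest average
    score is chosen. If none qualifies, the term starts a new cluster.
    """
    clusters = []
    for term in terms:
        best_cluster, best_avg = None, float("-inf")
        for cluster in clusters:
            if all(is_similar(term, member) for member in cluster):
                avg = sum(score(term, member) for member in cluster) / len(cluster)
                if avg > best_avg:
                    best_cluster, best_avg = cluster, avg
        if best_cluster is None:
            clusters.append([term])  # no cluster is similar: create a new one
        else:
            best_cluster.append(term)
    return clusters
```

Note that the result of such a greedy scheme depends on the order in which terms are processed.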
Once this is done, the terms are ranked
in decreasing order of the deviation between their expected cluster
co-occurrence and the actually observed co-occurrence value. The terms
with the highest deviation are returned as keywords.
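
One way to express such a deviation is a chi-square statistic in the spirit of the cited paper: for each term, sum over the clusters the squared difference between observed and expected co-occurrence, divided by the expected value. The sketch below assumes this form; whether the node uses exactly this formula is an assumption, and the function names and input layout are hypothetical.

```python
def chi_square_scores(cooc, expected_prob):
    """Score each term by how far its cluster co-occurrence deviates from expectation.

    cooc: {term: {cluster_id: observed co-occurrence count}}
    expected_prob: {cluster_id: expected probability of co-occurring with it}
    """
    scores = {}
    for term, observed in cooc.items():
        total = sum(observed.values())  # all co-occurrences of this term
        score = 0.0
        for cluster_id, p in expected_prob.items():
            expected = total * p
            if expected > 0:
                score += (observed.get(cluster_id, 0) - expected) ** 2 / expected
        scores[term] = score
    return scores


def top_keywords(cooc, expected_prob, k):
    """Return the k terms with the highest deviation, in decreasing order."""
    scores = chi_square_scores(cooc, expected_prob)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A term that co-occurs evenly with all clusters scores close to zero, while a term biased towards particular clusters scores high and is more likely to be returned as a keyword.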
Setting the console's output level to DEBUG will make this node
display the set of frequent terms, the distance between them during
the clustering phase and the final clusters.
To use this node in KNIME, install the KNIME Textprocessing extension.