This node analyses documents and extracts relevant keywords
using the graph-based approach described in
"KeyGraph: Automatic Indexing by Co-occurrence Graph based on
Building Construction Metaphor" by Yukio Ohsawa.
First, a predetermined number of terms is selected based on their
frequency (high frequency set, HF) and added as the initial nodes of
the graph.
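The selection step can be pictured with a minimal sketch. It assumes the document is already tokenized into sentences (lists of terms); the function name select_high_frequency_terms and the alphabetical tie-breaking are illustrative choices, not taken from the node's source.

```python
from collections import Counter

def select_high_frequency_terms(sentences, hf_size):
    """Return the hf_size most frequent terms across all sentences."""
    counts = Counter(term for sentence in sentences for term in sentence)
    # Sort by descending frequency, then alphabetically so ties are deterministic
    # (the tie-breaking rule is an assumption made for this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:hf_size]]

# Toy document: each inner list is one tokenized sentence.
sentences = [
    ["graph", "keyword", "graph"],
    ["keyword", "cluster"],
    ["graph", "cluster", "node"],
]
hf = select_high_frequency_terms(sentences, 3)
print(hf)
```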
The association strength between each pair of these terms is then
calculated using the following scoring method: assoc(term1, term2) =
min(occurrence frequency of term1, occurrence frequency of term2)
summed for every sentence in the document.
The top |HF|-1 associations are inserted into the graph as edges.
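As a rough illustration of the association scoring and edge selection, the sketch below reads "occurrence frequency" as the number of times a term occurs within a single sentence. That reading, the function names assoc and top_associations, and the alphabetical tie-breaking are assumptions of this example, not details confirmed by the node description.

```python
from collections import Counter
from itertools import combinations

def assoc(term1, term2, sentences):
    """Sum over sentences of min(count of term1, count of term2)."""
    total = 0
    for sentence in sentences:
        counts = Counter(sentence)  # missing terms count as 0
        total += min(counts[term1], counts[term2])
    return total

def top_associations(hf_terms, sentences):
    """Return the |HF| - 1 strongest pairs as (score, term1, term2)."""
    scored = [
        (assoc(a, b, sentences), a, b)
        for a, b in combinations(sorted(hf_terms), 2)
    ]
    # Strongest first; ties broken alphabetically (an assumption of this sketch).
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return scored[: len(hf_terms) - 1]

sentences = [
    ["graph", "keyword", "graph", "keyword"],
    ["keyword", "cluster"],
    ["graph", "cluster", "node"],
]
edges = top_associations(["graph", "keyword", "cluster"], sentences)
print(edges)
```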
If an edge between two terms is the only path that connects them, it
is pruned.
The graph's connected subgraphs are then extracted and considered as
"concept" clusters.
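The pruning and clustering steps can be sketched as follows. An edge is treated as "the only path" when its two endpoints become unreachable from each other once that edge is ignored; every edge is tested against the original graph before any removal. Whether isolated terms count as clusters of their own is not stated in the description, so keeping them as single-term clusters is an assumption here, as are all function names.

```python
from collections import deque

def build_adjacency(nodes, edges):
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def is_connected(adjacency, start, goal, skipped_edge):
    """Breadth-first search from start to goal that ignores skipped_edge."""
    skipped = frozenset(skipped_edge)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency[node]:
            if frozenset((node, neighbour)) == skipped or neighbour in seen:
                continue
            seen.add(neighbour)
            queue.append(neighbour)
    return False

def prune_single_path_edges(nodes, edges):
    """Keep only edges whose endpoints stay connected without that edge."""
    adjacency = build_adjacency(nodes, edges)
    return [(a, b) for a, b in edges if is_connected(adjacency, a, b, (a, b))]

def extract_clusters(nodes, edges):
    """Return the connected components of the graph as sets of terms."""
    adjacency = build_adjacency(nodes, edges)
    seen = set()
    result = []
    for node in nodes:
        if node in seen:
            continue
        component = set()
        seen.add(node)
        queue = deque([node])
        while queue:
            current = queue.popleft()
            component.add(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
kept = prune_single_path_edges(nodes, edges)
clusters = extract_clusters(nodes, kept)
print(kept, clusters)
```

In this toy graph the triangle a, b, c survives, while the edges c-d and d-e are pruned because each is the sole connection between its endpoints.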
A new batch of terms is added based on their key score, which is the
conditional probability that a term will be used if the author has
all the concepts (clusters) in mind, P(UNION(t|g)), where t is the
term and the union is taken over every cluster g in the set of clusters.
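One way to picture a probability taken over a union of clusters is sketched below. It assumes that a per-cluster probability for the term is already available and that the clusters are treated as independent, so the union becomes 1 minus the product of the complements. How the node actually estimates the per-cluster values is not stated in this description; the function name key_score and its input are hypothetical.

```python
import math

def key_score(per_cluster_probabilities):
    """P(union) = 1 - product of (1 - p_g), assuming independent clusters."""
    return 1.0 - math.prod(1.0 - p for p in per_cluster_probabilities)

# Hypothetical per-cluster probabilities for a single term.
score = key_score([0.5, 0.2])
print(score)
```

A term that relates to several clusters at once therefore scores higher than one tied to a single cluster with the same strength.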
Each of these new terms is then linked to every cluster using the
strongest scoring edge amongst the possible ones.
Finally, all the terms in the graph are rated based on this formula:
score(t) = summation, over every edge connecting t to another term w,
of the summation over every sentence of min(freq(t), freq(w)).
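The final rating can be sketched in a few lines. As with the association score, freq is read here as the count of a term within one sentence, which is an assumption; the function name term_score and the toy data are illustrative only.

```python
from collections import Counter

def term_score(term, edges, sentences):
    """Sum, over edges touching term, of the per-sentence min of both counts."""
    sentence_counts = [Counter(sentence) for sentence in sentences]
    total = 0
    for a, b in edges:
        if term not in (a, b):
            continue  # edge does not touch this term
        other = b if a == term else a
        total += sum(min(c[term], c[other]) for c in sentence_counts)
    return total

sentences = [
    ["graph", "keyword", "graph", "keyword"],
    ["keyword", "cluster"],
    ["graph", "cluster", "node"],
]
graph_edges = [("graph", "keyword"), ("cluster", "graph")]
scores = {t: term_score(t, graph_edges, sentences)
          for t in ("graph", "keyword", "cluster")}
print(scores)
```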
Setting the console's output level to DEBUG will make this node
display the contents of the clusters after the pruning phase.
To use this node in KNIME, install the KNIME Textprocessing extension.