This node analyses documents and extracts relevant keywords using the graph-based approach described in "KeyGraph: Automatic Indexing by Co-occurrence Graph based on Building Construction Metaphor" by Yukio Ohsawa.
First, a predetermined number of terms is selected based on their frequency (the high-frequency set, HF), and these terms are added to the graph as its initial nodes.
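A minimal sketch of the HF selection step, assuming the document has already been split into sentences and each sentence tokenised into a list of terms (the node's own preprocessing is not described here). The function name and the parameter `k`, which stands in for the predetermined HF size, are hypothetical.

```python
from collections import Counter

def high_frequency_terms(sentences, k):
    """Return the k most frequent terms of the document (the HF set).

    sentences: list of sentences, each a list of term strings.
    k: hypothetical stand-in for the node's predetermined HF size.
    """
    counts = Counter(term for sentence in sentences for term in sentence)
    # most_common() orders by count; equal counts keep first-seen order
    return [term for term, _ in counts.most_common(k)]

doc = [["graph", "node", "term"], ["graph", "term"], ["graph", "edge"]]
print(high_frequency_terms(doc, 2))  # ['graph', 'term']
```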
The association strength between each pair of these terms is then calculated as assoc(term1, term2) = sum over every sentence s in the document of min(freq_s(term1), freq_s(term2)), where freq_s(t) is the number of occurrences of t in sentence s. The |HF| - 1 strongest associations are inserted into the graph as edges.
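The association score and the edge selection can be sketched as follows, again assuming tokenised sentences and using hypothetical function names. Discarding pairs that never share a sentence is an assumption of this sketch, not something the description states; equally strong pairs keep their original pair order.

```python
from collections import Counter
from itertools import combinations

def assoc(sentences, t1, t2):
    """Sum over sentences of min(occurrences of t1, occurrences of t2)."""
    total = 0
    for sentence in sentences:
        counts = Counter(sentence)
        total += min(counts[t1], counts[t2])
    return total

def initial_edges(sentences, hf_terms):
    """Return the |HF| - 1 strongest HF term pairs as graph edges."""
    scored = [(assoc(sentences, a, b), a, b)
              for a, b in combinations(hf_terms, 2)]
    # Assumption: pairs that never co-occur are not candidate edges.
    scored = [item for item in scored if item[0] > 0]
    # Stable sort: equally strong pairs keep their original pair order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(a, b) for _, a, b in scored[: len(hf_terms) - 1]]

doc = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "d"]]
print(initial_edges(doc, ["a", "b", "c", "d"]))
# [('a', 'b'), ('b', 'c'), ('a', 'c')]
```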
If an edge between two terms is the only path that connects them, it is pruned.
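This pruning rule removes the graph's bridges: an edge is the only path between its two terms exactly when deleting it disconnects them. The sketch below tests each edge directly with a breadth-first search, which is simple but quadratic; a linear-time DFS-based bridge-finding algorithm would scale better. Because removing one bridge never turns another edge into a bridge or vice versa, checking every edge against the unpruned graph gives the same result as pruning one edge at a time. Function names are hypothetical.

```python
from collections import defaultdict, deque

def connected(edges, start, goal):
    """Breadth-first search: is goal reachable from start via edges?"""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False

def prune_bridges(edges):
    """Keep only the edges whose endpoints stay connected without them."""
    kept = []
    for i, (a, b) in enumerate(edges):
        others = edges[:i] + edges[i + 1:]
        if connected(others, a, b):
            kept.append((a, b))
    return kept

graph = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(prune_bridges(graph))  # [('a', 'b'), ('b', 'c'), ('a', 'c')]
```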
The connected subgraphs of the pruned graph are then extracted and treated as "concept" clusters. A new batch of terms is added based on their key score: the conditional probability that a term will be used given that the author has all the concepts (clusters) in mind, key(t) = P(UNION(t|g)), where t is the term and the union is taken over every cluster g in the set of clusters.
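The clustering and key-score steps can be sketched as below, with hypothetical function names. Clusters are the connected components of the pruned edge list; terms left without any edge are not treated as clusters here, which is an assumption. The description does not say how P(t|g) is estimated, so `key_score` takes those per-cluster probabilities as input and computes the probability of their union as 1 - prod(1 - P(t|g)), which treats the events as independent. That independence assumption belongs to this sketch and may not match the node.

```python
from collections import defaultdict

def clusters(edges):
    """Connected components of the pruned graph, as sets of terms."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for root in adjacency:
        if root in seen:
            continue
        # Depth-first traversal collects everything reachable from root.
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

def key_score(p_term_given_cluster):
    """P(union over clusters g of t|g), assuming independent events.

    p_term_given_cluster: one estimate of P(t|g) per cluster g.
    """
    p_none = 1.0
    for p in p_term_given_cluster:
        p_none *= 1.0 - p  # probability that cluster g does not yield t
    return 1.0 - p_none

pruned = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y")]
print(sorted(map(sorted, clusters(pruned))))  # [['a', 'b', 'c'], ['x', 'y']]
print(key_score([0.5, 0.5]))  # 0.75
```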
Each of these new terms is then linked to every cluster through its strongest-scoring edge to that cluster, that is, to the cluster member with which it has the highest association strength.
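Linking a new term to each cluster can be sketched as picking, per cluster, the member with the highest assoc score (the association measure defined earlier). Ties are broken alphabetically so the result is deterministic; the node's own tie-breaking is unknown. Function names are hypothetical, and sentences are again assumed to be lists of terms.

```python
from collections import Counter

def assoc(sentences, t1, t2):
    """Sum over sentences of min(occurrences of t1, occurrences of t2)."""
    total = 0
    for sentence in sentences:
        counts = Counter(sentence)
        total += min(counts[t1], counts[t2])
    return total

def link_new_terms(sentences, new_terms, cluster_list):
    """Add one edge from each new term to its strongest member of each cluster."""
    edges = []
    for term in new_terms:
        for cluster in cluster_list:
            # sorted() fixes the order, so max() breaks ties alphabetically
            best = max(sorted(cluster), key=lambda w: assoc(sentences, term, w))
            edges.append((term, best))
    return edges

doc = [["k", "a", "b"], ["k", "a"], ["k", "x"], ["b", "c"]]
print(link_new_terms(doc, ["k"], [{"a", "b", "c"}, {"x", "y"}]))
# [('k', 'a'), ('k', 'x')]
```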
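Finally, every term in the graph is scored with score(t) = sum over every edge connecting t to another term w of (sum over every sentence s of min(freq_s(t), freq_s(w))), where freq_s counts occurrences within sentence s. In other words, score(t) is the sum of assoc(t, w) over all of t's neighbours w.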
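Since the final score sums the association strengths of a term's edges, each edge contributes its assoc value to both of its endpoints. A minimal sketch under the same tokenised-sentence assumption, with hypothetical function names:

```python
from collections import Counter, defaultdict

def assoc(sentences, t1, t2):
    """Sum over sentences of min(occurrences of t1, occurrences of t2)."""
    total = 0
    for sentence in sentences:
        counts = Counter(sentence)
        total += min(counts[t1], counts[t2])
    return total

def final_scores(sentences, edges):
    """score(t): sum of assoc(t, w) over every neighbour w of t."""
    scores = defaultdict(int)
    for a, b in edges:
        strength = assoc(sentences, a, b)
        scores[a] += strength  # the edge counts towards both endpoints
        scores[b] += strength
    return dict(scores)

doc = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
print(final_scores(doc, [("a", "b"), ("b", "c")]))  # {'a': 2, 'b': 4, 'c': 2}
```

Sorting the returned dictionary by value in descending order gives the keyword ranking.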
Setting the console's output level to DEBUG makes this node display the contents of the clusters after the pruning phase.