Unique Term Extractor

This node creates a global set of unique terms over all documents. Optionally, the output can be restricted to the top-k terms, ranked by one of three frequency measures: term frequency, document frequency or inverse document frequency.

Term Frequency (TF): Overall count of a term in all documents.
Document Frequency (DF): Number of documents in which a term occurs.
Inverse Document Frequency (IDF): The logarithm of the total number of documents divided by the DF.
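
The three measures above can be sketched in a few lines of Python. This is an illustration of the definitions, not the node's implementation. It assumes the documents are already tokenized into lists of strings, and it uses the natural logarithm for IDF; the log base the node actually uses is not stated on this page. The function name `term_statistics` is hypothetical.

```python
import math
from collections import Counter


def term_statistics(documents: list[list[str]]) -> dict[str, tuple[int, int, float]]:
    """Return {term: (TF, DF, IDF)} for a list of tokenized documents.

    Assumption: IDF uses the natural logarithm; the node's log base may differ.
    """
    n_docs = len(documents)
    tf: Counter = Counter()
    df: Counter = Counter()
    for doc in documents:
        tf.update(doc)       # every occurrence of a term counts towards TF
        df.update(set(doc))  # each document counts at most once towards DF
    # IDF = log(total number of documents / DF)
    return {term: (tf[term], df[term], math.log(n_docs / df[term])) for term in tf}


docs = [["knime", "text", "mining"], ["text", "processing"], ["text", "text"]]
stats = term_statistics(docs)
```

Here "text" occurs four times in total (TF = 4) but in three documents (DF = 3), so its IDF is log(3/3) = 0: a term that appears in every document carries no discriminating weight.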

Options

Document column: Select the document column to extract the terms from.
Most frequent terms (k): If checked, the output table is restricted to the k most frequent terms.
Filter terms by: If 'Most frequent terms (k)' is checked, the terms are ranked by the selected frequency measure (TF, DF or IDF), and only the top k terms are added to the output table.
Append index column: If checked, the node appends an index column containing a unique index for each term. This is especially useful for replacing words with numbers while preparing documents for deep learning.
Append frequency columns: If checked, the node appends term frequency (TF), document frequency (DF) and inverse document frequency (IDF) columns.
Number of threads: The number of threads used to process the documents.
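
The 'Most frequent terms (k)', 'Filter terms by' and 'Append index column' options together amount to ranking the terms, keeping the first k, and numbering them. The sketch below illustrates that idea on a precomputed {term: (TF, DF, IDF)} mapping. Several details are assumptions rather than documented node behavior: the ranking is descending for all three measures (for IDF this keeps the rarest terms), ties are broken alphabetically, and indices start at 1. The name `top_k_with_index` is hypothetical.

```python
from typing import Literal

Measure = Literal["TF", "DF", "IDF"]
# Position of each measure inside the (TF, DF, IDF) tuples.
_COLUMN = {"TF": 0, "DF": 1, "IDF": 2}


def top_k_with_index(
    stats: dict[str, tuple[int, int, float]], k: int, by: Measure
) -> list[tuple[int, str]]:
    """Keep the k terms with the highest value of `by` and number them.

    Assumptions (not taken from the node): descending order for every
    measure, ties broken alphabetically, indices starting at 1.
    """
    col = _COLUMN[by]
    ranked = sorted(stats, key=lambda term: (-stats[term][col], term))
    return [(index, term) for index, term in enumerate(ranked[:k], start=1)]


stats = {
    "knime": (1, 1, 1.0986),
    "text": (4, 3, 0.0),
    "mining": (1, 1, 1.0986),
    "processing": (1, 1, 1.0986),
}
vocabulary = {term: index for index, term in top_k_with_index(stats, 2, "TF")}
```

The resulting `vocabulary` maps each kept term to an integer, which is the kind of lookup used when replacing words with numbers before feeding documents to a deep learning model.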

Input Ports

The input table containing the documents.

Output Ports

An output table containing a column of unique terms and, depending on the options, the frequency columns and the index column.

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Textprocessing extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191419

On NodePit since: 2025-07-02

Last update: 2025-07-21

KNIME versions: Since v4.0