
N-Gram Extractor

Streamable

Palladian for KNIME version by palladian.ws; Philipp Katz, Klemens Muthmann, David Urbansky

This node extracts n-grams from a given string. In contrast to the “NGram Creator” node available through the Text Processing plugin, this node works with simple strings and does not require a Document cell type, which makes it easier to use in situations that do not need the sophisticated Text Processing infrastructure.

This node uses exactly the same logic as the Palladian Text Classifier nodes.


Text input
Input column which contains text for which to create n-grams.
n-gram type
The type of n-grams to be used. n-grams can be created on character and word level.
Min. n-gram length
The minimum length of n-grams to create (i.e. the number of characters or words, depending on n-gram type).
Max. n-gram length
The maximum length of n-grams to create.
Min. term length
(Only effective for n-gram type “word”) The minimum length of a word n-gram in characters to be considered.
Max. term length
(Only effective for n-gram type “word”) The maximum length of a word n-gram in characters to be considered.
Max. term count
The maximum number of terms to extract from each document (useful to speed up processing of huge documents, or to reduce model size in general).
Case sensitive
Activate to treat text documents case sensitively (can improve accuracy in certain cases, but increases model size).
Border padding
Create padded character n-grams at the text document’s beginning and end (e.g. for a document starting with “The” and n-gram length 3, we additionally create the features “##T” and “#Th”). This setting can improve accuracy when classifying very short phrases, but increases model size.
Create skip-grams
When in word mode and the n-gram length is >= 3, additionally create skip-grams. Skip-grams model gaps between word groups by leaving out words inside the n-gram. E.g. for the consecutive 3-gram “the quick brown”, the skip-gram is “the brown”.
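
The options above can be illustrated with a minimal Python sketch. It is not the Palladian implementation: the function names, the whitespace tokenization, the padding scheme (n-1 “#” characters on each side) and the choice to drop one interior word at a time for skip-grams are all assumptions, chosen so that the results match the “The” and “the quick brown” examples given in the option descriptions.

```python
def char_ngrams(text, min_n, max_n, pad=False, case_sensitive=False):
    """Return character n-grams of every length from min_n to max_n (inclusive)."""
    if not case_sensitive:
        text = text.lower()
    grams = []
    for n in range(min_n, max_n + 1):
        # Assumed padding scheme: n-1 '#' characters before and after the text.
        source = "#" * (n - 1) + text + "#" * (n - 1) if pad else text
        grams.extend(source[i:i + n] for i in range(len(source) - n + 1))
    return grams


def word_ngrams(text, min_n, max_n, skip_grams=False, case_sensitive=False):
    """Return word n-grams; optionally add skip-grams for n >= 3."""
    if not case_sensitive:
        text = text.lower()
    words = text.split()  # simplification: split on whitespace only
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            grams.append(" ".join(window))
            if skip_grams and n >= 3:
                # Assumption: leave out one interior word at a time.
                for j in range(1, n - 1):
                    grams.append(" ".join(window[:j] + window[j + 1:]))
    return grams


print(char_ngrams("The", 3, 3, pad=True, case_sensitive=True))
print(word_ngrams("the quick brown", 3, 3, skip_grams=True))
```

With border padding enabled, the first call yields “##T” and “#Th” in addition to “The”; the second call yields “the quick brown” followed by the skip-gram “the brown”.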

Language-specific settings

The language to use for the language-specific processing (see below; only in case the n-gram type is “word”)
Remove stopwords
Removes stopwords based on a predefined stopword list for the given language.
Performs stemming using the Snowball stemmer.
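
As a rough illustration of the stopword option, the following Python sketch filters a token list against a stopword set. The stopword list shown here is a hypothetical, deliberately tiny example and not the predefined list shipped with the node; whether the node filters before or after building n-grams is not stated in this description, so the sketch only shows the filtering step itself. Stemming is left out because it would need a Snowball stemmer library.

```python
# Hypothetical, deliberately tiny English stopword list (for illustration only).
STOPWORDS_EN = {"the", "a", "an", "of", "and", "over"}


def remove_stopwords(words, stopwords=STOPWORDS_EN):
    """Return the words that are not in the stopword set (case-insensitive match)."""
    return [w for w in words if w.lower() not in stopwords]


print(remove_stopwords("The quick brown fox".split()))
```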

Input Ports

Input table with a string column for which to extract the n-grams.

Output Ports

Table with appended list column which contains the n-grams.



To use this node in KNIME, install Palladian for KNIME from its update site.

