N-Gram Extractor

This node extracts n-grams from a given string. In contrast to the “NGram Creator” node available through the Text Processing plugin, this node works with simple strings and does not require a Document cell type. This makes it easier to use when the sophisticated Text Processing infrastructure is not needed.

This node uses exactly the same logic as the Palladian Text Classifier nodes.

Options

Text input: Input column which contains text for which to create n-grams.
Drop input column: Enable to exclude the input column in the output table.
Set output column name (*): Override the default name for the appended output column. Leave empty to auto-generate the name based on the feature settings.
n-gram type: The type of n-grams to be used. n-grams can be created on character and word level.
Min. n-gram length: The minimum length of n-grams to create (i.e. the number of characters or words, depending on n-gram type).
Max. n-gram length: The maximum length of n-grams to create.
Min. term length: (Only effective for n-gram type “word”) The minimum length of a word n-gram in characters to be considered.
Max. term length: (Only effective for n-gram type “word”) The maximum length of a word n-gram in characters to be considered.
Max. term count: The maximum number of terms to extract from each document (useful to speed up processing of huge documents, or to reduce model size in general).
Case sensitive: Activate to treat text documents case sensitively (can improve accuracy in certain cases, but increases model size).
Border padding: Create padded character n-grams at the beginning and end of the text document. For example, for a document starting with “The” and n-gram length 3, the features “##T” and “#Th” are additionally created. This setting can improve accuracy when classifying very short phrases, but increases model size.
Create skip-grams: When in word mode and the n-gram length is >= 3, additionally create skip-grams. Skip-grams model gaps between word groups by leaving out words inside the n-gram. E.g. for the consecutive 3-gram “the quick brown”, the skip-gram is “the brown”.
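
The options above can be illustrated with a minimal Python sketch. It is not the Palladian implementation; the function names, the regex tokenizer, and the handling of skip-grams for n-gram lengths above 3 (dropping one interior word at a time) are assumptions made for illustration. Only the padding example (“##T”, “#Th”) and the 3-gram skip-gram example (“the brown”) are taken from the descriptions above.

```python
import re


def char_ngrams(text, min_n, max_n, border_padding=False, case_sensitive=False):
    """Character n-grams for every length from min_n to max_n (inclusive)."""
    if not case_sensitive:
        text = text.lower()
    grams = []
    for n in range(min_n, max_n + 1):
        # Padding with n-1 '#' characters on each side yields "##T" and "#Th"
        # for a text starting with "The" and n = 3.
        padded = "#" * (n - 1) + text + "#" * (n - 1) if border_padding else text
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def word_ngrams(text, min_n, max_n, skip_grams=False, case_sensitive=False):
    """Word n-grams, optionally with skip-grams for n >= 3."""
    if not case_sensitive:
        text = text.lower()
    # Assumption: a simple regex tokenizer; the node's tokenizer may differ.
    words = re.findall(r"\w+", text)
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            window = words[i:i + n]
            grams.append(" ".join(window))
            if skip_grams and n >= 3:
                # Assumption: drop one interior word at a time.
                for j in range(1, n - 1):
                    grams.append(" ".join(window[:j] + window[j + 1:]))
    return grams


print(char_ngrams("The", 3, 3, border_padding=True, case_sensitive=True))
print(word_ngrams("the quick brown fox", 3, 3, skip_grams=True))
```

In this sketch the term length and term count limits are left out; they would be applied as filters on the returned list.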

Language-specific settings

Language: The language to use for the language-specific processing described below. Only applies when the n-gram type is “word”.
Remove stopwords: Removes stopwords based on a predefined stopword list for the given language.
Stem: Performs stemming using the Snowball stemmer.
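
As a rough illustration of stopword removal, the following Python sketch filters tokens against a stopword set. The stopword list here is a deliberately tiny, hypothetical example and not the predefined list used by the node; the tokenizer and the point in the pipeline at which filtering happens are likewise assumptions. Stemming is left out of the sketch.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
EXAMPLE_STOPWORDS = {"the", "a", "an", "of", "and"}


def remove_stopwords(text, stopwords=EXAMPLE_STOPWORDS):
    """Lower-case the text, split it into words, and drop listed stopwords."""
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if w not in stopwords]


print(remove_stopwords("The quick brown fox and the dog"))
```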

Input Ports

Input table with a string column for which to extract the n-grams.

Output Ports

Table with an appended list column which contains the n-grams.

Views

This node has no views

Installation

To use this node in KNIME, install the Palladian for KNIME extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: palladian.ws

Plugin version: 3.3.0.202506081733

On NodePit since: 2025-07-02

Last update: 2025-08-13

Tags: Streamable

KNIME versions: Since v3.6

NodePit Exclusive: Only available on NodePit