Text Classifier Learner

This node builds a dictionary from a list of pre-categorized text documents; the dictionary can then be used to categorize new, uncategorized text documents. The learner builds a weighted term lookup table that records how probable each n-gram is for a given category. The corresponding predictor node uses this lookup table.
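The idea of a weighted term lookup table can be illustrated with a minimal Python sketch. This is not Palladian's implementation: the function names `train` and `category_probabilities`, the use of lowercased whitespace-separated words as terms, and the relative-frequency scoring are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(documents):
    """Build a term -> {category: weighted count} table.

    `documents` is an iterable of (text, category, weight) triples.
    Terms are lowercased whitespace-separated words (an assumption;
    the node itself works with configurable n-grams).
    """
    table = defaultdict(Counter)
    for text, category, weight in documents:
        # Count each distinct term once per document, scaled by the weight.
        for term in set(text.lower().split()):
            table[term][category] += weight
    return table


def category_probabilities(table, term):
    """Relative frequency of each category among documents containing `term`."""
    per_category = table.get(term, {})
    total = sum(per_category.values())
    if total == 0:
        return {}
    return {category: count / total for category, count in per_category.items()}


table = train([
    ("red running shoes", "fashion", 1),
    ("red wine glasses", "kitchen", 1),
    ("wine decanter", "kitchen", 1),
])
print(category_probabilities(table, "red"))
print(category_probabilities(table, "wine"))
```

In this toy table, a term seen in only one category points strongly to that category, while a term shared between categories carries less evidence for either.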

This classifier won the first Research Garden competition where the goal was to classify product descriptions into eight different categories. See press release (on archive.org).

Feature Settings

Features are the input for a classifier. In text classification, the input is a long string from which features must be derived during preprocessing. Palladian’s text classifier works with n-grams: sequences of n consecutive tokens (characters or words), created by sliding a “window” over the given text. The PalladianTextClassifierLearner node can create features using character- or word-based n-grams. As an example, consider the text “the quick brown fox”:

The set of word-based 2-grams would contain the following entries: {“the quick”, “quick brown”, “brown fox”}.
The set of character-5-grams consists of the following entries: {“the q”, “he qu”, “e qui”, “ quic”, “quick”, …}.
It is possible to combine n-grams of different lengths. For example, the set of character-4-6-grams contains the union of the sets of 4-, 5-, and 6-grams.
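The sliding-window construction described above can be sketched in a few lines of Python. The function names are hypothetical, and splitting words on whitespace is an assumption; the node's actual tokenizer may differ.

```python
def word_ngrams(text, n):
    """Word-based n-grams: slide a window of n words over the text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character-based n-grams: slide a window of n characters over the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def combined_char_ngrams(text, min_n, max_n):
    """Union of the character n-gram sets for every length in [min_n, max_n]."""
    grams = set()
    for n in range(min_n, max_n + 1):
        grams.update(char_ngrams(text, n))
    return grams


text = "the quick brown fox"
print(word_ngrams(text, 2))       # ['the quick', 'quick brown', 'brown fox']
print(char_ngrams(text, 5)[:5])   # ['the q', 'he qu', 'e qui', ' quic', 'quick']
print(sorted(combined_char_ngrams(text, 4, 6))[:3])
```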

Options

Text input: Column in the input table with the text documents.
Category input: Column in the input table with the pre-assigned categories.
Weight input: (Optional) Column in the input table that assigns a weight to each training document. A value of 1 means normal weight; for a higher value n, the learner behaves as if the document had been added n times.
n-gram type: The type of n-grams to be used. n-grams can be created on character and word level.
Min. n-gram length: The minimum length of n-grams to create (i.e. the number of characters or words, depending on n-gram type).
Max. n-gram length: The maximum length of n-grams to create.
Min. term length: (Only effective for n-gram type “word”) The minimum length of a word n-gram in characters to be considered.
Max. term length: (Only effective for n-gram type “word”) The maximum length of a word n-gram in characters to be considered.
Max. term count: The maximum number of terms to extract from each document (useful to speed up processing of huge documents, or to reduce model size in general).
Case sensitive: Activate to treat text documents case sensitively (can improve accuracy in certain cases, but increases model size).
Border padding: Create padded character n-grams at the beginning and end of each text document (e.g. for a document starting with “The” and n-gram length 3, the features “##T” and “#Th” are created additionally). This setting can improve accuracy when classifying very short phrases, but increases model size.
Create skip-grams: When in word mode and the n-gram length is >= 3, additionally create skip-grams. Skip-grams model gaps between word groups by leaving out words inside the n-gram. E.g. for the consecutive 3-gram “the quick brown”, the skip-gram is “the brown”.
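The border padding and skip-gram options can be illustrated with the following Python sketch. It reproduces only the examples given above: padding with n-1 “#” characters on each side is inferred from the “##T”, “#Th” example, and the skip-gram function covers only the 3-gram case (dropping the middle word), since the handling of longer n-grams is not specified here. The function names are hypothetical.

```python
def padded_char_ngrams(text, n, pad="#"):
    """Character n-grams over the text padded with n-1 pad characters on each side."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def skip_grams_from_trigrams(text):
    """For every consecutive word 3-gram, leave out the middle word."""
    tokens = text.split()
    return [f"{tokens[i]} {tokens[i + 2]}" for i in range(len(tokens) - 2)]


print(padded_char_ngrams("The", 3))
# ['##T', '#Th', 'The', 'he#', 'e##']

print(skip_grams_from_trigrams("the quick brown fox"))
# ['the brown', 'quick fox']
```

For a very short document such as “The”, padding turns a single 3-gram into five features, which is why the option helps with short phrases and also why it enlarges the model.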

Language-specific settings

Language: The language to use for the language-specific processing (see below; only in case the n-gram type is “word”)
Remove stopwords: Removes stopwords based on a predefined stopword list for the given language.
Stem: Performs stemming using the Snowball stemmer.
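As a rough illustration of stopword removal in word mode, the Python sketch below filters tokens against a stopword set before building word n-grams. The tiny stopword set is a hypothetical stand-in for the node's predefined list, applying the filter before n-gram creation is an assumption, and Snowball stemming is not reproduced here.

```python
# Hypothetical miniature stopword list; the node ships its own per-language lists.
STOPWORDS_EN = {"the", "a", "an", "of", "and"}


def remove_stopwords(text, stopwords=STOPWORDS_EN):
    """Return the whitespace-separated tokens of `text` that are not stopwords."""
    return [token for token in text.split() if token.lower() not in stopwords]


def word_ngrams_from_tokens(tokens, n):
    """Word n-grams built from an already tokenized document."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stopwords("the quick brown fox")
print(tokens)                               # ['quick', 'brown', 'fox']
print(word_ngrams_from_tokens(tokens, 2))   # ['quick brown', 'brown fox']
```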

Expert settings

Do not stop on memory warnings: Ignore KNIME’s memory warnings. Attention: If this option is enabled, KNIME will become unresponsive when the model size exceeds the memory limit.

Input Ports

Input with pre-categorized text documents. The category has to be given by a separate String column.

Output Ports

The model data of the trained classifier.

Views

This node has no views

Installation

To use this node in KNIME, install the extension Palladian for KNIME from the update site below, following our NodePit Product and Node Installation Guide:

v5.6

A zipped version of the software site can be downloaded here.

Plugin provider: palladian.ws

Plugin version: 3.3.0.202506081733

On NodePit since: 2025-08-15

Last update: 2025-08-21

Tags: Streamable

KNIME versions: Since v3.6
