StanfordNLP NE tagger

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

This node assigns a named entity tag to each term of a document. It supports English, German, and Spanish texts. The built-in tagger models were created by the Stanford NLP group:
http://nlp.stanford.edu/software/.
You can use the StanfordNLP NE Learner to create your own model from untagged documents and a dictionary, and forward that model to the second input port of this node. If there is no input model, the "Use model from input port" option is deactivated. Conversely, if there is a model at the input port and the option is activated, the built-in StanfordNLP model selection is disabled.

Note: The provided tagger models vary in memory consumption and processing speed. The distsim models (those with distributional similarity features) in particular take longer to run, but they usually tag more accurately as well. Models without distributional similarity features (nodistsim) are also available. When using the distsim models, it is recommended to run KNIME with at least 2 GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file.
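
The -Xmx change described above can also be scripted. The following is a minimal Python sketch, not part of KNIME: the helper name set_max_heap and the sample file content are made up for illustration, and it assumes that knime.ini already contains a line starting with -Xmx.

```python
import re


def set_max_heap(ini_text: str, new_value: str) -> str:
    """Return ini_text with every -Xmx line replaced by -Xmx<new_value>.

    Hypothetical helper: it assumes the file already has an -Xmx line
    and raises ValueError otherwise instead of guessing where to add one.
    """
    pattern = re.compile(r"^-Xmx\S+$")
    lines = ini_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line.strip()):
            lines[i] = f"-Xmx{new_value}"
            found = True
    if not found:
        raise ValueError("no -Xmx line found")
    return "\n".join(lines) + "\n"


# Made-up sample content; a real knime.ini has more lines than this.
sample_ini = "-vmargs\n-Xmx1024m\n"
updated_ini = set_max_heap(sample_ini, "2048m")
print(updated_ini)
```

KNIME reads knime.ini at startup, so restart it after editing the file for the new heap size to take effect.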

Options

Tagger Options

Unmodifiable flag

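If checked, the terms tagged by this node are marked as unmodifiable, so that subsequent nodes do not alter or filter them.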

Use model from input port

If checked, the model from the second input port is used instead of a built-in model.

Combine multi-words

If checked, consecutive words with the same tag are combined into a single term.
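
As a rough illustration of this option, the Python sketch below merges runs of consecutive tokens that share a tag. It is not the node's actual implementation: the function name, the (word, tag) pair representation, and the use of "O" as the label for untagged words are assumptions made for this example.

```python
from itertools import groupby


def combine_multi_words(tagged_tokens, untagged="O"):
    """Merge consecutive (word, tag) pairs that carry the same entity tag.

    Words labelled with the `untagged` marker are left as single tokens.
    """
    combined = []
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == untagged:
            # Untagged words are never merged with their neighbours.
            combined.extend((word, tag) for word in words)
        else:
            combined.append((" ".join(words), tag))
    return combined


tokens = [
    ("Barack", "PERSON"),
    ("Obama", "PERSON"),
    ("visited", "O"),
    ("New", "LOCATION"),
    ("York", "LOCATION"),
]
merged = combine_multi_words(tokens)
print(merged)
```

With the option unchecked, each word would remain a separate tagged term, as in the input list.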

Built-in tagger model

The built-in StanfordNLP tagger models. Choose one if you do not have an external model (for more information, see the StanfordNLP model description):

English all 3 class (distsim): Location, Person, Organization
English conll 4 class (distsim): Location, Person, Organization, Misc
English muc 7 class (distsim): Location, Person, Organization, Money, Percent, Date, Time
English nowiki 3 class caseless (distsim): Location, Person, Organization
English all 3 class (nodistsim): Location, Person, Organization
English conll 4 class (nodistsim): Location, Person, Organization, Misc
English muc 7 class (nodistsim): Location, Person, Organization, Money, Percent, Date, Time
German dewac (created by Faruqui and Pado)
German hgc (created by Faruqui and Pado)
Spanish ancora
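
To decide which English model covers the entity classes you need, the list above can be treated as a lookup table. The Python sketch below does this; the dictionary and function names are made up for this example, the distsim and nodistsim variants are collapsed into one entry because they list the same classes, and the German and Spanish models are omitted because their classes are not listed here.

```python
# Entity classes per English model family, copied from the list above.
ENGLISH_MODELS = {
    "English all 3 class": {"Location", "Person", "Organization"},
    "English conll 4 class": {"Location", "Person", "Organization", "Misc"},
    "English muc 7 class": {
        "Location", "Person", "Organization",
        "Money", "Percent", "Date", "Time",
    },
    "English nowiki 3 class caseless": {"Location", "Person", "Organization"},
}


def models_covering(required_classes):
    """Return the model families whose classes include all required ones."""
    required = set(required_classes)
    return [
        name
        for name, classes in ENGLISH_MODELS.items()
        if required <= classes
    ]


date_models = models_covering({"Person", "Date"})
misc_models = models_covering({"Misc"})
print(date_models)
print(misc_models)
```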

General options

Number of maximal parallel tagging processes: Defines the maximum number of parallel threads used for tagging. Note that a separate tagging model is loaded into memory for each thread. If this value is greater than 1, make sure that enough heap space is available to load all of the models. If you are not sure how much heap space is available to KNIME, leave this value at 1.
Word tokenizer: Selects the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description of each tokenizer.
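
Because each tagging thread loads its own copy of the model, the parallelism setting above is bounded by the available heap. The Python sketch below shows a back-of-the-envelope estimate only. The function name, the reserve for the rest of KNIME, and the model size in the example are hypothetical figures, not measured values from this documentation.

```python
def max_parallel_taggers(heap_mb: int, model_mb: int, reserve_mb: int = 512) -> int:
    """Estimate how many model copies fit into the heap.

    heap_mb:    heap available to KNIME (the -Xmx value), in MB
    model_mb:   assumed in-memory size of one tagger model, in MB
    reserve_mb: assumed heap kept free for everything else, in MB
    Always returns at least 1, since the node needs one tagging thread.
    """
    if model_mb <= 0:
        raise ValueError("model_mb must be positive")
    usable_mb = heap_mb - reserve_mb
    return max(1, usable_mb // model_mb)


# Hypothetical numbers: 2 GB heap and a model that needs about 600 MB.
threads_2gb = max_parallel_taggers(2048, 600)
threads_1gb = max_parallel_taggers(1024, 600)
print(threads_2gb, threads_1gb)
```

If the real memory footprint of the chosen model is unknown, the safe choice remains a value of 1, as recommended above.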

Input Ports

The input table containing the documents to tag.
The input port object containing the StanfordNLP NE model, together with the dictionary and the tag that were used to create it.

Output Ports

An output table containing the tagged documents.

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Textprocessing extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191419

On NodePit since: 2025-07-02

Last update: 2025-07-21

Tags: Streamable, Deprecated

KNIME versions: Since v3.6
