StanfordNLP NE Learner

The StanfordNLP NE Learner creates a conditional random field (CRF) model from documents and a dictionary of entities that occur in those documents. The chosen tag and the dictionary are saved internally, so that the StanfordNLP NE tagger can use them to tag new documents and validate the model. If you want to use the model externally, the model file can be found in your workflow directory:

/%KNIMEWORKSPACE%/%WORKFLOW%/StanfordNLP NE Learner(##)/port_1/object/portobject.zip
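Before using the model file externally, it can help to look at what the archive contains. The following is a minimal sketch using only the Python standard library; the function name list_model_archive and the example file location are hypothetical, and the internal layout of portobject.zip is not documented here, so the code only lists the entries it finds.

```python
import zipfile
from pathlib import Path


def list_model_archive(archive_path):
    """Return the names of all entries stored in the given zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical location: substitute your own workspace, workflow name
    # and node ID for the placeholders in the path shown above.
    path = Path("portobject.zip")
    if path.exists():
        for name in list_model_archive(path):
            print(name)
    else:
        print(f"No archive found at {path}")
```

Listing the entries first avoids making assumptions about where the serialized model sits inside the archive.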

You can select the document column and the dictionary column to train your model with. The dictionary may contain multi-term entities. A separate tab in the dialog lets you specify the learner properties. Currently, only a few options are exposed, since the number of available parameters is very large. Please contact us if there are important or frequently used parameters that we should integrate.
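To illustrate how a dictionary with single- and multi-term entities can be turned into token-level training labels, here is a minimal sketch. It is not the node's actual implementation: the function name label_tokens, the whitespace splitting of dictionary entries, the longest-match-first strategy and the background label "O" are all assumptions made for the example.

```python
def label_tokens(tokens, dictionary, tag_value, case_sensitive=True, background="O"):
    """Label every token that belongs to a dictionary entity with tag_value.

    All other tokens receive the background label. Returns (token, label) pairs.
    """
    norm = (lambda s: s) if case_sensitive else (lambda s: s.lower())
    # Split each entry into terms; try longer entries first so that
    # "New York City" is preferred over "New York".
    entries = sorted(
        (tuple(norm(t) for t in entry.split()) for entry in dictionary if entry.split()),
        key=len,
        reverse=True,
    )
    normalized = [norm(t) for t in tokens]
    labels = [background] * len(tokens)
    i = 0
    while i < len(tokens):
        for entry in entries:
            n = len(entry)
            if tuple(normalized[i:i + n]) == entry:
                labels[i:i + n] = [tag_value] * n
                i += n  # continue after the matched entity
                break
        else:
            i += 1  # no entry starts at this token
    return list(zip(tokens, labels))


if __name__ == "__main__":
    tokens = ["She", "moved", "to", "New", "York", "City", "."]
    for token, label in label_tokens(tokens, ["New York", "New York City"], "LOCATION"):
        print(token, label)
```

The case_sensitive flag mirrors the idea behind the Case Sensitivity option: with case_sensitive=False, a dictionary entry "New York" would also match the tokens "new" and "york".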

NOTE: If you are interested in the StanfordNLP toolkit, please visit http://nlp.stanford.edu/software/. Some of the following property descriptions are taken from the NERFeatureFactory class from StanfordNLP. Please look into it for further information.
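For readers who want to compare the learner properties in the dialog with a plain Stanford NER properties file, the sketch below renders a set of dialog values as key=value lines. The property names on the right-hand side are the ones used by the NERFeatureFactory flags; that the node passes each dialog option through under exactly this name is an assumption, and the helper to_properties and the example values are hypothetical (they are not the node's defaults).

```python
# Assumed mapping from the dialog labels to Stanford NER property names.
DIALOG_TO_PROPERTY = {
    "maxLeft": "maxLeft",
    "Use Class Feature": "useClassFeature",
    "Use Word": "useWord",
    "Use NGrams": "useNGrams",
    "No Mid NGrams": "noMidNGrams",
    "Max NGram Length": "maxNGramLeng",
    "Use Prev": "usePrev",
    "Use Next": "useNext",
    "Use Disjunctive": "useDisjunctive",
    "Use Sequences": "useSequences",
    "Use Prev Sequences": "usePrevSequences",
    "Use Type Seqs": "useTypeSeqs",
    "Use Type Seqs2": "useTypeSeqs2",
    "Use Type YSeqs": "useTypeySequences",
    "Word Shape": "wordShape",
}


def to_properties(settings):
    """Render {dialog label: value} as Java-properties style key=value lines."""
    lines = []
    for label, value in settings.items():
        key = DIALOG_TO_PROPERTY[label]
        if isinstance(value, bool):
            # Java properties files use lower-case true/false.
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only.
    example = {
        "maxLeft": 1,
        "Use Word": True,
        "Use NGrams": True,
        "No Mid NGrams": True,
        "Max NGram Length": 6,
        "Word Shape": "chris2useLC",
    }
    print(to_properties(example))
```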

Options

Learner options

Document column: The document column to train the model with.
String column: The string column containing the entities to train the model with.
Tag type: The tag type to train the model with. This information is used if you forward the model to the StanfordNLP NE tagger.
Tag value: The tag value to train the model with. This information is used if you forward the model to the StanfordNLP NE tagger.
Word tokenizer: Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.

Learner Properties

maxLeft: The maximum context of class features used.
Use Class Feature: Include a feature for the class (as a class marginal). Puts a prior on the classes which is equivalent to how often the feature appeared in the training data.
Use Word: Adds a feature for the word itself.
Use NGrams: Make features from letter n-grams, i.e., substrings of the word.
No Mid NGrams: Do not include character n-gram features for n-grams that contain neither the beginning nor the end of the word.
Max NGram Length: If this number is positive, n-grams above this size will not be used in the model.
Use Prev: Enables previous features.
Use Next: Enables next features.
Use Disjunctive: Include features giving disjunctions of words anywhere in the left or right disjunctionWidth words (preserving direction but not position).
Use Sequences: If this is false, no class combination features are used.
Use Prev Sequences: If this is false, no class combination features using previous classes are used.
Use Type Seqs: Use basic zeroth order word shape features.
Use Type Seqs2: Add additional first and second order word shape features.
Use Type YSeqs: Some first order word shape patterns.
Word Shape: Either "none" for no wordShape use, or the name of a word shape function.
Case Sensitivity: Select to handle the words from the dictionary in a case sensitive manner.
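To make the n-gram and word shape options more concrete, the following sketch shows what letter n-grams of a word look like, how a maximum length and the "no mid n-grams" restriction filter them, and what a very simple word shape could be. This is a simplified illustration, not StanfordNLP's feature extraction: the functions char_ngrams and simple_word_shape are hypothetical, and the shape mapping (X, x, d) does not correspond to any particular named Stanford word shape function.

```python
def char_ngrams(word, max_len=0, no_mid_ngrams=False):
    """Return the letter n-grams (substrings) of word.

    If max_len is positive, n-grams longer than max_len are dropped.
    If no_mid_ngrams is True, only n-grams that contain the beginning
    or the end of the word are kept.
    """
    grams = []
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            if max_len > 0 and end - start > max_len:
                continue
            if no_mid_ngrams and start != 0 and end != len(word):
                continue
            grams.append(word[start:end])
    return grams


def simple_word_shape(word):
    """Map upper-case letters to X, lower-case letters to x, digits to d.

    All other characters are kept unchanged.
    """
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


if __name__ == "__main__":
    print(char_ngrams("abc"))
    print(char_ngrams("abc", no_mid_ngrams=True))
    print(char_ngrams("abc", max_len=2))
    print(simple_word_shape("KNIME-5.5"))
```

For the word "abc", the only n-gram removed by no_mid_ngrams=True is "b", since it touches neither the first nor the last character.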

Input Ports

The input table containing the documents to train the model with.
The input dictionary containing known single- and/or multi-term entities to train the model.

Output Ports

The StanfordNLP NE model.

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Textprocessing extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.5.0.v202412191419

On NodePit since: 2025-07-02

Last update: 2025-08-13

KNIME versions: Since v3.6
