StanfordNLP NE tagger

Streamable, Deprecated
KNIME Textprocessing Plug-in version 4.3.0.v202011212014 by KNIME AG, Zurich, Switzerland

This node assigns a named entity tag to each term of a document. It is applicable to English, German and Spanish texts. The built-in tagger models were created by the Stanford NLP group:
http://nlp.stanford.edu/software/
You can use the StanfordNLP NE Learner to create your own model based on untagged documents and a dictionary, and forward that model to the second input port of this node. If there is no input model, the "use model from input port" option is deactivated. Conversely, if there is a model at the input port and the option is activated, the StanfordNLP model selection is disabled.

Note: The provided tagger models vary in memory consumption and processing speed. The distsim models in particular take longer to run, but usually yield better tagging results. Models without distributional similarity features (nodistsim) are also available. To use these models, it is recommended to run KNIME with at least 2GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file.
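
As a rough illustration of the -Xmx change described in the note, the following Python sketch rewrites the heap setting in the text of a knime.ini file. The function name set_max_heap and the sample file content are hypothetical and only serve the example. The sketch assumes the file already contains a -vmargs line, after which JVM options such as -Xmx are listed.

```python
import re


def set_max_heap(ini_text, size="2g"):
    """Return knime.ini content with the -Xmx line set to the given size.

    Hypothetical helper for illustration; it is not part of KNIME.
    """
    new_line = f"-Xmx{size}"
    lines = ini_text.splitlines()
    for i, line in enumerate(lines):
        # Match an existing heap setting such as -Xmx1024m or -Xmx2g.
        if re.fullmatch(r"-Xmx\d+[kKmMgG]?", line.strip()):
            lines[i] = new_line
            break
    else:
        # No -Xmx entry yet: append it (assumes -vmargs appears earlier).
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Made-up sample content, not a real knime.ini.
example = "-vmargs\n-Xmx1024m\n"
print(set_max_heap(example))
```

In practice the same change can be made by hand in a text editor. KNIME has to be restarted for the new setting to take effect.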

Options

Tagger Options

Unmodifiable flag
If checked, tagged terms are marked as unmodifiable, so that subsequent preprocessing nodes do not change or filter them.
Use model from input port
If checked, the model from the second input port is used for tagging.
Combine multi-words
If checked, consecutive words with the same tag will be combined.
Built-in tagger model
Built-in StanfordNLP tagger models. Choose one if you do not have an external model. For more information, see the StanfordNLP model description:
  • English all 3 class (distsim): Location, Person, Organization
  • English conll 4 class (distsim): Location, Person, Organization, Misc
  • English muc 7 class (distsim): Location, Person, Organization, Money, Percent, Date, Time
  • English nowiki 3 class caseless (distsim): Location, Person, Organization
  • English all 3 class (nodistsim): Location, Person, Organization
  • English conll 4 class (nodistsim): Location, Person, Organization, Misc
  • English muc 7 class (nodistsim): Location, Person, Organization, Money, Percent, Date, Time
  • German dewac (created by Faruqui and Pado)
  • German hgc (created by Faruqui and Pado)
  • Spanish ancora
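
To make the "Combine multi-words" option more concrete, here is a minimal Python sketch of the idea of merging consecutive words that carry the same tag. It is an illustration under assumptions, not the node's actual implementation: the function name combine_multi_words, the (word, tag) pair representation and the use of None for untagged words are all made up for this example.

```python
from itertools import groupby


def combine_multi_words(tagged_tokens):
    """Merge runs of consecutive tokens that share the same non-empty tag."""
    combined = []
    for tag, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        words = [word for word, _ in run]
        if tag is None:
            # Untagged words stay separate terms.
            combined.extend((word, None) for word in words)
        else:
            # Consecutive words with the same tag become one term.
            combined.append((" ".join(words), tag))
    return combined


tokens = [
    ("Barack", "PERSON"), ("Obama", "PERSON"),
    ("visited", None),
    ("New", "LOCATION"), ("York", "LOCATION"),
]
print(combine_multi_words(tokens))
# [('Barack Obama', 'PERSON'), ('visited', None), ('New York', 'LOCATION')]
```

Note that this simple rule also merges two different entities of the same type if they happen to be adjacent, which is one reason the behaviour is offered as an option.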

General options

Number of maximal parallel tagging processes
Defines the maximum number of parallel threads used for tagging. Note that a tagging model is loaded into memory for each thread. If this value is greater than 1, make sure that enough heap space is available to load the models. If you are not sure how much heap is available for KNIME, leave the value at 1.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
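
The memory warning for parallel tagging follows from the one-model-per-thread pattern. The Python sketch below illustrates that pattern in general terms and is not the node's code: load_model, tag_document and tag_all are hypothetical names, and the "model" is a tiny stand-in dictionary. The counter shows that the number of model copies grows with the number of worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_load_lock = threading.Lock()
models_loaded = 0  # counts how many model copies were created


def load_model():
    """Hypothetical stand-in for loading a large tagger model."""
    global models_loaded
    with _load_lock:
        models_loaded += 1
    return {"PERSON": {"Ada", "Lovelace"}}


def tag_document(tokens):
    # Each worker thread lazily loads its own model copy and then reuses it.
    if not hasattr(_local, "model"):
        _local.model = load_model()
    people = _local.model["PERSON"]
    return [(t, "PERSON" if t in people else None) for t in tokens]


def tag_all(documents, max_parallel=1):
    # max_parallel plays the role of the maximal number of parallel threads.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(tag_document, documents))


docs = [["Ada", "wrote", "programs"]] * 6
results = tag_all(docs, max_parallel=3)
print(results[0], "model copies loaded:", models_loaded)
```

With max_parallel set to 1 only one copy is ever loaded, which corresponds to the safe default recommended above.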

Input Ports

The input table containing the documents to tag.
The input port object containing the StanfordNLP NE model together with the dictionary and the tag that were used to create it.

Output Ports

An output table containing the tagged documents.

Installation

To use this node in KNIME, install KNIME Textprocessing from the following update site:

KNIME 4.3
