StanfordNLP NE tagger

This Node Is Deprecated: This node is kept for backwards compatibility, but using it in new workflows is no longer recommended. The documentation below might contain more information.

This node assigns a named entity tag to each term of a document. It can be applied to English, German and Spanish texts. The built-in tagger models were created by the Stanford NLP group (http://nlp.stanford.edu/software/).
You can use the StanfordNLP NE Learner node to create your own model from untagged documents and a dictionary, and pass that model to the second input port of this node. If no model is connected, the "Use model from input port" option is deactivated. Conversely, if a model is connected and the option is activated, the built-in model selection is disabled.

Note: The provided tagger models differ in memory consumption and processing speed. The distsim models in particular have a longer runtime, but usually also produce better results. Models without distributional similarity features (nodistsim) are also available. When using these models, it is recommended to run KNIME with at least 2 GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file.
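A minimal sketch of the relevant part of knime.ini, assuming a standard installation in which JVM options are listed after the -vmargs line; all other lines of the file vary by installation and are omitted here. The value sets the maximum heap size to 2 GB; -Xmx2g is an equivalent spelling.

```
-vmargs
-Xmx2048m
```

KNIME has to be restarted for a changed -Xmx value to take effect.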

Options

Tagger Options

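Unmodifiable flag
If checked, the recognized named entity terms are set unmodifiable, so that subsequent preprocessing nodes do not change them.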
Use model from input port
If checked, the model from the second input port is used for tagging instead of a built-in model.
Combine multi-words
If checked, consecutive words with the same tag will be combined.
Built-in tagger model
Built-in StanfordNLP tagger models. Choose one if you do not provide an external model. For more information, see the StanfordNLP model descriptions:
  • English all 3 class (distsim): Location, Person, Organization
  • English conll 4 class (distsim): Location, Person, Organization, Misc
  • English muc 7 class (distsim): Location, Person, Organization, Money, Percent, Date, Time
  • English nowiki 3 class caseless (distsim): Location, Person, Organization
  • English all 3 class (nodistsim): Location, Person, Organization
  • English conll 4 class (nodistsim): Location, Person, Organization, Misc
  • English muc 7 class (nodistsim): Location, Person, Organization, Money, Percent, Date, Time
  • German dewac (created by Faruqui and Pado)
  • German hgc (created by Faruqui and Pado)
  • Spanish ancora
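The "Combine multi-words" option can be pictured as a single pass over the tagged words that appends each word to the previous term whenever both carry the same tag. The following Java code is a minimal sketch of that idea only, not the node's implementation: the MultiWordCombiner class and its Token record are hypothetical, and untagged words are represented by a null tag.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of combining consecutive words that share a named entity tag. */
public class MultiWordCombiner {

    /** A word or multi-word term with its tag; a null tag means "untagged" in this sketch. */
    public record Token(String text, String tag) {}

    public static List<Token> combine(List<Token> tokens) {
        List<Token> result = new ArrayList<>();
        for (Token t : tokens) {
            int last = result.size() - 1;
            // Merge only tagged words whose tag equals the tag of the previous term.
            if (t.tag() != null && last >= 0 && t.tag().equals(result.get(last).tag())) {
                Token prev = result.get(last);
                result.set(last, new Token(prev.text() + " " + t.text(), t.tag()));
            } else {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> sentence = List.of(
                new Token("Angela", "PERSON"),
                new Token("Merkel", "PERSON"),
                new Token("visited", null),
                new Token("New", "LOCATION"),
                new Token("York", "LOCATION"));
        for (Token t : combine(sentence)) {
            System.out.println(t.text() + " -> " + t.tag());
        }
    }
}
```

Running main yields three terms: "Angela Merkel" (PERSON), "visited" (untagged) and "New York" (LOCATION). Note that with this rule two distinct entities of the same class that happen to be adjacent would also be merged.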

General options

Number of maximal parallel tagging processes
Defines the maximum number of parallel threads used for tagging. Note that a separate tagging model is loaded into memory for each thread. If this value is greater than 1, make sure that enough heap space is available to load all models. If you are not sure how much heap space is available to KNIME, leave the value at 1.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
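Why memory use grows with the number of parallel tagging threads can be illustrated with a small Java sketch, assuming each worker thread lazily loads its own model copy through a ThreadLocal. HeavyModel is a hypothetical stand-in for a StanfordNLP classifier (its "tagging" merely upper-cases the text), and ParallelTaggingSketch is not the node's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: one model copy per worker thread, so memory scales with the thread count. */
public class ParallelTaggingSketch {

    /** Counts how many model copies have been loaded (for illustration only). */
    public static final AtomicInteger MODELS_LOADED = new AtomicInteger();

    /** Hypothetical stand-in for an expensive-to-load tagging model. */
    static final class HeavyModel {
        HeavyModel() {
            // A real tagger would read its model file into memory here.
            MODELS_LOADED.incrementAndGet();
        }

        String tag(String document) {
            return document.toUpperCase(Locale.ROOT); // dummy "tagging"
        }
    }

    /** Tags all documents with at most maxThreads threads, each using its own model copy. */
    public static List<String> tagAll(List<String> documents, int maxThreads) {
        // Each thread creates its own model the first time it calls model.get().
        ThreadLocal<HeavyModel> model = ThreadLocal.withInitial(HeavyModel::new);
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> model.get().tag(doc)));
            }
            List<String> tagged = new ArrayList<>();
            for (Future<String> f : futures) {
                tagged.add(f.get()); // collects results in input order
            }
            return tagged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = tagAll(List.of("doc one", "doc two", "doc three"), 2);
        System.out.println(out + " using " + MODELS_LOADED.get() + " model copies");
    }
}
```

With maxThreads set to 2, at most two HeavyModel instances are created regardless of how many documents are tagged; raising the limit raises the worst-case number of models held in memory at the same time.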

Input Ports

Documents table (first port)
The input table containing the documents to tag.
StanfordNLP NE model (second port)
The input port object containing the StanfordNLP NE model, together with the dictionary and the tag used to create it.

Output Ports

Tagged documents table
An output table containing the tagged documents.

Views

This node has no views
