StanfordNLP NE Tagger

This node assigns a named entity tag to each term of a document. It can be applied to English, German and Spanish texts. The built-in tagger models were created by the Stanford NLP group:
http://nlp.stanford.edu/software/.
You can use the StanfordNLP NE Learner to create your own model from untagged documents and a dictionary, and forward that model to the second input port of this node. If there is no input model, the "use model from input port" option is deactivated. Conversely, if there is a model at the input port and the option is activated, the built-in StanfordNLP model selection is disabled.

Note: The provided tagger models vary in memory consumption and processing speed. The distsim models in particular take longer to run, but usually tag more accurately. There are also models without distributional similarity features. When using these models, it is recommended to run KNIME with at least 2GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file.
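To see whether the current -Xmx setting satisfies the 2GB recommendation, you can ask a JVM for its maximum heap. The following is only a minimal sketch: the class HeapCheck and its method are hypothetical names and not part of KNIME, and it reports the heap of the JVM it runs in, so it is only meaningful when started with the same -Xmx value as KNIME.

```java
// Hypothetical helper, not part of KNIME: compares the JVM's maximum heap
// with the 2GB recommended for the tagger models.
final class HeapCheck {
    static final long RECOMMENDED_HEAP_BYTES = 2L * 1024 * 1024 * 1024;

    // True if the given maximum heap size reaches the recommendation.
    static boolean meetsRecommendation(long maxHeapBytes) {
        return maxHeapBytes >= RECOMMENDED_HEAP_BYTES;
    }

    public static void main(String[] args) {
        // maxMemory() reflects the -Xmx setting of the running JVM.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %d MB, meets 2GB recommendation: %b%n",
                maxHeap / (1024 * 1024), meetsRecommendation(maxHeap));
    }
}
```

With some garbage collectors the reported value is slightly lower than the -Xmx setting, so a result just under 2GB does not necessarily mean the setting was ignored.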

Options

General options

Document column
The column containing the documents to tag.
Replace column
If checked, the documents of the selected document column will be replaced by the new tagged documents. Otherwise the tagged documents will be appended as a new column.
Append column
The name of the new appended column, containing the tagged documents.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
Number of maximal parallel tagging processes
Defines the maximum number of parallel threads used for tagging. Note that a tagging model is loaded into memory for each thread. If this value is set to a number greater than 1, make sure that enough heap space is available to load all the models. If you are not sure how much heap is available for KNIME, leave the number at 1.
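As a rough aid for choosing this value, the sketch below estimates how many tagging threads fit into the heap when each thread holds its own copy of the model. The per-model footprint (500MB here) and the usable heap fraction are assumptions that you would have to measure for your own setup. The class TaggerThreadEstimate and its parameters are hypothetical and not part of KNIME.

```java
// Hypothetical helper, not part of KNIME: estimates how many tagging threads
// fit into the heap if every thread loads its own model.
final class TaggerThreadEstimate {

    // heapBytes:      maximum heap available to the JVM
    // modelBytes:     assumed memory footprint of one loaded model
    // usableFraction: share of the heap reserved for models (0 < f <= 1)
    static int maxThreads(long heapBytes, long modelBytes, double usableFraction) {
        if (heapBytes <= 0 || modelBytes <= 0
                || usableFraction <= 0.0 || usableFraction > 1.0) {
            throw new IllegalArgumentException("heap, model size and fraction must be positive");
        }
        long usable = (long) (heapBytes * usableFraction);
        long fit = usable / modelBytes;
        // Never suggest fewer than one thread.
        return (int) Math.max(1L, Math.min(fit, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long assumedModelBytes = 500L * 1024 * 1024; // assumption, measure for your model
        int byMemory = maxThreads(heap, assumedModelBytes, 0.5);
        int byCpu = Runtime.getRuntime().availableProcessors();
        System.out.println("Suggested maximum: " + Math.min(byMemory, byCpu));
    }
}
```

Under these assumptions, a 4GB heap with half of it reserved for models yields four threads. If the estimate is uncertain, the value 1 remains the safe choice.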

Tagger Options

Unmodifiable flag
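If checked, the tags assigned by this node are marked as unmodifiable, so that tagged terms are not changed by any subsequent tagger node.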
Use model from input port
If checked, the model from the second input port will be used for tagging.
Combine multi-words
If checked, consecutive words with the same tag will be combined.
Built-in tagger model
The built-in StanfordNLP tagger models. Choose one if you do not have an external model. For more information, visit the StanfordNLP model description. The models and the entity classes they assign are:
  • English all 3 class (distsim): Location, Person, Organization
  • English conll 4 class (distsim): Location, Person, Organization, Misc
  • English muc 7 class (distsim): Location, Person, Organization, Money, Percent, Date, Time
  • English nowiki 3 class caseless (distsim): Location, Person, Organization
  • English nowiki 3 class (no distsim): Location, Person, Organization
  • English all 3 class (no distsim): Location, Person, Organization
  • English conll 4 class (no distsim): Location, Person, Organization, Misc
  • English muc 7 class (no distsim): Location, Person, Organization, Money, Percent, Date, Time
  • English all 3 class caseless (distsim): Location, Person, Organization
  • English conll 4 class caseless (distsim): Location, Person, Organization, Misc
  • English muc 7 class caseless (distsim): Location, Person, Organization, Money, Percent, Date, Time
To use the following tagger models, the corresponding language pack has to be installed (File -> Install KNIME Extensions...):
  • German Conll GermEval 2014 hgc (created by Faruqui and Pado)
  • German dewac (created by Faruqui and Pado)
  • German hgc (created by Faruqui and Pado)
  • Spanish ancora
  • Spanish KBP ancora
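The "Combine multi-words" option described above can be pictured as merging adjacent words that carry the same tag into one term. The following is a minimal sketch of that idea, not the node's actual implementation; the class MultiWordCombiner and its TaggedTerm type are hypothetical, and a null tag stands for an untagged word.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the KNIME implementation: merges consecutive
// words that share the same named entity tag into a single multi-word term.
final class MultiWordCombiner {

    // A word (or merged multi-word term) with its tag; tag == null means untagged.
    static final class TaggedTerm {
        final String text;
        final String tag;

        TaggedTerm(String text, String tag) {
            this.text = text;
            this.tag = tag;
        }
    }

    static List<TaggedTerm> combine(List<TaggedTerm> words) {
        List<TaggedTerm> result = new ArrayList<>();
        for (TaggedTerm word : words) {
            int last = result.size() - 1;
            boolean sameTagAsPrevious = word.tag != null
                    && last >= 0
                    && word.tag.equals(result.get(last).tag);
            if (sameTagAsPrevious) {
                // Extend the previous term instead of starting a new one.
                TaggedTerm previous = result.get(last);
                result.set(last, new TaggedTerm(previous.text + " " + word.text, word.tag));
            } else {
                result.add(word);
            }
        }
        return result;
    }
}
```

For example, the words "Barack" and "Obama", both tagged PERSON, would become the single term "Barack Obama", while untagged words in between stay separate.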

Input Ports

The input table containing the documents to tag.
The input port object containing the StanfordNLP NE model, the used dictionary and the used tag.

Output Ports

An output table containing the tagged documents.

Views

This node has no views.
