Dictionary Tagger (Multi Column)

This node recognizes named entities specified in one or more dictionary columns and assigns a specified tag value and type. Optionally, the recognized named entity terms can be set unmodifiable, meaning that the terms are not modified or filtered afterwards by any following preprocessing node. However, succeeding tagging nodes can overwrite tags of an unmodifiable term.

If the same entity is contained in different dictionaries, it will be tagged for every matching configuration. For example, if the document contains the term "London" and "London" is also contained in three different dictionaries, the term will be tagged with all three tags that have been set for those dictionaries.

The sequence of the tags depends on the order of the dictionaries within the node dialog. The order can be changed by using the up/down arrow buttons.
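The effect of dictionary order on the tag sequence can be pictured with a small Python sketch. This is only an illustration of the behavior described above, not the node's implementation; the function name, the tag names, and the dictionary contents are all hypothetical.

```python
def tags_for_term(term, dictionaries):
    """Return the tags assigned to `term`, in the order of the dictionaries.

    `dictionaries` is an ordered list of (tag, entities) pairs, standing in
    for the dictionary columns as arranged in the node dialog (assumption).
    """
    return [tag for tag, entities in dictionaries if term in entities]


# Hypothetical dictionaries and tags, listed top to bottom as in the dialog.
dictionaries = [
    ("LOCATION", {"London", "Paris"}),
    ("CITY", {"London"}),
    ("CAPITAL", {"London", "Berlin"}),
]

# "London" occurs in all three dictionaries, so it receives all three tags.
london_tags = tags_for_term("London", dictionaries)

# Moving dictionaries up or down changes the sequence of the tags.
reordered_tags = tags_for_term("London", list(reversed(dictionaries)))
```

In this sketch, reversing the list plays the role of the up/down arrow buttons: the set of tags stays the same and only their sequence changes.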

Note that if a dictionary contains multi-word entities and a succeeding dictionary contains a single word of such a multi-word entity, only the single word will be tagged.

Example:

  • Document: "New York is beautiful."
  • Dictionary 1: "New York"
  • Dictionary 2: "York"

In this case, only "York" will be tagged. If a third dictionary also contains "New York", then "New York" will be tagged with the tags set for the first and the third dictionary.

The order of the entities within a dictionary is also important. As with the order of the dictionaries, the first entity in a dictionary will be tagged first.
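The "New York" example above can be reproduced with a toy model in Python. The model rests on an assumption made purely for illustration: a later match replaces any earlier match it overlaps, and a surviving term receives the tags of every dictionary that contains it. The node's internals may work differently; all names and tags here are hypothetical.

```python
def find_spans(words, entity):
    """Return (start, end) word positions where `entity` occurs in `words`."""
    target = entity.split()
    n = len(target)
    return [(i, i + n) for i in range(len(words) - n + 1)
            if words[i:i + n] == target]


def tag_document(text, dictionaries):
    """Toy model: map each tagged term to its list of tags.

    `dictionaries` is an ordered list of (tag, entities) pairs. Entities are
    processed in order, and a later match overrides overlapping earlier ones
    (an assumption chosen to mirror the documented example).
    """
    words = text.replace(".", " ").split()
    claimed = {}  # (start, end) -> entity string currently covering that span
    for _tag, entities in dictionaries:
        for entity in entities:
            for start, end in find_spans(words, entity):
                # Drop earlier spans that overlap the new match.
                for s, e in list(claimed):
                    if s < end and start < e:
                        del claimed[(s, e)]
                claimed[(start, end)] = entity
    # A surviving term gets the tag of every dictionary that contains it.
    return {entity: [tag for tag, entities in dictionaries if entity in entities]
            for entity in claimed.values()}


doc = "New York is beautiful."
two_dicts = [("TAG_1", ["New York"]), ("TAG_2", ["York"])]
three_dicts = two_dicts + [("TAG_3", ["New York"])]

result_two = tag_document(doc, two_dicts)      # only "York" is tagged
result_three = tag_document(doc, three_dicts)  # "New York" with TAG_1 and TAG_3
```

Because entities inside one dictionary are also processed in list order, the same override rule applies within a single dictionary in this model.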

Options

General Options

Document column
The column containing the documents to tag.
Replace column
If checked, the documents of the selected document column will be replaced by the new tagged documents. Otherwise, the tagged documents will be appended as a new column.
Append column
The name of the new appended column, containing the tagged documents.
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.
Number of maximal parallel tagging processes
Defines the maximum number of parallel threads used for tagging. Note that a tagging model is loaded into memory for each thread. If this value is set to a number greater than 1, make sure that enough heap space is available to load the models. If you are not sure how much heap is available to KNIME, leave the value at 1.
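Since one model is loaded per thread, memory use grows roughly linearly with the number of parallel tagging processes. The following Python sketch shows that back-of-the-envelope arithmetic. The function name and all figures are hypothetical, and the real memory footprint of a model is not stated here, so treat the result as a rough guide only.

```python
def max_parallel_taggers(free_heap_mb, model_size_mb, requested):
    """Cap the requested thread count by how many models fit into the heap.

    Assumes each thread needs one model of `model_size_mb` megabytes and
    always allows at least one thread, matching the advice to fall back to 1.
    """
    affordable = free_heap_mb // model_size_mb
    return max(1, min(requested, affordable))


# Hypothetical figures: 2048 MB of free heap, 512 MB per model, 8 threads wanted.
threads = max_parallel_taggers(2048, 512, 8)
```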

Dictionary Tagger Selection

Column Search
Search a column based on its name.
Set entities unmodifiable
Sets recognized named entity terms unmodifiable.
Case sensitive
If checked, named entity recognition is case sensitive; otherwise, case is ignored.
Exact match
If checked, terms are tagged as named entities only if they match an entity exactly. Otherwise, terms are tagged if they contain the entity string.
Tag type
Specifies the tag type from which the tag value can be chosen.
Tag value
Specifies the tag value to use for tagging recognized named entities.
Arrows
Change the order of the dictionaries used for tagging. The dictionary at the top of the dialog will be used first.
Remove
Clicking the trash can button removes the dictionary and its configuration.
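
The interplay of the "Case sensitive" and "Exact match" options can be pictured as a simple predicate. The Python sketch below follows the option descriptions above, not the node's source code; the function name and parameters are hypothetical.

```python
def matches(term, entity, case_sensitive=True, exact_match=True):
    """Decide whether `term` would be tagged for the dictionary `entity`.

    With exact_match, the term must equal the entity; without it, the term
    only needs to contain the entity string. Without case_sensitive, both
    strings are compared in lower case.
    """
    if not case_sensitive:
        term, entity = term.lower(), entity.lower()
    return term == entity if exact_match else entity in term


exact = matches("London", "London")                          # exact, same case
partial = matches("Londoner", "London", exact_match=False)   # contains entity
ignore_case = matches("london", "London", case_sensitive=False)
```

With both options checked, "Londoner" and "london" would not be tagged for the entity "London" in this sketch.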

Input Ports

The input table containing the documents to tag.
The input table containing one or multiple dictionary columns.

Output Ports

An output table containing the tagged documents.

Views

This node has no views.
