This plugin is currently not available in KNIME v5.5; this page describes the workflow for KNIME v4.2.

14_NER_Tagger_Model_Training

NER Tagger Model Training

This workflow shows how to train a model for named-entity recognition (NER).

The workflow starts by reading a file in which each row represents one chapter of Julius Caesar's 'De Bello Gallico'. The first step creates a document column with the 'Strings To Document' node. To keep the table clear, all columns except the document column are then filtered out. To create (and later validate) a model, we need two data sets, so the 'Partitioning' node splits the table into a training set and a test set. The training set is then used to generate the NER model.
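The split performed by the 'Partitioning' node can be pictured with a short Python sketch. It is not KNIME's implementation. The 70/30 ratio, the fixed seed and the `partition` helper are hypothetical choices made for illustration, and the chapter strings are placeholder data.

```python
import random


def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training set and a test set.

    Hypothetical helper: train_fraction and seed are illustrative
    assumptions, not the settings used in the original workflow.
    """
    shuffled = list(rows)                  # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder stand-ins for the chapter rows of the input table
chapters = [f"chapter {i}" for i in range(1, 11)]
train, test = partition(chapters)
print(len(train), len(test))
```

Every row lands in exactly one of the two sets. This keeps the names used for training separate from the documents the model is later evaluated on.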

To generate a model with the 'StanfordNLP NE Learner' node, a dictionary is needed. For this workflow we used a dictionary containing all the names that occur in the training set, so the model is built around the training set and its associated names. Once generated, the model is used by both the 'StanfordNLP NE Scorer' and the 'StanfordNLP NE Tagger' nodes.
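Conceptually, the dictionary turns plain text into labelled training data. Each token that matches a dictionary entry receives the entity tag and every other token receives the neutral label "O". The sketch below assumes simple whitespace tokenisation and single-word names. It writes the result as one tab-separated "token, label" pair per line, which is the column layout Stanford NER uses for its training files. The helper names and the two-name dictionary are hypothetical, and this is not the node's internal code.

```python
def dictionary_annotate(tokens, dictionary, tag="PERSON"):
    """Label each token with `tag` if it is in the dictionary, else 'O'.

    Hypothetical helper; assumes single-token names only.
    """
    return [(tok, tag if tok in dictionary else "O") for tok in tokens]


def to_training_lines(pairs):
    """Render (token, label) pairs as 'token<TAB>label' lines."""
    return "\n".join(f"{tok}\t{label}" for tok, label in pairs)


# Hypothetical dictionary; the workflow's real dictionary lists all names in the training set
names = {"Caesar", "Orgetorix"}
tokens = "Apud Helvetios longe nobilissimus fuit et ditissimus Orgetorix".split()

pairs = dictionary_annotate(tokens, names)
print(to_training_lines(pairs))
```

Only "Orgetorix" is labelled PERSON here. The model can only learn from names the dictionary covers, which is why it ends up built around the training set's names.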

The Scorer takes the test set and the model as input and validates the model. Internally, the test set is tagged twice: once by a dictionary tagger (using the same dictionary) and once by the Stanford tagger (our generated model). The Scorer then compares the results of both tagging processes and reports measures such as precision, recall and the number of true positives.
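The comparison can be sketched by treating the dictionary tags as the reference and the model tags as predictions. The sketch then counts matches and derives the usual measures. Representing annotations as sets of (position, term) pairs is an assumption made for illustration, because the Scorer's exact counting granularity is not described here. The example annotations are hypothetical.

```python
def score(reference, predicted):
    """Compare two sets of (position, term) annotations.

    reference: annotations from the dictionary tagger (treated as ground truth)
    predicted: annotations from the trained model
    """
    tp = len(reference & predicted)   # tagged by both
    fp = len(predicted - reference)   # tagged only by the model
    fn = len(reference - predicted)   # tagged only by the dictionary
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical annotations for one test document
dict_tags = {(0, "Orgetorix"), (5, "Caesar"), (9, "Dumnorix")}
model_tags = {(0, "Orgetorix"), (5, "Caesar"), (7, "Helvetios")}
result = score(dict_tags, model_tags)
print(result["tp"], result["fp"], result["fn"])  # 2 1 1
```

A model "false positive" against the dictionary is not necessarily wrong. It may be a genuine name that the dictionary simply lacks, which is exactly what the tagging step below looks for.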

The 'StanfordNLP NE Tagger' node tags the documents of the test set. After tagging, all terms without a PERSON tag are filtered out and a bag of words is created. To see which new names the model has recognized, the 'Reference Row Filter' node excludes all names that are contained in the training dictionary.
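The final steps amount to a filter followed by a set difference. The Python sketch below assumes each tagged document is a list of (term, tag) pairs. It collapses the bag of words to unique terms, whereas KNIME's bag of words keeps one row per document and term. The `new_person_names` helper and the sample data are hypothetical.

```python
def new_person_names(tagged_docs, training_dictionary):
    """Return PERSON terms found by the model that are not in the training dictionary.

    tagged_docs: list of documents, each a list of (term, tag) pairs
    (hypothetical representation of the tagger's output).
    """
    bag = set()
    for doc in tagged_docs:
        for term, tag in doc:
            if tag == "PERSON":  # drop every term without a PERSON tag
                bag.add(term)
    # Like the Reference Row Filter in exclude mode: remove terms listed in the reference table
    return sorted(bag - set(training_dictionary))


tagged = [
    [("Orgetorix", "PERSON"), ("Helvetios", "O")],
    [("Dumnorix", "PERSON"), ("Caesar", "PERSON")],
]
print(new_person_names(tagged, {"Orgetorix", "Caesar"}))  # ['Dumnorix']
```

Whatever survives the filter was recognized by the model without appearing in the dictionary it was trained with.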
