14_NER_Tagger_Model_Training

NER Tagger Model Training

This workflow shows how to train a model for named-entity recognition (NER).

The workflow starts by reading the input file, in which each row represents a chapter of Julius Caesar's 'De Bello Gallico'. The first step creates a document column with the 'Strings To Document' node. To keep the table readable, all columns except the document column are filtered out. Creating (and later validating) a model requires two data sets, so the 'Partitioning' node splits the table into a training set and a test set. The training set is then used to generate the NER model.
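For readers who want to see the partitioning step outside of KNIME, here is a minimal Python sketch of a random split. The 80/20 ratio, the fixed seed, and the function name `partition` are assumptions made for illustration; the workflow description does not state which settings the 'Partitioning' node uses.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows and split them into (training, test) lists.

    train_fraction and seed are illustrative assumptions, not the
    settings of the KNIME 'Partitioning' node in this workflow.
    """
    shuffled = list(rows)                   # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the document column: one entry per chapter.
chapters = [f"chapter {i}" for i in range(1, 11)]
train_set, test_set = partition(chapters)
print(len(train_set), len(test_set))
```

Every chapter ends up in exactly one of the two sets, which is what makes the later validation on the test set meaningful.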

Generating a model with the 'StanfordNLP NE Learner' node requires a dictionary. This workflow uses a dictionary containing all the names occurring in the training set, so the model is built around the training set and the related names. Once generated, the model can be passed to the 'StanfordNLP NE Scorer' and 'StanfordNLP NE Tagger' nodes.
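The following Python sketch illustrates the general idea of dictionary-driven training data: tokens that appear in the dictionary receive the entity label, and all other tokens receive a background label. It is not the Learner node's implementation. The whitespace tokenizer, the function name `label_tokens`, the background label "O", and the example sentence are all assumptions made for illustration.

```python
import string


def label_tokens(text, dictionary, tag="PERSON", background="O"):
    """Label each token with `tag` if it is in the dictionary, else `background`.

    Uses a naive whitespace tokenizer with punctuation stripping; a real
    tokenizer would handle far more cases.
    """
    labelled = []
    for raw in text.split():
        token = raw.strip(string.punctuation)
        if not token:
            continue  # the token consisted only of punctuation
        labelled.append((token, tag if token in dictionary else background))
    return labelled


# Hypothetical dictionary and sentence, for illustration only.
names = {"Caesar", "Ariovistus"}
sentence = "Caesar sent envoys to Ariovistus."
for token, label in label_tokens(sentence, names):
    print(f"{token}\t{label}")
```

A sequence model trained on such labelled tokens can learn from the context around the names, which is why it may later recognize names that were never in the dictionary.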

The Scorer receives the test set and the model and validates the model. Internally, the test set is tagged twice: once by a dictionary tagger (with the same dictionary) and once by the Stanford tagger (our generated model). After tagging, the Scorer counts the results of both tagging processes and returns measurements such as precision, recall, and true positives.
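As a rough illustration of how such measurements can be derived, the sketch below compares two sets of tagged terms and computes the standard precision, recall, and F1 formulas. Treating the dictionary tagger's output as the reference, representing each tag as a (document index, term) pair, and the sample data are assumptions; the Scorer node's internal bookkeeping may differ.

```python
def score(reference, predicted):
    """Compare two sets of tagged items and return standard metrics.

    reference: items tagged by the dictionary tagger (assumed ground truth)
    predicted: items tagged by the trained model
    """
    tp = len(reference & predicted)   # tagged by both
    fp = len(predicted - reference)   # tagged only by the model
    fn = len(reference - predicted)   # tagged only by the dictionary
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical (document index, term) pairs, for illustration only.
dictionary_tags = {(0, "Caesar"), (0, "Ariovistus"), (1, "Labienus")}
model_tags = {(0, "Caesar"), (1, "Labienus"), (1, "Diviciacus")}
result = score(dictionary_tags, model_tags)
print(result)
```

Under this convention, a name the model finds but the dictionary lacks counts as a false positive, even though it may be a genuinely new name.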

The 'StanfordNLP NE Tagger' node is used to tag the documents of the test set. After tagging, all terms without a PERSON tag are filtered out and a bag of words is created. To see which new names the model has found and recognized, the 'Reference Row Filter' node excludes all names that are contained in the training dictionary.
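The filtering logic described above amounts to keeping PERSON-tagged terms and then taking a set difference against the dictionary. A minimal Python sketch follows; the function name `new_names` and the sample terms and tags are hypothetical and are not output from the actual workflow.

```python
def new_names(tagged_terms, dictionary, tag="PERSON"):
    """Return terms carrying `tag` that are not in the training dictionary."""
    # Keep only terms with the wanted tag; the set removes duplicates,
    # much like a bag of words keeps each distinct term once.
    persons = {term for term, term_tag in tagged_terms if term_tag == tag}
    # Exclude every name the model was trained on.
    return sorted(persons - set(dictionary))


# Hypothetical tagger output as (term, tag) pairs; None means untagged.
tagged = [
    ("Caesar", "PERSON"),
    ("Gaul", "LOCATION"),
    ("Vercingetorix", "PERSON"),
    ("river", None),
    ("Vercingetorix", "PERSON"),
]
training_dictionary = {"Caesar", "Ariovistus"}
print(new_names(tagged, training_dictionary))
```

Whatever remains after the difference is a name the model recognized without having seen it in the dictionary.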

Workflow annotations: Train the model with a training set of documents and a dictionary with words contained in the training set. Tag the test set based on the generated model and list tagged words. Find names that have been tagged but are not contained in the initial dictionary. Evaluate model with scorer.

Node labels: Julius Caesar: De Bello Gallico; Keep document column only; Split into training and test set; Train model; Group by names; Dictionary with names to learn; Check for names not contained in the training dictionary; Validate model; Keep only terms tagged as PERSON; Tag test set.

Nodes used: Table Reader, Column Filter, Partitioning, StanfordNLP NE Learner, GroupBy, Term To String, Table Reader, Reference Row Filter, StanfordNLP NE Scorer, Column Filter, Tag Filter, Strings To Document, StanfordNLP NE Tagger, Bag Of Words Creator.
