This workflow shows how to train a model for named-entity recognition (NER).
The workflow starts by reading the input file, in which each row represents a chapter of Julius Caesar's 'De Bello Gallico'. The first step creates a document column with the 'Strings To Document' node. To keep the table readable, all columns except the document column are filtered out. Creating (and later validating) a model requires two data sets, so the 'Partitioning' node splits the table into a training set and a test set. The training set is then used to generate the NER model.
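As a rough illustration of what the partitioning step does, the following Python sketch shuffles a list of rows and splits it into a training and a test set. This is not how the 'Partitioning' node is implemented; the function name `partition`, the 80/20 split, and the fixed seed are assumptions made for the example, since the node's actual settings are not stated here.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into (training, test) lists.

    train_fraction and seed are illustrative assumptions, not the
    settings used in the workflow.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for the chapters of the input file.
chapters = [f"chapter {i}" for i in range(1, 11)]
train, test = partition(chapters)
```

Every row ends up in exactly one of the two sets, which is what allows the test set to serve as unseen data when the model is validated later.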
Generating a model with the 'StanfordNLP NE Learner' node requires a dictionary. For this workflow we used a dictionary containing all the names occurring in our training set, so the model is built around the training set and the related names. Once generated, the model can be used by the 'StanfordNLP NE Scorer' and the 'StanfordNLP NE Tagger' nodes.
The Scorer receives the test set and the model and validates the model. Internally, the test set is tagged by both a dictionary tagger (with the same dictionary) and the Stanford tagger (our generated model). After the tagging process, the Scorer counts the results of both tagging processes and returns measurements such as precision, recall, and true positives.
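The comparison of two tagging results can be sketched in a few lines of Python. This is only a minimal illustration, assuming that the dictionary tagger's output is treated as the reference and the model's output as the prediction; the node's internal counting is not documented here, and the function name `score` and the sample tags are hypothetical.

```python
def score(reference, predicted):
    """Compare two sets of tagged items and return basic measurements.

    Assumption: `reference` holds the dictionary tagger's results and
    `predicted` holds the results of the trained model.
    """
    reference, predicted = set(reference), set(predicted)
    tp = len(reference & predicted)   # tagged by both
    fp = len(predicted - reference)   # tagged by the model only
    fn = len(reference - predicted)   # tagged by the dictionary only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical (document index, term) pairs, for illustration only.
dictionary_tags = {(0, "Caesar"), (0, "Ariovistus"), (1, "Labienus")}
model_tags = {(0, "Caesar"), (1, "Labienus"), (1, "Cicero")}
result = score(dictionary_tags, model_tags)
```

With this setup, a name that the model finds but the dictionary does not contain counts as a false positive, even though it may be a correct, newly discovered name.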
The 'StanfordNLP NE Tagger' node is used to tag the documents of the test set. After tagging, all terms without a PERSON tag are filtered out and a bag of words is created. To see which new names the model has found and recognized, the 'Reference Row Filter' node excludes all names that are contained in the training dictionary.
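The filtering logic described above can be approximated as follows. This is a simplified sketch under stated assumptions: tagged terms are represented as `(term, tag)` pairs, the bag of words is reduced to a term count, and the function name `new_person_names` and the sample data are hypothetical rather than taken from the workflow.

```python
from collections import Counter


def new_person_names(tagged_terms, training_dictionary):
    """Keep PERSON-tagged terms and drop names already in the dictionary."""
    # Keep only terms tagged as PERSON and count them (a simplified
    # stand-in for the bag of words).
    persons = Counter(term for term, tag in tagged_terms if tag == "PERSON")
    # Exclude names that were part of the training dictionary.
    known = set(training_dictionary)
    return {name: count for name, count in persons.items()
            if name not in known}


# Hypothetical tagger output and dictionary, for illustration only.
tagged = [("Caesar", "PERSON"), ("Gallia", None), ("Cicero", "PERSON"),
          ("Cicero", "PERSON"), ("legio", None)]
dictionary = ["Caesar", "Labienus"]
new_names = new_person_names(tagged, dictionary)
```

Whatever remains after both filters is the set of names the model recognized without having seen them in the training dictionary.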