This node assigns a part-of-speech (POS) tag to each term of a document.
It is applicable to French, English, German, Spanish, and Arabic texts. The
underlying tagger models are provided by the Stanford NLP group.
For English texts, the Penn Treebank tag set is used.
For German texts, the STTS tag set is used.
For French texts, the French Treebank tag set is used.
For Spanish texts, the Ancora Treebank tag set is used.
For Arabic texts, an Arabic Penn Treebank tag set is used.
There are also German, Spanish, and French models that use the Universal Dependencies POS tag set.
Note: The provided tagger models vary in memory consumption and processing speed. The English bidirectional, WSJ bidirectional, German hgc, and German dewac models in particular require a lot of memory. To use these models, run KNIME with at least 2 GB of heap space. To increase the heap space, change the -Xmx setting in the knime.ini file. If KNIME is running with less than 1.5 GB of heap space, use the English left3words, English left3words caseless, or German fast models to tag English or German texts.
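The heap-space guidance above can be checked with a small script. The following Python sketch is only an illustration and is not part of KNIME or the Stanford tagger: the function names parse_xmx_bytes and heap_advice are hypothetical, the 2 GB and 1.5 GB thresholds are read as binary gigabytes, and using the last -Xmx line when several are present is an assumption.

```python
import re

# Multipliers for the size suffixes accepted after -Xmx (k, m, g).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_xmx_bytes(ini_text):
    """Return the heap size from the last -Xmx line in bytes, or None if absent."""
    matches = re.findall(r"^-Xmx(\d+)([kKmMgG]?)\s*$", ini_text, flags=re.MULTILINE)
    if not matches:
        return None
    number, unit = matches[-1]  # assumption: the last -Xmx entry is the effective one
    return int(number) * _UNITS.get(unit.lower(), 1)


def heap_advice(ini_text):
    """Map the configured heap size to the recommendation given in the note."""
    heap = parse_xmx_bytes(ini_text)
    if heap is None:
        return "unknown"
    gib = heap / 1024**3
    if gib >= 2:
        return "large models ok"         # bidirectional, hgc, dewac
    if gib < 1.5:
        return "use lightweight models"  # left3words, caseless, German fast
    return "borderline"                  # between 1.5 GB and 2 GB


if __name__ == "__main__":
    sample = "-vmargs\n-Xmx2048m\n"      # hypothetical knime.ini excerpt
    print(heap_advice(sample))
```

To apply it to a real installation, read the contents of knime.ini into a string and pass it to heap_advice; the location of the file depends on where KNIME is installed.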
Descriptions of the models (taken from the website of the Stanford NLP group):
To use the following tagger models, the corresponding language pack has to be installed (File -> Install KNIME Extensions...).