A lemmatizer removes inflections, such as plural endings, pronoun case, and verb endings, to reduce a word to its base form (its lemma). To use the Lemmatizer node, a POS (part-of-speech) tagger, e.g. the Stanford tagger node or the POS tagger node, has to be applied beforehand, because the lemmatization process relies heavily on the POS tag of each term.
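To illustrate why the POS tag matters, here is a minimal Python sketch. It is not the Stanford Lemmatizer's implementation: the lookup table `LEMMA_TABLE`, the coarse tag names, and the `lemmatize` helper are all hypothetical and exist only for this example. The point is that the same surface form can have different lemmas depending on its tag.

```python
# Toy lemma table keyed by (lowercased word, coarse POS tag).
# The entries are illustrative assumptions, not taken from a real lexicon.
LEMMA_TABLE = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
    ("running", "VERB"): "run",
}


def lemmatize(word, pos):
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


# The POS tags are written by hand here; in the workflow they would come
# from a tagger node applied before the Lemmatizer node.
tagged = [("She", "PRON"), ("saw", "VERB"), ("the", "DET"), ("saw", "NOUN")]
lemmas = [lemmatize(word, pos) for word, pos in tagged]
print(lemmas)
```

Without the tag, the two occurrences of "saw" could not be told apart, which is why the tagger has to run first.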
This workflow shows a simple example of how to lemmatize terms in documents using the Stanford Lemmatizer node. It also shows what exactly the Lemmatizer does to the input document terms compared to other preprocessing nodes, for example the Snowball Stemmer.
Stemming and lemmatization are both commonly used natural language processing techniques in the field of information retrieval. Let's look at the example below and assume the result is the index of a search engine. If we now query the word "mouse", only the first document will be returned using the stemmedTerms, because a stemmer only strips suffixes and leaves "mice" unchanged. However, if we use the lemmatizedTerms, the second document will also be returned, because the lemmatizer reduces "mice", the plural form of "mouse", to "mouse".
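The search-index scenario can be sketched in a few lines of Python. This is a toy model under stated assumptions: the two documents, the suffix-stripping `toy_stem` function, and the `LEMMAS` table are invented for illustration. `toy_stem` is a crude stand-in and not the Snowball algorithm, and `toy_lemma` ignores POS tags for brevity.

```python
from collections import defaultdict

# Two hypothetical documents, keyed by document id.
DOCS = {
    1: "the mouse ran away",
    2: "the mice are running",
}

# Hypothetical table of irregular forms used by the toy lemmatizer.
LEMMAS = {"mice": "mouse", "ran": "run", "running": "run", "are": "be"}


def toy_stem(word):
    """Strip one common suffix; irregular forms such as 'mice' are untouched."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in the lemma table; unknown words are returned as-is."""
    return LEMMAS.get(word, word)


def build_index(normalize):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in DOCS.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, normalize, query):
    """Normalize the query the same way as the index and return sorted ids."""
    return sorted(index.get(normalize(query), set()))


stem_hits = search(build_index(toy_stem), toy_stem, "mouse")
lemma_hits = search(build_index(toy_lemma), toy_lemma, "mouse")
print(stem_hits)   # [1]
print(lemma_hits)  # [1, 2]
```

In the stemmed index, "mice" stays a separate term, so the query only matches document 1. In the lemmatized index, both "mouse" and "mice" are stored under "mouse", so both documents are found.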
To use this workflow, download it and open it in KNIME.