11_Lemmatizer_Preprocessing

Stanford Lemmatizer Example

A lemmatizer removes inflections, such as plural endings, pronoun case, and verb endings, to reduce a word to its base form (its lemma). To use the Lemmatizer node, a POS (part-of-speech) tagger, e.g., the Stanford Tagger node or the POS Tagger node, has to be applied beforehand, because the lemmatization process relies heavily on the POS tag of each term.
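To illustrate why the POS tag matters, here is a minimal Python sketch. It is not the Stanford Lemmatizer: the lookup table, the tag names, and the `lemmatize` function are hypothetical and exist only to show that the same surface form can map to different lemmas depending on its tag.

```python
# Toy lemma table keyed by (lowercased word, coarse POS tag).
# Entries and tag names are illustrative assumptions, not the Stanford model.
LEMMA_TABLE = {
    ("saw", "VERB"): "see",      # "I saw the mouse"
    ("saw", "NOUN"): "saw",      # "a rusty saw"
    ("mice", "NOUN"): "mouse",
    ("running", "VERB"): "run",
}


def lemmatize(word: str, pos: str) -> str:
    """Return the lemma for (word, pos); fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())


# The same word yields different lemmas under different POS tags.
print(lemmatize("saw", "VERB"))   # see
print(lemmatize("saw", "NOUN"))   # saw
print(lemmatize("Mice", "NOUN"))  # mouse
```

Without the tag, the sketch could not decide between "see" and "saw", which is the reason a tagger has to run before the lemmatizer.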

This workflow shows a simple example of how to lemmatize terms in documents using the Stanford Lemmatizer node. It also shows what exactly the lemmatizer does to the input document terms in comparison to other preprocessing nodes, for example the Snowball Stemmer.

Stemming and lemmatization are both commonly used natural language processing techniques in the field of information retrieval. Let's look at the example below and assume the result is an index of a search engine. If we now query the word "mouse", only the first document will be returned using the stemmedTerms, because the stemmer does not reduce "mouse" and "mice" to the same stem. However, if we use the lemmatizedTerms, the second document will also be returned, because "mice" is the plural form of "mouse" and is lemmatized to it.
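The search-engine scenario can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two example sentences, the suffix-stripping `toy_stem` (which is not the Snowball algorithm), and the dictionary-based `toy_lemma` (which ignores POS tags for brevity and is not the Stanford Lemmatizer).

```python
from collections import defaultdict

# Hypothetical example documents; the workflow's own sentences may differ.
DOCS = {1: "The mouse ate the cheese", 2: "The mice ate the cheese"}

# Tiny hand-written lemma dictionary (assumption, POS tags omitted).
TOY_LEMMAS = {"mice": "mouse", "ate": "eat"}


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word: str) -> str:
    """Look the word up in the toy dictionary; fall back to the word itself."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)


def build_index(docs, normalize):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def search(index, query, normalize):
    """Normalize the query the same way as the index and look it up."""
    return sorted(index.get(normalize(query), set()))


stemmed_index = build_index(DOCS, toy_stem)
lemma_index = build_index(DOCS, toy_lemma)

print(search(stemmed_index, "mouse", toy_stem))  # [1]
print(search(lemma_index, "mouse", toy_lemma))   # [1, 2]
```

The stemmed index keeps "mous" and "mic" as separate keys, so the query only finds document 1. The lemma index stores both documents under "mouse".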

This workflow shows the difference between lemmatization, using the Stanford Lemmatizer node, and stemming, using the Snowball Stemmer node. Compare the original terms, the stemmed terms, and the lemmatized terms.

Workflow nodes: Table Creator (create example sentences); Strings To Document (convert to documents); Stanford Tagger (apply part-of-speech tags); Stanford Lemmatizer (apply lemmatizer); Snowball Stemmer (apply stemmer); Bag Of Words Creator (create bags of words of the original, the stemmed, and the lemmatized terms); Term Processing (join different results into one table).
