Stanford Lemmatizer

Lemmatizes terms contained in the input documents using the Stanford CoreNLP library. For details about the library, see the Stanford CoreNLP documentation.

This node returns the lemma of a term by removing inflections, e.g. plural endings, pronoun case, and verb endings. The lemma depends heavily on the term's Part-Of-Speech (POS) tag, so either the Stanford tagger node or the POS tagger node must be applied before the Lemmatizer. If a term has more than one POS tag, only the first one is taken into account. Terms with no POS tag are skipped by default.
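To illustrate why the POS tag matters, here is a minimal Python sketch. It is not the node's implementation: the tiny lexicon `TOY_LEXICON` and the helpers `coarse_pos` and `lemmatize` are hypothetical, whereas the real Stanford lemmatizer uses morphological rules. The sketch shows that the same surface form gets a different lemma depending on its tag, that only the first tag is used, and that an untagged term is skipped. It assumes "skipped" means the term is left unchanged.

```python
# Hypothetical toy lexicon keyed by (surface form, coarse POS class).
# The real Stanford CoreNLP lemmatizer uses morphological rules rather than
# a hand-written table; this only illustrates that the lemma depends on the tag.
TOY_LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}


def coarse_pos(penn_tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD', 'NNS') to a coarse class."""
    if penn_tag.startswith("VB"):
        return "VERB"
    if penn_tag.startswith("NN"):
        return "NOUN"
    return "OTHER"


def lemmatize(term: str, pos_tags: list[str]) -> str:
    """Return the lemma of `term` based on the first of its POS tags."""
    if not pos_tags:
        # Untagged terms are skipped; assumed here to mean "left unchanged".
        return term
    first_tag = pos_tags[0]  # only the first POS tag is considered
    key = (term.lower(), coarse_pos(first_tag))
    # Fall back to the unchanged term if the toy lexicon has no entry.
    return TOY_LEXICON.get(key, term)
```

For example, `lemmatize("saw", ["VBD"])` gives `"see"` while `lemmatize("saw", ["NN"])` gives `"saw"`, and `lemmatize("leaves", ["NNS", "VBZ"])` gives `"leaf"` because only the first tag, `NNS`, is used.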

Currently, only English is supported by this node.


Lemmatizer options

Node should fail when terms with no POS tag are found
If checked, the node will fail when at least one term has no POS tag. Otherwise, all terms that have no POS tag will simply be skipped.
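The effect of this option can be sketched in Python as follows. The names `MissingPosTagError`, `lemmatize_document`, `lemma_of`, and `fail_on_untagged` are hypothetical stand-ins for the node's internals, and the lemma function is passed in so the sketch stays independent of any real lemmatizer. As above, it assumes a skipped term is kept unchanged.

```python
from typing import Callable


class MissingPosTagError(Exception):
    """Hypothetical error raised when an untagged term is found and failing is enabled."""


def lemmatize_document(
    terms: list[tuple[str, list[str]]],
    lemma_of: Callable[[str, str], str],
    fail_on_untagged: bool = False,
) -> list[str]:
    """Lemmatize (word, POS tags) pairs; untagged terms are skipped or cause a failure."""
    result = []
    for word, tags in terms:
        if not tags:
            if fail_on_untagged:
                # Option checked: one untagged term is enough to fail.
                raise MissingPosTagError(f"term {word!r} has no POS tag")
            # Option unchecked: skip the term (assumed: keep it unchanged).
            result.append(word)
            continue
        # Only the first POS tag is passed to the lemma function.
        result.append(lemma_of(word, tags[0]))
    return result
```

With the option unchecked, a document mixing tagged and untagged terms is processed and the untagged terms pass through; with it checked, the first untagged term stops processing.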

Preprocessing options

Document column
The column containing the documents to preprocess.
Replace column
If checked, the document column will be replaced by the new preprocessed documents. Otherwise, the preprocessed documents will be appended as a new column.
Append column
The name of the new appended column, containing the preprocessed documents.
Ignore unmodifiable tag
If checked, unmodifiable terms will be preprocessed too.
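The preprocessing options above can be sketched in Python as a table transformation. This is only an illustration under stated assumptions: the table is modeled as a list of row dictionaries, a document as a list of `Term` objects, and `Term`, `preprocess_table`, and the default column name `"Preprocessed Document"` are all hypothetical rather than KNIME's actual data structures. Terms flagged as unmodifiable are left alone unless the "Ignore unmodifiable tag" option is set.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Term:
    text: str
    unmodifiable: bool = False  # hypothetical stand-in for the "unmodifiable" tag


def preprocess_table(
    rows: list[dict],
    document_column: str,
    transform: Callable[[str], str],
    replace_column: bool = True,
    append_column: str = "Preprocessed Document",  # hypothetical default name
    ignore_unmodifiable: bool = False,
) -> list[dict]:
    """Apply `transform` to every term of every document in `document_column`."""
    out = []
    for row in rows:
        new_doc = [
            # Unmodifiable terms are kept as-is unless the tag is ignored.
            t if (t.unmodifiable and not ignore_unmodifiable)
            else replace(t, text=transform(t.text))
            for t in row[document_column]
        ]
        new_row = dict(row)  # copy so the input table is not modified
        # Replace the document column, or append the result as a new column.
        target = document_column if replace_column else append_column
        new_row[target] = new_doc
        out.append(new_row)
    return out
```

Copying each row keeps the input table intact, which mirrors the append mode: the original documents stay available next to the preprocessed ones.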

Input Ports

The input table which contains the documents to preprocess.

Output Ports

The output table which contains the preprocessed documents.


This node has no views



