Zemberek Stemmer

Stems the terms contained in the input documents with the Zemberek stemming algorithm. Terms are disambiguated and reduced to their stems. The Zemberek stemming algorithm works for Turkish texts only.
Warning: It is highly recommended to use this node only with documents that have been tokenized with the Zemberek TurkishTokenizer. Otherwise, term information (letter case, tags, etc.) might be lost. Please double-check the configuration of the preceding nodes.


Preprocessing options

Document column
The column containing the documents to preprocess.
Replace documents
If checked, the documents will be replaced by the new preprocessed documents. Otherwise, the preprocessed documents will be appended as a new column.
Append column
The name of the newly appended column that contains the preprocessed documents.
Ignore unmodifiable tag
If checked, terms marked as unmodifiable will be preprocessed too.
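
The interplay of the preprocessing options above can be illustrated with a minimal Python sketch. It is not the node's implementation: the `Term` class, the function names, and the default column name "Preprocessed Document" are hypothetical, and `transform` is a stand-in for any term-level operation such as stemming.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a document term with an unmodifiable flag."""
    text: str
    unmodifiable: bool = False


def preprocess_document(terms: List[Term],
                        transform: Callable[[str], str],
                        ignore_unmodifiable: bool = False) -> List[Term]:
    """Apply `transform` to each term; unmodifiable terms are skipped
    unless `ignore_unmodifiable` is set."""
    result = []
    for term in terms:
        if term.unmodifiable and not ignore_unmodifiable:
            result.append(term)  # left untouched
        else:
            result.append(Term(transform(term.text), term.unmodifiable))
    return result


def preprocess_table(rows: List[Dict[str, List[Term]]],
                     document_column: str,
                     transform: Callable[[str], str],
                     replace_documents: bool = True,
                     append_column: str = "Preprocessed Document",  # assumed name
                     ignore_unmodifiable: bool = False) -> List[Dict[str, List[Term]]]:
    """Either overwrite the document column or append a new column."""
    output = []
    for row in rows:
        processed = preprocess_document(
            row[document_column], transform, ignore_unmodifiable)
        new_row = dict(row)  # copy so the input table is not mutated
        target = document_column if replace_documents else append_column
        new_row[target] = processed
        output.append(new_row)
    return output
```

With "Replace documents" unchecked, the original column stays intact and the processed documents appear under the name given in "Append column".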

Stemming options

Maintain capitalization
If checked, the capitalization of the original terms will be preserved in the stemmed terms.
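
One way to think about the capitalization option is sketched below in Python. This is an assumption about the general idea, not the node's actual behavior: the toy lookup table `TOY_STEMS` stands in for the Zemberek stemmer, stems are assumed to come back lowercased, and all function names are hypothetical. The helpers handle the Turkish dotted and dotless i explicitly, because Python's built-in case conversion is not locale-aware.

```python
# Toy stand-in for the Zemberek stemmer (assumption, for illustration only).
TOY_STEMS = {"kitaplar": "kitap", "evlerde": "ev", "şehirler": "şehir"}


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: İ -> i, I -> ı."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ, ı -> I."""
    return text.replace("i", "İ").replace("ı", "I").upper()


def restore_capitalization(original: str, stem: str) -> str:
    """Copy the capitalization pattern of `original` onto `stem`."""
    if len(original) > 1 and original.isupper():
        return turkish_upper(stem)                    # ALL CAPS
    if original[:1].isupper():
        return turkish_upper(stem[:1]) + stem[1:]     # Initial capital
    return stem                                       # lowercase


def stem_term(term: str, maintain_capitalization: bool) -> str:
    """Stem a term, optionally preserving its capitalization."""
    lowered = turkish_lower(term)
    stem = TOY_STEMS.get(lowered, lowered)  # unknown terms pass through
    if maintain_capitalization:
        return restore_capitalization(term, stem)
    return stem
```

Under these assumptions, `stem_term("Kitaplar", True)` yields a capitalized stem, while `stem_term("Kitaplar", False)` yields the lowercased one.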

Input Ports

The input table which contains the documents to preprocess.

Output Ports

The output table which contains the preprocessed documents.


