Zemberek Stemmer

Stems the terms contained in the input documents using the Zemberek stemming algorithm: each term is disambiguated and reduced to its stem. The Zemberek stemming algorithm works for Turkish texts only.
Warning: It is highly recommended to use this node only with documents that have been tokenized with the Zemberek TurkishTokenizer. Otherwise, term information (letter case, tags, etc.) might be lost. Please double-check the configuration of the preceding nodes.

Options

Preprocessing options

Document column
The column containing the documents to preprocess.
Replace documents
If checked, the documents will be replaced by the new preprocessed documents. Otherwise, the preprocessed documents will be appended as a new column.
Append column
The name of the appended column that contains the preprocessed documents.
Ignore unmodifiable tag
If checked, terms marked as unmodifiable will be preprocessed too.
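
As a rough illustration of how these options interact, the following Python sketch models a table row as a dictionary and a document as a list of terms. It is not the node's implementation: the names `Term`, `toy_stem`, and `preprocess_row` are hypothetical, the suffix-stripping function is only a stand-in for the Zemberek stemmer, and the sketch assumes that unmodifiable terms are left unchanged when "Ignore unmodifiable tag" is unchecked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a document term and its unmodifiable flag."""
    text: str
    unmodifiable: bool = False


def toy_stem(text: str) -> str:
    """Toy stemmer (NOT Zemberek): strips the plural suffixes -lar / -ler."""
    for suffix in ("lar", "ler"):
        if text.endswith(suffix):
            return text[: -len(suffix)]
    return text


def preprocess_row(
    row: Dict[str, List[Term]],
    document_column: str,
    replace_documents: bool,
    append_column: str,
    ignore_unmodifiable: bool,
    stem: Callable[[str], str] = toy_stem,
) -> Dict[str, List[Term]]:
    """Apply the stemmer to one row according to the preprocessing options."""
    processed = [
        # Assumption: unmodifiable terms are skipped unless the flag is ignored.
        Term(stem(t.text), t.unmodifiable)
        if (ignore_unmodifiable or not t.unmodifiable)
        else t
        for t in row[document_column]
    ]
    out = dict(row)  # leave the caller's row untouched
    if replace_documents:
        out[document_column] = processed
    else:
        out[append_column] = processed
    return out


if __name__ == "__main__":
    row = {"Document": [Term("kitaplar"), Term("evler", unmodifiable=True)]}
    print(preprocess_row(row, "Document", False, "Preprocessed Document", False))
    print(preprocess_row(row, "Document", True, "Preprocessed Document", True))
```

In the first call the original column is kept and the result lands in a new column, with the unmodifiable term untouched. In the second call the document column itself is overwritten and the unmodifiable term is stemmed as well.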

Stemming options

Maintain capitalization
If checked, the capitalization of the original terms will be preserved in the stemmed terms.
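
One way to picture this option is the Python sketch below, which copies the letter case of the original term onto a stem, position by position. This is an assumption about the behaviour, not the node's actual code: `TOY_STEMS`, `tr_upper`, `tr_lower`, and `stem_term` are hypothetical names, the lookup table stands in for the Zemberek stemmer, and the sketch assumes the stem is lowercase when the option is unchecked. The helpers handle the Turkish dotted and dotless i explicitly, because Python's default case conversion does not follow Turkish rules.

```python
# Hypothetical lookup table standing in for the real stemmer.
TOY_STEMS = {"kitaplar": "kitap", "evlerde": "ev"}


def tr_upper(ch: str) -> str:
    """Uppercase one character using Turkish rules for i / ı."""
    return {"i": "İ", "ı": "I"}.get(ch, ch.upper())


def tr_lower(ch: str) -> str:
    """Lowercase one character using Turkish rules for I / İ."""
    return {"I": "ı", "İ": "i"}.get(ch, ch.lower())


def stem_term(term: str, maintain_capitalization: bool) -> str:
    """Stem a term with the toy table, optionally restoring the original case."""
    lowered = "".join(tr_lower(c) for c in term)
    stem = TOY_STEMS.get(lowered, lowered)
    if not maintain_capitalization:
        return stem
    # Re-apply uppercase wherever the original term had it at the same index.
    return "".join(
        tr_upper(ch) if i < len(term) and term[i].isupper() else ch
        for i, ch in enumerate(stem)
    )


if __name__ == "__main__":
    print(stem_term("Kitaplar", True))   # Kitap
    print(stem_term("Kitaplar", False))  # kitap
    print(stem_term("İstanbul", True))   # İstanbul
```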

Input Ports

The input table which contains the documents to preprocess.

Output Ports

The output table which contains the preprocessed documents.

Views

This node has no views.
