Dictionary Replacer

Replaces complete terms in the input documents that match a dictionary key with the corresponding dictionary value. The dictionary is provided as an additional data table at the second input port and must contain at least two string columns: one column holds the strings to replace (keys), and the other holds the replacement strings (values). Both columns can be specified in the dialog.
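The following is a minimal Python sketch of the matching rule described above. It is not the node's implementation. It assumes that documents are already split into terms (plain strings here) and that the dictionary table arrives as (key, value) row pairs. The names `build_dictionary` and `replace_terms` are hypothetical. The sketch also assumes that later rows win when a key appears twice. A term is replaced only when its complete text equals a key, so a longer term that merely contains a key is left alone.

```python
from typing import Iterable


def build_dictionary(rows: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each key (string to replace) to its value (replacement string).

    Hypothetical helper: in this sketch, later rows override earlier ones
    when the same key appears more than once.
    """
    return {key: value for key, value in rows}


def replace_terms(terms: list[str], dictionary: dict[str, str]) -> list[str]:
    """Replace every term whose complete text matches a dictionary key."""
    # Whole-term match only: the key "cat" does not touch the term "cats".
    return [dictionary.get(term, term) for term in terms]


rows = [("colour", "color"), ("cat", "feline")]
terms = ["the", "cat", "has", "colour", "cats"]
print(replace_terms(terms, build_dictionary(rows)))
# ['the', 'feline', 'has', 'color', 'cats']
```

A plain dictionary lookup keeps each replacement at constant time per term. It also makes the "complete term" semantics explicit, because lookup is by exact string equality.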

Options

Preprocessing options

Document column
The column containing the documents to preprocess.
Replace documents
If checked, the documents are replaced by the new preprocessed documents. Otherwise, the preprocessed documents are appended as a new column.
Append column
The name of the appended column containing the preprocessed documents.
Ignore unmodifiable tag
If checked, terms marked as unmodifiable are preprocessed as well.
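The sketch below shows one way the "Ignore unmodifiable tag" setting could gate the replacement. It assumes that each term carries a boolean flag marking it as unmodifiable. The `Term` class and the `ignore_unmodifiable` parameter are hypothetical stand-ins and are not KNIME's Java API. The sketch also assumes that a replaced term keeps its original flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical stand-in for a document term (not KNIME's Term class)."""
    text: str
    unmodifiable: bool = False


def replace_terms(terms: list[Term], dictionary: dict[str, str],
                  ignore_unmodifiable: bool) -> list[Term]:
    result = []
    for term in terms:
        if term.unmodifiable and not ignore_unmodifiable:
            # Unmodifiable terms pass through untouched unless the option is checked.
            result.append(term)
        elif term.text in dictionary:
            # Assumption: the replaced term keeps the original unmodifiable flag.
            result.append(Term(dictionary[term.text], term.unmodifiable))
        else:
            result.append(term)
    return result


terms = [Term("cat"), Term("cat", unmodifiable=True)]
print([t.text for t in replace_terms(terms, {"cat": "feline"}, ignore_unmodifiable=False)])
# ['feline', 'cat']
print([t.text for t in replace_terms(terms, {"cat": "feline"}, ignore_unmodifiable=True)])
# ['feline', 'feline']
```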

Dictionary

Column containing the strings to replace
The column containing the strings (words/terms) to replace (keys).
Column containing the replacement strings
The column containing the replacement strings (values).
Replace words not in the dictionary by
If checked, all words that are not contained in the dictionary are replaced by the string entered in the text field.
Note: Entering an empty string or a string consisting solely of whitespace removes all terms not contained in the dictionary.
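The sketch below models the "Replace words not in the dictionary by" option and the removal behavior described in the note. The `unknown_replacement` parameter is hypothetical. A value of `None` stands for the unchecked option, in which case unknown terms are kept. For simplicity, the sketch treats each list item as one term.

```python
from typing import Optional


def replace_terms(terms: list[str], dictionary: dict[str, str],
                  unknown_replacement: Optional[str] = None) -> list[str]:
    """Replace dictionary terms and optionally replace or drop all other terms.

    unknown_replacement=None models the unchecked option: unknown terms are kept.
    """
    # An empty or whitespace-only replacement means unknown terms are removed.
    drop_unknown = unknown_replacement is not None and unknown_replacement.strip() == ""
    result = []
    for term in terms:
        if term in dictionary:
            result.append(dictionary[term])
        elif unknown_replacement is None:
            result.append(term)
        elif not drop_unknown:
            result.append(unknown_replacement)
        # Otherwise the unknown term is dropped from the output.
    return result


d = {"cat": "feline"}
print(replace_terms(["the", "cat", "sat"], d))         # ['the', 'feline', 'sat']
print(replace_terms(["the", "cat", "sat"], d, "UNK"))  # ['UNK', 'feline', 'UNK']
print(replace_terms(["the", "cat", "sat"], d, "  "))   # ['feline']
```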
Word tokenizer
Select the tokenizer used for word tokenization. Go to Preferences -> KNIME -> Textprocessing to read the description for each tokenizer.

Input Ports

The input table which contains the documents to preprocess.
The input table containing the dictionary, consisting of at least two string columns.

Output Ports

The output table which contains the preprocessed documents.

Views

This node has no views.
