
Tika Language Detector

Streamable

KNIME Textprocessing Plug-in version 4.0.2.v201909251213 by KNIME AG, Zurich, Switzerland

This node uses the Apache Tika library to detect the language of a given String/Document value. The detected languages are appended to the input table as a new column. The list of all supported languages can be seen here. If the text contains mixed languages, the detector returns, by default, the language with the highest confidence value.
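
The default behaviour for mixed-language text can be illustrated with a small sketch. This is not the node's or Tika's actual code: the function name `pick_top_language`, the `(language, confidence)` tuple layout, and the scores are all assumptions made up for illustration. It only shows the idea of keeping the candidate with the highest confidence value.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a detection result: (language code, confidence).
Candidate = Tuple[str, float]


def pick_top_language(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest confidence, or None if the list is empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[1])


# Made-up scores for a text that mixes several languages.
mixed_text_candidates = [("de", 0.31), ("en", 0.62), ("fr", 0.07)]
print(pick_top_language(mixed_text_candidates))
```

With the invented scores above, only the English candidate would be reported; the other candidates are dropped unless the option to show all detected languages is enabled.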

Options

String or Document column
The column containing the strings or documents to parse.
New language column
The name of the appended column for the languages. Can be left empty.
Show Confidence value
Specify whether the confidence value of each detected language should be appended in a new column.
New confidence value column
The name of the appended column for the confidence values. Can be left empty.
Show all detected languages
Specify whether all detected languages should be shown in the output table as a collection (list). This is useful when the text might contain mixed languages.
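
As a rough sketch of how these options might combine, the following example builds the appended cells for one input row. Everything in it is an assumption for illustration: the function `build_output_cells`, the default column names `"Language"` and `"Confidence"`, and the choice to also return confidences as a list when all languages are shown are hypothetical and are not taken from the node's implementation.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical representation of a detection result: (language code, confidence).
Candidate = Tuple[str, float]


def build_output_cells(
    candidates: List[Candidate],
    language_column: str = "Language",      # hypothetical default name
    confidence_column: str = "Confidence",  # hypothetical default name
    show_confidence: bool = False,
    show_all: bool = False,
) -> Dict[str, Any]:
    """Build the appended cells for one row according to the chosen options."""
    # Rank candidates from most to least confident.
    ranked = sorted(candidates, key=lambda candidate: candidate[1], reverse=True)

    if show_all:
        # Collection cells: one entry per detected language (assumed layout).
        languages: Any = [language for language, _ in ranked]
        confidences: Any = [score for _, score in ranked]
    else:
        # Single cells: only the most confident language, or missing values.
        languages = ranked[0][0] if ranked else None
        confidences = ranked[0][1] if ranked else None

    cells: Dict[str, Any] = {language_column: languages}
    if show_confidence:
        cells[confidence_column] = confidences
    return cells


row_candidates = [("de", 0.31), ("en", 0.62)]
print(build_output_cells(row_candidates))
print(build_output_cells(row_candidates, show_confidence=True, show_all=True))
```

The first call mirrors the default configuration (one language, no confidence column); the second mirrors a configuration with both "Show Confidence value" and "Show all detected languages" enabled.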

Input Ports

The input table containing the strings/documents to analyze. The input table must contain at least one String/Document column.

Output Ports

The output table containing the detected languages and, if chosen, the confidence value of each detected language.

Installation

To use this node in KNIME, install KNIME Textprocessing from the following update site:

KNIME 4.0
