Tika Language Detector

This node uses the Apache Tika library to detect the language of a given String/Document value. The detected languages are appended to the input table in a new column. The list of all supported languages can be seen here. If the text contains mixed languages, the detector returns, by default, only the language with the highest confidence value.

Options

String or Document column
The column containing the strings or documents to parse.
New language column
The name of the appended column for the languages. Can be left empty.
Show Confidence value
Specify whether the confidence value of each detected language should be appended in a new column.
New confidence value column
The name of the appended column for the confidence values. Can be left empty.
Show all detected languages
Specify whether all detected languages should be shown in the output table as a collection (list) cell. This is useful when the text might contain mixed languages.
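
The interplay of the options above can be illustrated with a small sketch. It is not the node's implementation and does not call Apache Tika; it only assumes that a detector has already produced a list of (language, confidence) candidates for one row. The function name build_output_cells, the default column names "Language" and "Confidence", and the use of None for a missing value are all hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical detector output for one row: (language code, confidence) pairs.
Candidate = Tuple[str, float]


def build_output_cells(
    candidates: List[Candidate],
    show_confidence: bool = False,
    show_all: bool = False,
    language_column: str = "Language",      # assumed default name
    confidence_column: str = "Confidence",  # assumed default name
) -> Dict[str, object]:
    """Return the cells appended to one row for the given option settings."""
    # Rank best-first so the default case can simply take the first entry.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)

    if not ranked:
        # Assumption: no detected language is represented as a missing value.
        cells: Dict[str, object] = {language_column: None}
        if show_confidence:
            cells[confidence_column] = None
        return cells

    if show_all:
        # "Show all detected languages": one list cell with every candidate.
        cells = {language_column: [lang for lang, _ in ranked]}
        if show_confidence:
            cells[confidence_column] = [score for _, score in ranked]
    else:
        # Default: only the candidate with the highest confidence value.
        best_lang, best_score = ranked[0]
        cells = {language_column: best_lang}
        if show_confidence:
            cells[confidence_column] = best_score
    return cells


# Made-up scores for a text that mixes German and English.
mixed = [("en", 0.41), ("de", 0.57)]
print(build_output_cells(mixed))
print(build_output_cells(mixed, show_confidence=True, show_all=True))
```

With the default settings only the top-ranked language is kept, which is why enabling the collection output matters for mixed-language text: the lower-ranked candidates would otherwise be dropped.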

Input Ports

The input table containing the strings/documents to analyze. The input table has to contain at least one String/Document column.

Output Ports

The output table containing the detected languages and, if selected, the confidence value of each detected language.

Views

This node has no views.
