Tess4J

The Tess4J node integrates the Tesseract OCR library into KNIME. You can change the language and the Tesseract data path (the path to the language files) in the node configuration.

Options

Tessdata Path (*)

If you want to use your own Tesseract language files (.traineddata), select "Use External" and choose the folder that contains those files.

Language

Tesseract uses languages to help with the optical character recognition. Default is "eng" (English), but there are other languages bundled with this plugin. If you have your own tessdata path defined, languages found in that path will automatically be listed here.
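
As a rough illustration of how languages could be discovered in an external tessdata folder, the sketch below lists every regular file ending in .traineddata and strips the extension. It is not the node's actual implementation; the class name TessdataLanguages, the method listLanguages, and the default folder name "tessdata" are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the node or of Tess4J.
class TessdataLanguages {
    private static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes of all *.traineddata files directly inside the folder. */
    static List<String> listLanguages(Path tessdataDir) throws IOException {
        List<String> languages = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(tessdataDir, "*" + SUFFIX)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip directories that happen to match the pattern
                }
                String name = file.getFileName().toString();
                // "eng.traineddata" becomes "eng"
                languages.add(name.substring(0, name.length() - SUFFIX.length()));
            }
        }
        Collections.sort(languages);
        return languages;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass an existing folder as the first argument instead.
        Path dir = Paths.get(args.length > 0 ? args[0] : "tessdata");
        System.out.println(listLanguages(dir));
    }
}
```

Each returned code (for example "eng") is the kind of value that would appear in the Language list.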

Deskew input images

Some images may not be exactly horizontal, but rather slightly rotated. If this option is checked, the node will undo the rotation. This comes with a slight performance cost, so if you know that the text in your images is perfectly horizontal, be sure to turn this off. If images are slightly rotated, this option is required, since optical character recognition will not work properly otherwise.

Page Segmentation Mode

Define how your page is segmented here. For flow variables use the ID.

Legend: (ID) Name: Description

(0) OSD Only: Orientation and script detection (OSD) only.
(1) Auto Pageseg and OSD: Automatic page segmentation and OSD.
(2) Auto Pageseg Only: Automatic page segmentation, no OSD and no OCR.
(3) Full Auto Pageseg: Fully automatic page segmentation, no OSD.
(4) Single Column: Assume a single column of text of variable sizes.
(5) Single Vert Block: Assume a single uniform block of vertically aligned text.
(6) Single Block: Assume a single uniform block of text. (default)
(7) Single Line: Treat the image as a single text line.
(8) Single Word: Treat the image as a single word.
(9) Circle Word: Treat the image as a single word in a circle.
(10) Single Char: Treat the image as a single character.
(11) Sparse Text: Find as much text as possible in no particular order.
(12) Sparse Text with OSD: Sparse text with OSD.
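
Because flow variables take the numeric ID rather than the name, it can help to keep the legend above as a lookup table and validate an ID before passing it on. The following is a minimal sketch of such a lookup; the enum PageSegMode and its methods are hypothetical names introduced for this example and are not part of the node or of Tess4J.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical lookup table mirroring the legend above (IDs 0 to 12).
enum PageSegMode {
    OSD_ONLY(0, "OSD Only"),
    AUTO_OSD(1, "Auto Pageseg and OSD"),
    AUTO_ONLY(2, "Auto Pageseg Only"),
    AUTO(3, "Full Auto Pageseg"),
    SINGLE_COLUMN(4, "Single Column"),
    SINGLE_BLOCK_VERT_TEXT(5, "Single Vert Block"),
    SINGLE_BLOCK(6, "Single Block"),
    SINGLE_LINE(7, "Single Line"),
    SINGLE_WORD(8, "Single Word"),
    CIRCLE_WORD(9, "Circle Word"),
    SINGLE_CHAR(10, "Single Char"),
    SPARSE_TEXT(11, "Sparse Text"),
    SPARSE_TEXT_OSD(12, "Sparse Text with OSD");

    private final int id;
    private final String label;

    PageSegMode(int id, String label) {
        this.id = id;
        this.label = label;
    }

    int id() {
        return id;
    }

    String label() {
        return label;
    }

    /** Looks up a mode by the numeric ID used in flow variables; empty if the ID is unknown. */
    static Optional<PageSegMode> fromId(int id) {
        return Arrays.stream(values()).filter(m -> m.id == id).findFirst();
    }

    public static void main(String[] args) {
        // Print the ID-to-name mapping, one mode per line.
        for (PageSegMode mode : values()) {
            System.out.println(mode.id() + " -> " + mode.label());
        }
    }
}
```

Returning an Optional keeps an out-of-range ID (for example 13) from silently falling back to some other mode.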

OCR Engine Mode

Define which OCR engine to use. For flow variables use the ID.

Legend: (ID) Name: Description

(0) Tesseract Only: Only use Tesseract. (Fastest)
(1) Cube Only: Only use Cube. This is slower than "Tesseract Only", but more accurate.
(2) Tesseract And Cube: Use both of the above and combine results. (Best accuracy)
(3) Default: Use language specific configuration or "Tesseract Only", if not specified.

Advanced Config

Tesseract Config (*)

This tab lets you define key/value pairs that are passed to Tesseract as user-defined configuration variables. See http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version for a list of available variables.

Advanced users can use this tab to take advantage of Tesseract's flexibility, but doing so may result in unexpected errors, since the variables are not checked for correctness.

The "Import" and "Export" buttons let you load a Tesseract configuration file into the list and write the list to a Tesseract configuration file.
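
To make the key/value idea concrete, here is a minimal sketch of reading and writing such a list. It assumes the plain Tesseract config file layout of one "name value" pair per line, with blank lines and lines starting with "#" skipped; whether the node's Import and Export buttons use exactly this layout is an assumption. The class TessConfigFile and its methods are hypothetical names for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the node or of Tess4J.
class TessConfigFile {

    /** Parses "name value" lines into an ordered map; blank lines and '#' lines are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> variables = new LinkedHashMap<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split at the first run of whitespace: variable name, then the rest as value.
            String[] parts = line.split("\\s+", 2);
            variables.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return variables;
    }

    /** Writes the map back as one "name value" line per variable, in insertion order. */
    static String format(Map<String, String> variables) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            out.append(entry.getKey()).append(' ').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String example = "tessedit_char_whitelist 0123456789\n# digits only\nload_system_dawg F\n";
        Map<String, String> variables = parse(example);
        System.out.print(format(variables));
    }
}
```

Like the node itself, this sketch does not check whether a variable name or value is valid for Tesseract; it only handles the file layout.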

Column Selection

Column Creation Mode: How the selected column is handled. The processed column can be added to a new table, appended to the end of the table, or used to replace the original column.
Column Suffix: A suffix appended to the column name. If "Append" is not selected, it can be left empty.
Column Selection: Selection of the columns to be processed.

Input Ports

Text as images.

Output Ports

OCR result string.

Views

Image Viewer: Another, possibly interactive, view on table cells. Displays the selected cells with their associated viewer if it exists. Available views are:
- Missing Value Viewer
-- An empty viewer that is shown when the input cell has no value to display.
- Labeling View
-- View on a labeling/segmentation
- Histogram Viewer
-- This viewer shows the histogram of the currently selected image.
- BigDataViewer
-- A viewer shown when the user selects an interval of rows and columns in the viewer. This viewer combines all images and labelings in the selected interval into one image by rendering them next to each other. Alternatively, the images and labelings can be laid over each other.
- Image Viewer
-- This viewer renders the selected image-cell.
- Combined View
-- A viewer shown when the user selects an interval of rows and columns in the viewer. This viewer combines all images and labelings in the selected interval into one image by rendering them next to each other. Alternatively, the images and labelings can be laid over each other.
- XML
-- XML tree

Installation

To use this node in KNIME, install the extension KNIME Image Processing - Tess4J Integration from the update site below, following the NodePit Product and Node Installation Guide:

v5.5

Plugin provider: University of Konstanz

Plugin version: 1.3.3.v202307241154

On NodePit since: 2025-07-02

Last update: 2025-08-12

Tags: Streamable

KNIME versions: v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6
