OCR_Python_Portable_CEP

OCR Foreign Language PDFs with Python and KNIME (with Tesseract and PDFium)

This workflow shows how to OCR PDFs in a foreign language using Python and KNIME. If the desired language does not appear in the drop-down when configuring the OCR Component, additional languages can be added by editing the script. Conda must be installed on the machine and set up as described in the "Prerequisites" section of the linked Conda documentation (in KNIME under Preferences - KNIME - Conda). For portability, the Conda Environment Propagation (CEP) node sets up the environment, so it should not be necessary to create the environment manually. The commands are listed for the sake of completeness, in case a workaround without the CEP node is needed:<ul><li>Linux: conda create -n knime_ocr_tess_pdfium -c knime -c conda-forge --strict-channel-priority python=3.11 knime-python-scripting=5.8 pypdfium2 opencv pytesseract tesseract pillow numpy pandas</li><li>Windows: conda create -n knime_ocr_tess_pdfium -c knime -c conda-forge -c pypdfium2-team -c bblanchon --strict-channel-priority python=3.11 knime-python-scripting=5.8 pypdfium2-team::pypdfium2_helpers opencv pytesseract tesseract pillow numpy pandas</li></ul>

Note: If any language other than English is selected, the workflow downloads the appropriate Tesseract language files and stores them within the workflow folder under /data/tessdata/.

This workflow is based on this one: https://hub.knime.com/s/hDBtIjjK900pPNaK
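The language-file step described in the note can be sketched roughly as follows. This is not the workflow's actual script. It is a minimal illustration that assumes the language files are fetched from the public tesseract-ocr/tessdata repository on GitHub, that the target folder is a relative data/tessdata directory, and that pages are rendered with pypdfium2 before being passed to pytesseract. The names traineddata_url, ensure_language, and ocr_pdf are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: language files come from the public tesseract-ocr/tessdata
# repository; the workflow itself may use a different source.
TESSDATA_BASE_URL = "https://github.com/tesseract-ocr/tessdata/raw/main"


def traineddata_url(lang):
    """Build the download URL for a Tesseract language code such as 'deu'."""
    return f"{TESSDATA_BASE_URL}/{lang}.traineddata"


def ensure_language(lang, tessdata_dir, fetch=urlretrieve):
    """Download <lang>.traineddata into tessdata_dir unless it already exists.

    English is skipped because it is expected to ship with the Tesseract
    package in the Conda environment. Returns the local file path, or None
    for English. `fetch` is injectable so the logic can be tested offline.
    """
    if lang == "eng":
        return None
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(traineddata_url(lang), str(target))
    return target


def ocr_pdf(pdf_path, lang="eng", tessdata_dir="data/tessdata", scale=2.0):
    """Render every page of a PDF with PDFium and OCR it with Tesseract.

    Returns a list with one text string per page. The third-party imports
    are local so the helpers above work without the OCR environment.
    """
    import os

    import pypdfium2 as pdfium
    import pytesseract

    # Point Tesseract at the downloaded files only for non-English languages.
    if ensure_language(lang, tessdata_dir) is not None:
        os.environ["TESSDATA_PREFIX"] = str(Path(tessdata_dir).resolve())

    pdf = pdfium.PdfDocument(str(pdf_path))
    texts = []
    for index in range(len(pdf)):
        # A scale above 1.0 renders at higher resolution, which usually helps OCR.
        image = pdf[index].render(scale=scale).to_pil()
        texts.append(pytesseract.image_to_string(image, lang=lang))
    return texts
```

Under these assumptions, a call such as ocr_pdf("scan.pdf", lang="deu") would download deu.traineddata on first use and reuse the cached file afterwards. Adding a language that is missing from the Component's drop-down then amounts to passing its Tesseract language code.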

URL: Conda Documentation, Prerequisites https://docs.knime.com/ap/latest/python_installation_guide/#prerequisites

Portable OCR with Tesseract and PDFium, Automatic Environment Install
