Process PDF With OCR

This endpoint processes a PDF file using OCR (Optical Character Recognition). Users can specify the languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType, and removeImagesAfter options. It uses OCRmyPDF if available and falls back to Tesseract otherwise.

Input: PDF
Output: PDF
Type: SI-Conditional

Options

File Input
Languages
List of languages to use in OCR processing, e.g., 'eng', 'deu'
Set Sidecar
Enable to set the optional field Sidecar
Sidecar
Include OCR text in a sidecar text file if set to true
Set Deskew
Enable to set the optional field Deskew
Deskew
Deskew the input file if set to true
Set Clean
Enable to set the optional field Clean
Clean
Clean the input file if set to true
Set Clean Final
Enable to set the optional field Clean Final
Clean Final
Clean the final output if set to true
Ocr Type
Specify the OCR type, e.g., 'skip-text', 'force-ocr', or 'Normal'
Ocr Render Type
Specify the OCR render type, either 'hocr' or 'sandwich'
Set Remove Images After
Enable to set the optional field Remove Images After
Remove Images After
Remove images from the output PDF if set to true
Result Format

Specify how the response should be mapped to the table output. The following formats are available:

Raw Response: Returns the raw response in a single row with the following columns:

  • body: Response body
  • status: HTTP status code
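
The options above map onto the request fields named in the description (languages, sidecar, deskew, clean, cleanFinal, ocrType, ocrRenderType, removeImagesAfter). The following is a minimal Python sketch of how such a request could be assembled and how a raw response could be shaped into the single-row output. It rests on several assumptions that are not stated in this description: the helper names build_ocr_fields and to_raw_row are hypothetical, booleans are serialized as 'true'/'false', each language is sent as a repeated field, and ocrType and ocrRenderType are treated as required because they have no "Set ..." toggle. No request is sent, and no endpoint URL is assumed.

```python
from typing import Dict, List, Optional, Tuple

# Allowed values as listed in the option descriptions above.
OCR_TYPES = {"skip-text", "force-ocr", "Normal"}
OCR_RENDER_TYPES = {"hocr", "sandwich"}


def build_ocr_fields(
    languages: List[str],
    ocr_type: str,
    ocr_render_type: str,
    sidecar: Optional[bool] = None,
    deskew: Optional[bool] = None,
    clean: Optional[bool] = None,
    clean_final: Optional[bool] = None,
    remove_images_after: Optional[bool] = None,
) -> List[Tuple[str, str]]:
    """Build (name, value) form fields for the OCR request (hypothetical helper)."""
    if not languages:
        raise ValueError("at least one language code is required, e.g. 'eng'")
    if ocr_type not in OCR_TYPES:
        raise ValueError(f"ocrType must be one of {sorted(OCR_TYPES)}")
    if ocr_render_type not in OCR_RENDER_TYPES:
        raise ValueError(f"ocrRenderType must be one of {sorted(OCR_RENDER_TYPES)}")

    # Assumption: one repeated 'languages' field per language code.
    fields = [("languages", code) for code in languages]
    fields.append(("ocrType", ocr_type))
    fields.append(("ocrRenderType", ocr_render_type))

    # Optional booleans: None mirrors leaving the matching "Set ..." toggle
    # disabled, so the field is omitted from the request entirely.
    optional = {
        "sidecar": sidecar,
        "deskew": deskew,
        "clean": clean,
        "cleanFinal": clean_final,
        "removeImagesAfter": remove_images_after,
    }
    for name, value in optional.items():
        if value is not None:
            # Assumption: booleans are sent as lowercase strings.
            fields.append((name, "true" if value else "false"))
    return fields


def to_raw_row(status: int, body: bytes) -> Dict[str, object]:
    """Shape a response like the Raw Response format: one row, two columns."""
    return {"body": body, "status": status}


fields = build_ocr_fields(["eng", "deu"], "skip-text", "hocr", deskew=True)
row = to_raw_row(200, b"%PDF-1.7")
```

Returning a list of pairs rather than a dictionary keeps repeated field names such as languages intact, which most HTTP clients accept for form data. The PDF itself (the File Input option) would be attached separately as a file upload.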

Input Ports

Configuration data.

Output Ports

Result of the request depending on the selected Result Format.
Configuration data (this is the same as the input port; it is provided as passthrough for sequentially chaining nodes to declutter your workflow connections).

Views

This node has no views
