Challenge 15 - Extracting a Table from a PDF - Solution

Extracting a Table from a PDF

Given a text-based PDF document that contains a table, can you partially extract that table into a KNIME data table for further analysis? For this challenge we extract the table from https://www.mountwashington.org/uploads/forms/2021/10.pdf and attempt to partially reconstruct it within KNIME. The resulting KNIME table should contain the following columns: Day, Max, Min, Norm, Depart, Heat, and Cool.

Note 1: Your final output should be a table, not a single row with all the relevant data.

Note 2: The Tika Parser node is better suited for this task than the PDF Parser node.

We completed this task without components, regular expressions, or code-snippet nodes. Our solution has a total of 10 nodes, although labeling the columns required a bit of manual effort.
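For readers who want to see the row-splitting idea outside KNIME, here is a minimal Python sketch. It is not the workflow's method, which uses KNIME nodes only. It assumes the PDF text has already been parsed into a plain string, that each data row starts with the day number, and that the seven target values are the first seven whitespace-separated fields of that row. This layout is an assumption and may not match the actual PDF. The function name `rows_from_text` and the sample text are hypothetical.

```python
COLUMNS = ["Day", "Max", "Min", "Norm", "Depart", "Heat", "Cool"]


def rows_from_text(text):
    """Turn already-parsed PDF text into a list of row dicts, one per day."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Keep only lines that look like table rows: enough fields, and a
        # leading day number between 1 and 31.
        if len(tokens) < len(COLUMNS) or not tokens[0].isdecimal():
            continue
        if not 1 <= int(tokens[0]) <= 31:
            continue
        # Partial extraction: keep the first seven fields, drop the rest.
        # Values stay as strings; convert them later if needed.
        rows.append(dict(zip(COLUMNS, tokens[:len(COLUMNS)])))
    return rows


# Made-up sample text for illustration; not taken from the actual PDF.
sample = """\
Monthly summary (hypothetical header)
Day Max Min Norm Depart Heat Cool Extra
1 45 30 34 4 27 0 0.10
2 41 28 34 1 30 0 0.00
Totals and notes
"""

table = rows_from_text(sample)
for row in table:
    print(row)
```

Each dict corresponds to one table row, which matches the requirement in Note 1 that the output be a table and not a single row holding all the values.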

Created by: victorpalacios

Created at: 2022-04-08

On NodePit since: 2024-03-05

Last update: 2025-08-06

Created with KNIME version: v5.2.1

Tags: justknimeit, justknimeit-15, tika-parser, ocr, hard, data extraction, data engineering
