Icon

Challenge 15 - Extracting a Table from a PDF - Solution

Extracting a Table from a PDF

Given a text-based PDF document with a table, can you partially extract the table into a KNIME data table for further analysis? For this challenge we will extract the table from https://www.mountwashington.org/uploads/forms/2021/10.pdf and attempt to partially reconstruct it within KNIME. The corresponding KNIME table should contain the following columns: Day, Max, Min, Norm, Depart, Heat, and Cool. Note 1: Your final output should be a table, not a single row with all the relevant data. Note 2: The Tika Parser node is better suited for this task than the PDF Parser node. We completed this task without components, regular expressions, or code-snippet nodes. In fact, our solution has a total of 10 nodes, but labeling the columns required a bit of manual effort.

Nodes

Extensions

Links