Challenge 15 - Extracting a Table from a PDF - Solution

Extracting a Table from a PDF

Given a text-based PDF document with a table, can you partially extract the table into a KNIME data table for further analysis? For this challenge we will extract the table from https://www.mountwashington.org/uploads/forms/2021/10.pdf and attempt to partially reconstruct it within KNIME. The corresponding KNIME table should contain the following columns: Day, Max, Min, Norm, Depart, Heat, and Cool.

Note 1: Your final output should be a table, not a single row with all the relevant data.

Note 2: The Tika Parser node is better suited for this task than the PDF Parser node. We completed this task without components, regular expressions, or code-snippet nodes. In fact, our solution has a total of 10 nodes, but labeling the columns required a bit of manual effort.

Workflow nodes, in order, with their annotations:

Tika Parser: data from https://www.mountwashington.org/uploads/forms/2021/10.pdf
Cell Splitter: separate all new lines
Transpose: columns to rows
Cell Splitter: split by space
Row Filter: filter by new row ids to target data
RowID: reset row ids
Insert Column Header: fix column names
Extract Table Spec: extract current column names
Column Filter: remove non-target columns
Table Creator: copy paste the column name and column type from Extract Table Spec to change names
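
For readers who want to see the same idea outside KNIME, the node sequence above (split the parsed text on new lines, split each line on spaces, keep only the data rows, drop extra columns, and label what remains) can be sketched in Python. This is only an illustration, not the workflow itself: the function name `extract_table` and the sample text are hypothetical, the sample values are made up rather than taken from the PDF, and rows are selected by content (a leading day number) instead of by row ID as the Row Filter node does.

```python
TARGET_COLUMNS = ["Day", "Max", "Min", "Norm", "Depart", "Heat", "Cool"]


def extract_table(text, columns=TARGET_COLUMNS):
    """Rebuild table rows from plain text extracted from a PDF (hypothetical helper)."""
    rows = []
    # One entry per line: mirrors splitting on new lines and transposing.
    for line in text.splitlines():
        # Split the line on whitespace: mirrors the second Cell Splitter.
        tokens = line.split()
        # Keep only lines that look like data rows. Assumption: a data row
        # starts with a day number and has at least one value per column.
        if len(tokens) < len(columns) or not tokens[0].isdigit():
            continue
        # Drop trailing non-target values and attach the column names.
        rows.append(dict(zip(columns, tokens[:len(columns)])))
    return rows


# Hypothetical sample text, NOT taken from the actual PDF.
sample = """\
Some report header
Day Max Min Norm Depart Heat Cool Extra
1 40 30 31 4 30 0 9.9
2 38 25 31 1 33 0 0.0
Totals and notes
"""

table = extract_table(sample)
for row in table:
    print(row)
```

All values stay as strings here; converting them to numbers would be a separate step, much like setting the column types in the Table Creator node.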
