KNIME_challenge15_solution
result
+-----+-----+-----+-----+------+--------+----+
| DAY | MAX | MIN | AVG | NORM | DEPART | .. |
+-----+-----+-----+-----+------+--------+----+
| 1   | 32  | 27  | 30  | 37   | -7     | .. |
| 2   | 42  | 31  | 37  | 37   | 0      | .. |
| 3   | 45  | 41  | 43  | 37   | 6      | .. |
| 4   | 48  | 41  | 45  | 36   | 9      | .. |
| 5   | 50  | 43  | 47  | 36   | 11     | .. |
| 6   | 52  | 41  | 47  | 35   | 12     | .. |
| 7   | 53  | 41  | 47  | 35   | 12     | .. |
| 8   | 52  | 38  | 45  | 34   | 11     | .. |
| 9   | 51  | 38  | 45  | 34   | 11     | .. |
| ..  | ..  | ..  | ..  | ..   | ..     | .. |
+-----+-----+-----+-----+------+--------+----+

Challenge 15: Extracting a Table from a PDF
Level: Hard
Description: Given a text-based PDF document with a table, can you partially extract the table into a KNIME data table for further analysis? For this challenge we will extract the table from this PDF document and attempt to partially reconstruct it within KNIME. The corresponding KNIME table should contain the following columns: Day, Max, Min, Norm, Depart, Heat, and Cool.
Note 1: Your final output should be a table, not a single row with all the relevant data.
Note 2: The Tika Parser node is better suited for this task than the PDF Parser node.
We completed this task without components, regular expressions, or code-snippet nodes. In fact, our solution has a total of 10 nodes, but labeling the columns required a bit of manual effort.
Author: Victor Palacios

Workflow annotations:
Dynamic download PDF
Load PDF and parse content
Dynamically identify table dimensions
Parse the partial header names
Split data table to columns
Inside of component. Only for display purposes
MINIMUM TOOL ALTERNATIVE (8 nodes)

Node comments: knime hub url, Load PDF, split content on new line, generate rows, keyword for table search, find table start, repeat rest of table, remove rows above table, add row counter, get table dimensions, apply table dimensions, fix DIR column, replace headers, fix header columns, Retype multi-row headers, Alternative to fixed row filter, fix DIR column

Nodes: HTTP(S) Connector, Extract Context Properties, Variable Expressions, Tika Parser, Transfer Files, Cell Splitter, Ungroup, String Configuration, Column Expressions, Moving Aggregation, Row Filter, Counter Generation, GroupBy, Table Row to Variable, Row Filter, Component Input, Component Output, Cell Splitter, Column Expressions, Insert Column Header, Column Expressions, Row Splitter, Cell Splitter, Transpose, Column Rename, Dynamic data table, Row Filter, Column Expressions, Cell Splitter, Table Creator, Insert Column Header
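The workflow itself uses only KNIME nodes, but its node comments describe a line-oriented procedure that is easy to follow in code. The Python sketch below illustrates that procedure under stated assumptions; it is not the author's solution. `SAMPLE_TEXT` is a hypothetical stand-in for the Tika Parser output (its data rows reuse values from the result table, while the title, footer, and spacing are invented), `HEADER_KEYWORD` assumes the table starts at a line beginning with `DAY`, and the real PDF's multi-row headers and its Heat and Cool columns are not modeled.

```python
# A minimal sketch of the line-oriented extraction described by the node
# comments. SAMPLE_TEXT is HYPOTHETICAL: data rows reuse values from the
# result table, but the title/footer lines and the spacing are invented.
SAMPLE_TEXT = """\
HYPOTHETICAL REPORT TITLE
Some free text above the table

DAY MAX MIN AVG NORM DEPART
1   32  27  30  37   -7
2   42  31  37  37   0
3   45  41  43  37   6
Hypothetical footer line
"""

HEADER_KEYWORD = "DAY"  # assumption: the table starts at a line beginning with this
WANTED = ["DAY", "MAX", "MIN", "NORM", "DEPART"]  # AVG is not a required column


def extract_table(text: str, keyword: str = HEADER_KEYWORD) -> list[dict[str, int]]:
    # "split content on new line" / "generate rows": one entry per line
    lines = text.splitlines()

    # "keyword for table search" / "find table start": first line whose first
    # token equals the keyword (raises StopIteration if no such line exists)
    start = next(i for i, line in enumerate(lines) if line.split()[:1] == [keyword])
    header = lines[start].split()

    rows = []
    # "remove rows above table": only consider lines after the header
    for line in lines[start + 1:]:
        cells = line.split()
        # "apply table dimensions": stop at the first line that is not a day row
        if not cells or not cells[0].isdigit():
            break
        # "split data table to columns": pair each cell with its header name
        record = dict(zip(header, cells))
        rows.append({name: int(record[name]) for name in WANTED})
    return rows


if __name__ == "__main__":
    for row in extract_table(SAMPLE_TEXT):
        print(row)
```

Like the original solution, this avoids regular expressions: whitespace splitting plus "stop at the first row whose first cell is not a day number" stand in for finding the table start and applying the table dimensions. Which KNIME node performs each step is not stated explicitly in the workflow.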
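In all nine rows shown in the result table, AVG equals the mean of MAX and MIN rounded half up, and DEPART equals AVG minus NORM. Assuming these relationships also hold for the rows not shown (the source does not state this), checking them is a cheap way to catch a value that landed in the wrong column after splitting. A minimal Python sketch, with the rounding convention inferred from these rows only:

```python
# Rows copied from the result table: (DAY, MAX, MIN, AVG, NORM, DEPART)
RESULT_ROWS = [
    (1, 32, 27, 30, 37, -7),
    (2, 42, 31, 37, 37, 0),
    (3, 45, 41, 43, 37, 6),
    (4, 48, 41, 45, 36, 9),
    (5, 50, 43, 47, 36, 11),
    (6, 52, 41, 47, 35, 12),
    (7, 53, 41, 47, 35, 12),
    (8, 52, 38, 45, 34, 11),
    (9, 51, 38, 45, 34, 11),
]


def row_problems(day, mx, mn, avg, norm, depart):
    """Return the failed checks for one extracted row (empty list if consistent)."""
    problems = []
    # Mean of MAX and MIN rounded half up. Integer arithmetic is used because
    # Python's round() rounds halves to even (round(36.5) == 36), which would
    # disagree with day 2 (AVG 37). Convention inferred from the rows above only.
    expected_avg = (mx + mn + 1) // 2
    if avg != expected_avg:
        problems.append(f"day {day}: AVG {avg} != {expected_avg}")
    # Departure from normal, as observed in the rows above
    if depart != avg - norm:
        problems.append(f"day {day}: DEPART {depart} != {avg - norm}")
    return problems


if __name__ == "__main__":
    all_problems = [p for row in RESULT_ROWS for p in row_problems(*row)]
    print(all_problems or "all rows consistent")  # prints: all rows consistent
```

A row where, for example, NORM and AVG were swapped during splitting would fail both checks.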