78914 - Extract data from PDF

Forum Post: https://forum.knime.com/t/pdf-table-to-excel/78914

Challenge: Extract data from a PDF as a structured table.

Notes: Instead of one complex RegEx, use individual splits, following the "divide et impera" (divide and conquer) principle. If more complex strings are present, like the product description, split off the easy matches first, following the same principle as before.

Workflow steps: Manage Line Breaks; Determine Start and End of Data; Clean up unwanted Data; Split multiple tables into rows; Split by Line Break into individual data items; Sanitize Data.

Node labels: Read Content; Node 2; Split Line Item No; Remove Line Item No; Remove Product Code; Split Product Code; Remove (Per Unit) Bid Price; Split (Per Unit) Bid Price; Remove Price End Date; Split Price End Date; Remove Price Start Date; Split Price Start Date; Split Results to Product Description; Sort Column Names; Define Column Order; Node 36; Fix Day in Dates.

Node types: Tika Parser URL Input; Table Creator; String Replacer (×10); Column Rename (Regex); Reference Column Resorter; Table Creator; String to Date&Time; String Manipulation (Multi Column).
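The "divide et impera" idea from the notes can be sketched outside KNIME as well. The Python example below is only an illustration, not the workflow itself: the sample line, its field order, and every pattern are assumptions made up for the sketch, and only the column names are taken from the node labels. Each step extracts one easy, anchored match and removes it from the string, so the free-text product description is simply whatever remains at the end.

```python
import re

# Hypothetical line layout (an assumption, not taken from the forum PDF):
# "<line item no> <product code> <description> <price> <start date> <end date>"
SAMPLE = "10 ABC-123 Widget, blue 5 mm (pack of 10) 12.50 01/02/2024 31/12/2024"

# One simple, anchored pattern per field, applied in this order.
# Fields at the start are peeled off the front, fields at the end off the back.
STEPS = [
    ("Line Item No", r"^\s*(\d+)\s+"),
    ("Product Code", r"^\s*([A-Z]+-\d+)\s+"),
    ("Price End Date", r"\s+(\d{2}/\d{2}/\d{4})\s*$"),
    ("Price Start Date", r"\s+(\d{2}/\d{2}/\d{4})\s*$"),
    ("(Per Unit) Bid Price", r"\s+(\d+\.\d{2})\s*$"),
]


def split_line(line):
    """Split one text line into named fields, one easy match at a time."""
    row = {}
    rest = line
    for name, pattern in STEPS:
        match = re.search(pattern, rest)
        if match is None:
            # Leave the field empty and keep the remaining text untouched.
            row[name] = None
            continue
        row[name] = match.group(1)
        # Remove the matched part so later steps see a shorter string.
        rest = rest[:match.start()] + rest[match.end():]
    # Whatever is left after all easy matches is the hard, free-text field.
    row["Product Description"] = rest.strip()
    return row


if __name__ == "__main__":
    for column, value in split_line(SAMPLE).items():
        print(f"{column}: {value}")
```

Because each pattern only has to describe one field, a digit inside the description (such as "5 mm" or "pack of 10") does not confuse the price or date steps, which is the main advantage over a single large expression.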
