Data Wrangling _ Reading and Transfering PDF Files

https://forum.knime.com/t/pdf-extract-text-plus-export-to-excel/63074

Source Data: The referenced PDF files were originally provided by @Felipereis50 in the KNIME Forum thread linked above. You can find the sample PDFs in the 'current workflow data area'.

Section 0: Regex supported
Initial challenge: extract the 'CNPJ' code and 'Referencia'. Clean file list to be exported as report sheet (i.e. Excel Writer).

Section 1: Loop Embedded Transfer Files (Table) node [copy_and_rename()]
KNIME-based solution supported with the Transfer Files (Table) node, as suggested by @mlauber71: https://hub.knime.com/-/spaces/-/latest/~J5B4sWDZ7QQGw2wK/

Section 2: Code Supported [copy_and_rename()]
Copy file to new assigned folder and rename.

Output Table
KNIME: copy_and_rename()
R: copy_and_rename()
Py: copy_and_rename()

Node annotations
Tika Parser: Reading PDF
List Files/Folders: List, relative to: C:\Data\PDF_folder...
Path to String: to String: $Location$
String Manipulation: regexReplace, extract from $Content$ with CNPJ:.*
String Manipulation: regexReplace, extract from $Content$ with Referência:.*
String Manipulation: new_file-name
String Manipulation: new_file-path
String Manipulation: extract parent folder (source) and original file_name
String Manipulation: fix source file_name
String Manipulation: target_folder_path from URI
Rule-based Row Splitter: upper: 'source_folder', lower: 'target_folder'...
Table Row to Variable: $${Starget_folder path}$$ with 'target_folder' string path ($target_folder path$, $source_folder path$)
String to Path: Filepath to Path
Group Loop Start: start loop stepping on dataframe file list
Transfer Files (Table): copy from: file_path, to: new_file_path
Loop End: collect loop status
R Snippet: copy from: file name, to: new_file-name; source and target folder from variable
Python Script: copy from: file name, to: new_file-name; source and target folder from variable
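The regex extraction of Section 0 and the copy_and_rename() step of Section 2 could look roughly like the Python sketch below. It is a minimal illustration, not the code stored in the workflow's Python Script node. The patterns mirror the CNPJ:.* and Referência:.* expressions from the annotations. The helper names extract_fields and safe_name, the argument names, and the character replacement rule are assumptions made for this example.

```python
import re
import shutil
from pathlib import Path

# Assumed patterns: the label, optional spaces, then the rest of that line.
CNPJ_RE = re.compile(r"CNPJ:\s*(.+)")
REF_RE = re.compile(r"Refer[êe]ncia:\s*(.+)")


def extract_fields(content):
    """Return the text following 'CNPJ:' and 'Referência:' (None when a label is absent)."""
    cnpj = CNPJ_RE.search(content)
    ref = REF_RE.search(content)
    return {
        "cnpj": cnpj.group(1).strip() if cnpj else None,
        "referencia": ref.group(1).strip() if ref else None,
    }


def safe_name(value):
    """Replace characters that cannot appear in a file name (e.g. '/') with '_'."""
    return re.sub(r"[^\w.-]+", "_", value).strip("_")


def copy_and_rename(source_file, target_folder, new_file_name):
    """Copy source_file into target_folder under new_file_name and return the new path."""
    target_folder = Path(target_folder)
    target_folder.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    target = target_folder / new_file_name
    shutil.copy2(source_file, target)  # copy (not move), keeping file metadata
    return target
```

In a loop over the file list, one might build the new name from the extracted values, for example safe_name(fields["cnpj"]) + "_" + safe_name(fields["referencia"]) + ".pdf", and pass it to copy_and_rename() together with the source path and target folder. That naming scheme is hypothetical; the workflow's actual new_file-name expression is not shown here.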
