03.2_Regex_with_PDFs_Solution

02_Regex_with_PDFs_Solution
Session 3
Exercise 3.2 - PDF Parser and Tika Parser

Summary:
In this exercise, we will use two approaches to read and parse PDF files and extract useful information.

Instructions:
1) Execute the PDF Parser node to extract documents from the PDF files.
2) Use a Document Data Extractor node to extract the text from each document.
3) Use a Column Expressions node to extract the dates from the documents.
4) Visualize the results using a Table View node.

Now try to use a Tika Parser to read the PDFs one by one and extract the dates.
5) Execute the List Files/Folders node to get the list of paths to the PDFs.
6) Convert the paths to strings using a Path to String node.
7) Loop through the file paths using a chunk loop.
8) In each iteration, parse the PDF using the Tika Parser URL Input node and extract the dates.

Workflow annotations:
- Extract the texts from each document
- Parse the PDFs and extract the documents
- Extract the dates from the texts using REGEX
- Table View node to visualize the results
- List of the PDF files
- Convert the paths to string
- Loop Start
- In each iteration, parse a PDF using a Tika Parser URL Input node and extract the dates using a Column Expressions node
- Loop End

Node labels:
- Extract documents from the PDFs
- document > text
- text > dates
- file path > text
- PDF paths
- document > dates

Nodes in the workflow:
- PDF Parser
- Document Data Extractor
- Column Expressions
- Table View
- Tika Parser URL Input
- Chunk Loop Start
- List Files/Folders
- Path to String
- Loop End
- Column Expressions
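The date-extraction step can be illustrated outside KNIME with a short Python sketch. This is not the expression used in the Column Expressions node. It only shows the general idea of pulling dates out of already-extracted text with a regular expression. The exercise does not state which date formats the PDFs contain, so the pattern below (ISO-style `2021-03-15` and numeric `15/03/2021` or `15.03.2021`) is an assumption, and the function names and sample texts are hypothetical.

```python
import re

# Assumed date formats (the exercise does not specify them):
#   ISO-style:  2021-03-15
#   numeric:    15/03/2021 or 15.03.2021
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}|\d{1,2}[./]\d{1,2}[./]\d{4})\b"
)


def extract_dates(text: str) -> list[str]:
    """Return every substring of `text` that matches DATE_PATTERN, in order."""
    return DATE_PATTERN.findall(text)


def extract_dates_per_document(texts: dict[str, str]) -> dict[str, list[str]]:
    """Apply extract_dates to each document, mimicking one pass per PDF."""
    return {name: extract_dates(body) for name, body in texts.items()}


if __name__ == "__main__":
    # Hypothetical document texts standing in for the parsed PDF content.
    sample = {
        "a.pdf": "Signed on 2021-03-15, revised 02/04/2021.",
        "b.pdf": "No dates here.",
    }
    for name, dates in extract_dates_per_document(sample).items():
        print(name, dates)
```

The pattern checks only the shape of a date, not its validity (for example, `99/99/2021` would match). If the PDFs use month names or other layouts, the pattern would need to be adjusted accordingly.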
