02 Reading Text Data

Exercise: Encapsulating strings into a document
1) Read the KNIME-Tweets.table file available in the data folder. It contains a list of tweets about KNIME, the user, the date of the tweet, and the retweets.
2) Convert the date of the tweet from string to Local Date. Append a new column so that you don't lose the time information in the string column.
3) Convert the tweets into documents with
- Tweet column as the title and the full text
- Twitter as the document source
- User column as the author
- Date column as the publication date

Exercise: Reading data from pdf
1) Read the 2020-05-25-l4-tp-5-sessions.pdf file available in the data folder with the Tika Parser node. The file is the agenda of the L4-TP Introduction to Text Processing instructor-led course. Enable the "Extract attachments and embedded files" and "Extract inline images from pdfs" options. You can select any directory.
2) Take a look at the extracted image. What do you see?
3) Convert the text output into a document. Use "2020-05-25" as the publication date.

Not sure if this exercise is really doable with this data? The date/time in our RSS data feed is already stored as a date/time data type, and is formatted as a document already :)

Example
- URL to RSS feed
- Read RSS feed and generate document column
- Filter unnecessary columns

Nodes used: Table Reader, Strings To Document (twice), String to Date&Time, Tika Parser, Document Viewer, Table Creator, RSS Feed Reader, Column Filter
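
For readers who want to see the logic of the first exercise outside KNIME, the following is a minimal Python sketch. It does not use any KNIME API. It only mirrors the two steps: parsing the date string into a new column (so the original string with its time information is kept) and wrapping each row into a document-like record. The Document class, the appended column name "Date (Local Date)", the date format string, and the sample row are all assumptions made for illustration; the actual format in KNIME-Tweets.table may differ.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Document:
    """Hypothetical stand-in for a KNIME document cell."""
    title: str
    text: str
    source: str
    author: str
    publication_date: date


def append_local_date(row: dict, date_format: str = "%Y-%m-%d %H:%M:%S") -> dict:
    """Return a copy of the row with an extra date-only column.

    The original "Date" string is left untouched, so its time
    information is not lost.
    """
    parsed = datetime.strptime(row["Date"], date_format).date()
    return {**row, "Date (Local Date)": parsed}


def to_document(row: dict) -> Document:
    """Encapsulate one row as a document, following the exercise's mapping."""
    return Document(
        title=row["Tweet"],            # Tweet column as the title ...
        text=row["Tweet"],             # ... and as the full text
        source="Twitter",              # fixed document source
        author=row["User"],            # User column as the author
        publication_date=row["Date (Local Date)"],
    )


# Invented sample row, not taken from the real table.
sample_row = {
    "Tweet": "Trying out text processing in KNIME",
    "User": "example_user",
    "Date": "2020-05-25 14:30:00",
}

row_with_date = append_local_date(sample_row)
doc = to_document(row_with_date)
print(doc)
```

Returning a copy with an appended column, rather than overwriting "Date", corresponds to the "append a new column" choice in step 2 of the exercise.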