
02 Reading Text Data

Exercise: Encapsulating strings into a document
1) Read the KNIME-Tweets.table file available in the data folder. It contains a list of tweets about KNIME, the user, the date of the tweet, and the retweets.
2) Convert the date of the tweet from string to Local Date. Append a new column so that you don't lose the time information in the string column.
3) Convert the tweets into documents with
- Tweet column as the title and the full text
- Twitter as the document source
- User column as the author
- Date column as the publication date

Exercise: Reading data from pdf
1) Read the 2020-05-25-l4-tp-5-sessions.pdf file available in the data folder with the Tika Parser node. The file is the agenda of the L4-TP Introduction to Text Processing instructor-led course. Enable the Extract attachments and embedded files and Extract inline images from pdfs options. You can select any directory.
2) Take a look at the extracted image. What do you see?
3) Convert the text output into a document. Use "2020-05-25" as the publication date.

Example
- Table Creator: URL to RSS feed
- RSS Feed Reader: Read RSS feed and generate document column
- Column Filter: Filter unnecessary columns
- Document Viewer
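For readers who want to see the idea behind the first exercise outside of KNIME, the following is a minimal Python sketch of steps 2 and 3: parse the date string into a plain date in a new field, then bundle the tweet fields into one document record. It is an illustration only, not how the KNIME nodes are implemented. The `Document` class, the helper names, the sample row, and the timestamp format `%Y-%m-%d %H:%M:%S` are all assumptions; the actual format in KNIME-Tweets.table may differ.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a document cell: text plus metadata."""
    title: str
    text: str
    source: str
    author: str
    publication_date: date


def to_local_date(timestamp: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> date:
    """Parse a timestamp string and keep only the date part.

    The default format is an assumption for this sketch.
    """
    return datetime.strptime(timestamp, fmt).date()


def tweet_to_document(row: dict) -> Document:
    """Build a document from one tweet row (columns: Tweet, User, Date)."""
    return Document(
        title=row["Tweet"],          # Tweet column as the title...
        text=row["Tweet"],           # ...and as the full text
        source="Twitter",            # fixed document source
        author=row["User"],          # User column as the author
        publication_date=to_local_date(row["Date"]),
    )


# Made-up sample row; the real table has one row per tweet.
row = {
    "Tweet": "Trying out text processing in KNIME",
    "User": "example_user",
    "Date": "2020-05-25 09:30:00",
}

# Append the parsed date as a new field so the original string,
# including its time information, is kept.
row_with_date = {**row, "Date (Local Date)": to_local_date(row["Date"])}

doc = tweet_to_document(row)
print(doc)
```

The parsed date is stored under a new key rather than overwriting `Date`, which mirrors the exercise's instruction to append a column so the time information in the string is not lost.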
