
02 Reading Text Data

Exercise: Encapsulating strings into a document
1) Read the KNIME-Tweets.table file available in the data folder. It contains a list of tweets about KNIME, the user, the date of the tweet, and the retweets.
2) Convert the date of the tweet from string to Local Date. Append a new column so that you don't lose the time information in the string column.
3) Convert the tweets into documents with:
- Tweet column as the title and the full text
- Twitter as the document source
- User column as the author
- Date column as the publication date

Exercise: Reading data from pdf
1) Read the 2020-05-25-l4-tp-5-sessions.pdf file available in the data folder with the Tika Parser node. The file is the agenda of the L4-TP Introduction to Text Processing instructor-led course. Enable the "Extract attachments and embedded files" and "Extract inline images from pdfs" options. You can select any output directory.
2) Take a look at the extracted image. What do you see?
3) Convert the text output into a document. Use "2020-05-25" as the publication date.
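Outside KNIME, the idea behind the conversion steps can be sketched in plain Python. This covers steps 2 and 3 of the first exercise and step 3 of the second: parse the date string into a date, keep the original string column, and wrap each text in a document record with title, text, source, author, and publication date.

This is a minimal sketch under several assumptions:
- The `Document` dataclass is a hypothetical stand-in, not the KNIME Text Processing API.
- The `%Y-%m-%d %H:%M:%S` date format and the sample row are placeholders, because the actual contents of KNIME-Tweets.table are not shown here.
- Using the PDF file name as the document title is an assumption. The exercise only fixes the publication date.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a text-processing document (not KNIME's API)."""
    title: str
    text: str
    source: str
    author: str
    publication_date: date


# ASSUMPTION: format of the tweet date strings; adjust to match the real column.
TWEET_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_local_date(row: dict) -> dict:
    """Append a 'Local Date' column; the original 'Date' string (with time) is kept."""
    parsed = datetime.strptime(row["Date"], TWEET_DATE_FORMAT)
    return {**row, "Local Date": parsed.date()}


def tweet_to_document(row: dict) -> Document:
    """Tweet as title and full text, Twitter as source, User as author."""
    return Document(
        title=row["Tweet"],
        text=row["Tweet"],
        source="Twitter",
        author=row["User"],
        publication_date=row["Local Date"],
    )


def pdf_text_to_document(file_name: str, text: str) -> Document:
    """Wrap extracted PDF text; using the file name as title is an ASSUMPTION."""
    return Document(
        title=file_name,
        text=text,
        source="",
        author="",
        publication_date=date(2020, 5, 25),
    )


# Hypothetical placeholder row; the real data comes from KNIME-Tweets.table.
rows = [
    {"Tweet": "example tweet text", "User": "user_a",
     "Date": "2020-05-12 14:31:00", "Retweets": 3},
]

docs = [tweet_to_document(add_local_date(r)) for r in rows]
print(docs[0].publication_date)  # 2020-05-12

# Placeholder text standing in for the Tika Parser output.
agenda = pdf_text_to_document("2020-05-25-l4-tp-5-sessions.pdf", "placeholder agenda text")
print(agenda.publication_date)  # 2020-05-25
```

Appending a new column instead of overwriting the string column mirrors step 2: the date-only value feeds the publication date, and the time of day stays available in the original string.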
