
04 Cleaning and Filtering

Exercise: Document Cleaning and Filtering

In this exercise you'll reduce and clean the Microsoft RSS feeds text.

1) Execute the Tagged document metanode. It applies part-of-speech (POS) and named-entity (NE) tagging to the agenda text.
2) Remove punctuation from the text.
3) Convert all tokens to lower case.
4) Filter out stop words using both the built-in list and a custom list containing the following words: "session", "word", "agenda", "nofollow", "<p>", "</p>", "rel", "href", etc. Check the Ignore unmodifiable flag option.
5) Filter the document to nouns (POS tags NN, NNP, NNPS, and NNS). Check the Ignore unmodifiable flag option.
6) Exclude locations, organizations, and "unknown" NE tags. Check the Ignore unmodifiable flag option.
7) Inspect the result with the Tagged Document Viewer node. What are the first three tokens in the document?

Answer: closing, cybersecurity, and skills

Nodes in the workflow: Tagged document (metanode), Punctuation Erasure (remove punctuation), Case Converter (to lower), Stop Word Filter (stop words), Table Creator (custom stop word list), Tag Filter (filter to nouns), Tag Filter (exclude locations, organizations, and "unknown" NE tags), Tagged Document Viewer.
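The KNIME nodes are configured through their dialogs rather than in code. To make the logic of steps 2 to 6 explicit, here is a minimal plain-Python sketch on a toy token list. It is not KNIME's implementation: the Token class, the function names, the tiny built-in stop word set and the sample sentence are all hypothetical. It also assumes that a token flagged as unmodifiable (as NE-tagged terms are in this sketch) is passed through untouched unless ignore_unmodifiable is True, which is how the exercise uses the Ignore unmodifiable flag option.

```python
import string
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Token:
    text: str
    pos: str                    # Penn Treebank POS tag, e.g. "NN", "VBN"
    ne: Optional[str] = None    # NE tag, e.g. "LOCATION"; None if untagged
    unmodifiable: bool = False  # hypothetical: set by the NE tagging step


def _skip(tok: Token, ignore_unmodifiable: bool) -> bool:
    """True if the token must be passed through untouched."""
    return tok.unmodifiable and not ignore_unmodifiable


_PUNCT = str.maketrans("", "", string.punctuation)


def erase_punctuation(tokens: Iterable[Token], ignore_unmodifiable: bool = False) -> List[Token]:
    out = []
    for tok in tokens:
        if _skip(tok, ignore_unmodifiable):
            out.append(tok)
            continue
        cleaned = tok.text.translate(_PUNCT)
        if cleaned:  # assumption: tokens that become empty are dropped
            out.append(replace(tok, text=cleaned))
    return out


def to_lower(tokens: Iterable[Token], ignore_unmodifiable: bool = False) -> List[Token]:
    return [tok if _skip(tok, ignore_unmodifiable) else replace(tok, text=tok.text.lower())
            for tok in tokens]


def filter_stopwords(tokens: Iterable[Token], stopwords: Set[str],
                     ignore_unmodifiable: bool = False) -> List[Token]:
    stop = {w.lower() for w in stopwords}
    return [tok for tok in tokens
            if _skip(tok, ignore_unmodifiable) or tok.text.lower() not in stop]


def keep_pos(tokens: Iterable[Token], pos_tags: Set[str],
             ignore_unmodifiable: bool = False) -> List[Token]:
    return [tok for tok in tokens if _skip(tok, ignore_unmodifiable) or tok.pos in pos_tags]


def exclude_ne(tokens: Iterable[Token], ne_tags: Set[str],
               ignore_unmodifiable: bool = False) -> List[Token]:
    return [tok for tok in tokens if _skip(tok, ignore_unmodifiable) or tok.ne not in ne_tags]


# Tiny stand-in for a built-in list (NOT KNIME's list) plus the custom words from step 4.
BUILTIN_STOPWORDS = {"a", "an", "the", "on", "in", "by"}
CUSTOM_STOPWORDS = {"session", "word", "agenda", "nofollow", "<p>", "</p>", "rel", "href"}
NOUNS = {"NN", "NNP", "NNPS", "NNS"}
EXCLUDED_NE = {"LOCATION", "ORGANIZATION", "UNKNOWN"}


def clean(tokens: Iterable[Token]) -> List[Token]:
    tokens = erase_punctuation(tokens)  # step 2 (flag option not mentioned)
    tokens = to_lower(tokens)           # step 3 (flag option not mentioned)
    tokens = filter_stopwords(tokens, BUILTIN_STOPWORDS | CUSTOM_STOPWORDS,
                              ignore_unmodifiable=True)                # step 4
    tokens = keep_pos(tokens, NOUNS, ignore_unmodifiable=True)         # step 5
    tokens = exclude_ne(tokens, EXCLUDED_NE, ignore_unmodifiable=True)  # step 6
    return tokens


# Hypothetical, hand-tagged sample sentence (not taken from the RSS feed).
SAMPLE = [
    Token("The", "DT"), Token("keynote", "NN"), Token("session", "NN"),
    Token("on", "IN"), Token("Cloud-security", "NN"), Token("hosted", "VBN"),
    Token("by", "IN"),
    Token("Microsoft", "NNP", ne="ORGANIZATION", unmodifiable=True),
    Token("in", "IN"),
    Token("Seattle", "NNP", ne="LOCATION", unmodifiable=True),
    Token("!", "."),
]

if __name__ == "__main__":
    print([t.text for t in clean(SAMPLE)])  # ['keynote', 'cloudsecurity']
```

Without ignore_unmodifiable=True in the last step, "Microsoft" and "Seattle" would survive the NE filter, which is why the exercise asks you to check that option. Note also that in this sketch punctuation is erased before stop word filtering, so an entry such as "<p>" can only match a token that the punctuation step left unchanged.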
