
04 Cleaning and Filtering - Custom

04 Cleaning and Filtering - Solution
Exercise: Document Cleaning and Filtering
In this exercise you'll reduce and clean the agenda text of the L4-TP instructor-led course.
1) Execute the Tagged document metanode. It applies POS and NE tagging to the agenda text.
2) Remove punctuation from the text.
3) Convert all tokens to lower case.
4) Filter out stop words using the built-in list and a custom list with the following words: "session", "word", "agenda". Check the Ignore unmodifiable flag option.
5) Filter the document to proper nouns (POS tags NNP and NNPS). Check the Ignore unmodifiable flag option.
6) Exclude locations, organizations, and "unknown" NE tags. Check the Ignore unmodifiable flag option.
7) Inspect the result with the Tagged Document Viewer node. What are the first three tokens in the document?

Workflow annotations: URL to RSS feed; Example; Read RSS feed and generate document column; Filter unnecessary columns; built-in and custom list; only nouns; ignore NE; location, organization, unknown.
Workflow nodes: Table Creator, RSS Feed Reader, Column Filter, Tagged document (metanode), Punctuation Erasure, Case Converter, Stop Word Filter, Tag Filter (two instances), Tagged Document Viewer.
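The cleaning chain in steps 2 to 6 can be sketched in plain Python to show what each node does to the token stream. This is a minimal illustration, not the KNIME nodes themselves: the `Token` class, the tiny stop word set standing in for the built-in list, and the sample tokens are all hypothetical (they are not the real L4-TP agenda). The unmodifiable flag is modeled on the assumption that a protected (NE-tagged) token passes through a step untouched unless that step is set to ignore the flag, which is why steps 4 to 6 ask you to check that option.

```python
import string
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Token:
    text: str
    pos: str             # POS tag, e.g. "NNP"
    ne: Optional[str]    # NE tag, e.g. "LOCATION", or None if untagged
    unmodifiable: bool   # assumption: True for tokens protected by the NE tagger


Step = Callable[[Token], Optional[Token]]  # returns a new token, or None to drop it


def apply_step(tokens: List[Token], step: Step, ignore_unmodifiable: bool) -> List[Token]:
    """Run one step over all tokens; protected tokens are skipped unless the flag is ignored."""
    out = []
    for tok in tokens:
        if tok.unmodifiable and not ignore_unmodifiable:
            out.append(tok)  # left exactly as it was
            continue
        result = step(tok)
        if result is not None:
            out.append(result)
    return out


PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def erase_punctuation(tok: Token) -> Optional[Token]:
    cleaned = tok.text.translate(PUNCT_TABLE)
    return replace(tok, text=cleaned) if cleaned else None  # drop tokens that become empty


def to_lower(tok: Token) -> Optional[Token]:
    return replace(tok, text=tok.text.lower())


def stop_word_filter(stop_words: Set[str]) -> Step:
    return lambda tok: None if tok.text.lower() in stop_words else tok


def keep_pos(tags: Set[str]) -> Step:
    return lambda tok: tok if tok.pos in tags else None


def exclude_ne(tags: Set[str]) -> Step:
    return lambda tok: None if tok.ne in tags else tok


# Hypothetical stand-in for the built-in stop word list, plus the exercise's custom list.
BUILTIN_STOP_WORDS = {"a", "and", "in", "of", "the", "to"}
CUSTOM_STOP_WORDS = {"session", "word", "agenda"}


def clean_and_filter(tokens: List[Token]) -> List[Token]:
    steps = [
        (erase_punctuation, False),                                          # 2) Punctuation Erasure
        (to_lower, False),                                                   # 3) Case Converter
        (stop_word_filter(BUILTIN_STOP_WORDS | CUSTOM_STOP_WORDS), True),    # 4) Stop Word Filter
        (keep_pos({"NNP", "NNPS"}), True),                                   # 5) Tag Filter (POS)
        (exclude_ne({"LOCATION", "ORGANIZATION", "UNKNOWN"}), True),         # 6) Tag Filter (NE)
    ]
    for step, ignore_flag in steps:
        tokens = apply_step(tokens, step, ignore_flag)
    return tokens


# Made-up tagged tokens, for illustration only.
sample = [
    Token("Session", "NNP", None, False),
    Token("KNIME", "NNP", "ORGANIZATION", True),
    Token("Text", "NNP", None, False),
    Token("Processing", "NNP", None, False),
    Token(",", ",", None, False),
    Token("the", "DT", None, False),
    Token("Berlin", "NNP", "LOCATION", True),
    Token("Porter", "NNP", "PERSON", True),
    Token("documents", "NNS", None, False),
    Token("Tagging", "NNP", None, False),
]

cleaned = [t.text for t in clean_and_filter(sample)]
print(cleaned)  # ['text', 'processing', 'Porter', 'tagging']
```

With these made-up tokens the sketch keeps `text`, `processing`, `Porter`, and `tagging`. `Porter` keeps its capital letter because, under the stated assumption, the case conversion step runs without ignoring the flag and therefore leaves protected tokens alone; the NE-tagged `KNIME` and `Berlin` survive until the last step removes their ORGANIZATION and LOCATION tags.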
