05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses, L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? How many times?
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate the output with the output of the other metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents?
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) with the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores?

Answers to questions:
3) "text" occurs most often (it has the highest number).
7) 28 words (the number of columns with 2).
8) "series", "times" and "text" are the three highest words.

Workflow nodes: Preprocessed Document (L4-TP), Preprocessed Document (L4-TS), Bag Of Words Creator, TF, Concatenate, Bag Of Words Creator, DF, TF, IDF, Math Formula.
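The frequency calculations in steps 3, 7 and 8 can also be sketched outside KNIME to check the logic by hand. The following Python snippet is only an illustration, not the workflow's implementation: the two token lists are made-up stand-ins for the preprocessed agendas, the function names are hypothetical, and the smoothed IDF formula log(1 + N / DF) is an assumption (the IDF node may be configured with a different variant).

```python
import math
from collections import Counter

# Hypothetical toy documents standing in for the two preprocessed agendas.
docs = {
    "L4-TP": "text processing text mining tagging text".split(),
    "L4-TS": "time series analysis time series text".split(),
}


def bag_of_words(docs):
    """Return (document, term) pairs, one per distinct term in each document."""
    return [(name, term)
            for name, tokens in docs.items()
            for term in sorted(set(tokens))]


def term_frequencies(tokens):
    """Absolute TF: how many times each term occurs in one document."""
    return Counter(tokens)


def document_frequencies(docs):
    """DF: the number of documents in which each term occurs."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each term at most once per document
    return df


def tf_idf(docs):
    """Relative TF multiplied by an assumed smoothed IDF, log(1 + N / DF)."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    scores = {}
    for name, tokens in docs.items():
        for term, count in term_frequencies(tokens).items():
            tf_rel = count / len(tokens)
            idf = math.log(1 + n_docs / df[term])
            scores[(name, term)] = tf_rel * idf
    return scores


bow = bag_of_words(docs)
tf_tp = term_frequencies(docs["L4-TP"])
df = document_frequencies(docs)
shared = sorted(term for term, n in df.items() if n == len(docs))
scores = tf_idf(docs)
top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]

print(tf_tp.most_common(1))  # most frequent term in the toy L4-TP document
print(shared)                # terms that occur in both toy documents
print(top3)                  # three highest TF-IDF scores
```

With only two documents, an unsmoothed IDF of log(N / DF) would give every shared word a score of zero, which is one reason the sketch uses the smoothed variant. The numbers it prints describe the toy data only and will not match the answers for the real agendas.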
