
05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses: L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? How many times? Answer: text (0.052)
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate its output with the output of the other metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents?
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) by the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores? Answer: series, time, text

Workflow nodes: Preprocessed Document (L4-TP), Preprocessed Document (L4-TS), Bag Of Words Creator (x2), TF (x2), Concatenate, DF, IDF, Math Formula
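The measures used in steps 2, 3, 7, and 8 can be sketched outside KNIME in plain Python. This is a minimal sketch under stated assumptions, not the KNIME nodes' implementation: the two token lists are hypothetical stand-ins for the preprocessed agendas, relative TF is assumed to be a term's count divided by the total number of terms in its document, and IDF is assumed to be ln(N / DF). The KNIME IDF node offers several IDF variants, so its numbers may differ from this sketch.

```python
from collections import Counter
import math

# HYPOTHETICAL token lists standing in for the two preprocessed agendas.
# In the workflow, the real terms come out of the Preprocessed Document metanodes.
docs = {
    "L4-TP": ["text", "processing", "text", "mining", "data", "text"],
    "L4-TS": ["time", "series", "time", "forecasting", "data", "series"],
}


def bag_of_words(docs):
    """Bag of words: one (document, term) row per distinct term in each document."""
    return [(doc_id, term) for doc_id, tokens in docs.items() for term in sorted(set(tokens))]


def term_frequencies(tokens):
    """Absolute TF (raw counts) and relative TF (ASSUMED: count / total terms in the document)."""
    absolute = Counter(tokens)
    total = len(tokens)
    relative = {term: count / total for term, count in absolute.items()}
    return absolute, relative


def document_frequencies(docs):
    """DF: number of documents in which each term occurs at least once."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def tf_idf(docs):
    """TF-IDF = relative TF * IDF, with IDF ASSUMED to be ln(N / DF)."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    scores = {}
    for doc_id, tokens in docs.items():
        _, tf_rel = term_frequencies(tokens)
        for term, tf in tf_rel.items():
            scores[(doc_id, term)] = tf * math.log(n_docs / df[term])
    return scores


# Step 3: most frequent word in the first document (absolute and relative TF).
abs_tf, rel_tf = term_frequencies(docs["L4-TP"])
top_term, top_count = abs_tf.most_common(1)[0]
print(top_term, top_count, rel_tf[top_term])  # text 3 0.5

# Step 7: words that occur in both documents (DF equals the number of documents).
df = document_frequencies(docs)
print(sorted(t for t, n in df.items() if n == len(docs)))  # ['data']

# Step 8: the three highest TF-IDF scores across both documents.
ranked = sorted(tf_idf(docs).items(), key=lambda kv: kv[1], reverse=True)
print([term for (_, term), _ in ranked[:3]])  # ['text', 'time', 'series']
```

With only two documents, ln(N / DF) is 0 for any word found in both, so under this IDF variant shared words score 0 and the top TF-IDF terms are words that occur in only one of the two agendas.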
