
05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses, L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? How many times?
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate the output with the output of the other metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents?
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) with the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores?
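
For readers who want to check the arithmetic outside the workflow, the calculations in this exercise can be sketched in plain Python. This is a minimal illustration, not the KNIME implementation. The token lists are hypothetical placeholders rather than the real course agendas, and the IDF variant log(1 + N / DF) is an assumption, so scores from the workflow's nodes may differ depending on their settings.

```python
import math
from collections import Counter

# Hypothetical, already preprocessed documents (placeholder tokens,
# NOT the real L4-TP / L4-TS agenda texts).
docs = {
    "L4-TP": ["text", "processing", "text", "mining", "document", "text"],
    "L4-TS": ["time", "series", "analysis", "time", "document"],
}

# Bag of words with absolute term frequencies: one Counter per document.
tf_abs = {name: Counter(tokens) for name, tokens in docs.items()}

# Relative term frequency: absolute count divided by the document length.
tf_rel = {
    name: {term: count / len(docs[name]) for term, count in counts.items()}
    for name, counts in tf_abs.items()
}

# Document frequency: number of documents that contain each term.
df = Counter(term for counts in tf_abs.values() for term in counts)

# Inverse document frequency. Assumed variant: log(1 + N / DF).
n_docs = len(docs)
idf = {term: math.log(1 + n_docs / d) for term, d in df.items()}

# TF-IDF per (document, term) pair: relative TF multiplied by IDF.
tfidf = {
    (name, term): rel * idf[term]
    for name, rels in tf_rel.items()
    for term, rel in rels.items()
}

# Most frequent term in the first document (step 3).
most_common_tp = tf_abs["L4-TP"].most_common(1)[0]

# Terms that occur in every document (step 7).
shared_terms = sorted(t for t, d in df.items() if d == n_docs)

# Three highest-scoring (document, term) pairs (step 8).
top3 = sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:3]

print("Most frequent term in L4-TP:", most_common_tp)
print("Terms in both documents:", shared_terms)
print("Top 3 TF-IDF scores:", top3)
```

With this IDF variant, terms that occur in both documents keep a small positive score. Under a log(N / DF) variant they would score zero, so the choice of variant can change the ranking.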
