05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses, L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? Answer: text. How many times? Answer: 4.
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate the output with the output of the other metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents? Answer: 3312
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) with the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores? Answer: series, time, text.

Workflow nodes: Preprocessed Document (L4-TP), Preprocessed Document (L4-TS), Bag Of Words Creator, TF, Concatenate, DF, IDF, Math Formula (TF-IDF).
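
For readers who want to check the arithmetic behind the TF, DF, and TF-IDF steps of the exercise outside the workflow, here is a minimal Python sketch. It is not the workflow itself: the two token lists are hypothetical stand-ins for the preprocessed agendas, the function names are made up for illustration, and the IDF is assumed to be the smoothed variant log(1 + N / DF) with the natural logarithm. The IDF node in the workflow may be configured with a different variant, so absolute scores can differ.

```python
import math
from collections import Counter

# Hypothetical, already preprocessed documents (NOT the real course agendas).
docs = {
    "L4-TP": ["text", "processing", "text", "mining", "introduction"],
    "L4-TS": ["time", "series", "analysis", "time", "introduction"],
}


def term_frequencies(tokens):
    """Return {term: (absolute TF, relative TF)} for one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: (count, count / total) for term, count in counts.items()}


def document_frequencies(documents):
    """Return {term: number of documents that contain the term}."""
    df = Counter()
    for tokens in documents.values():
        df.update(set(tokens))  # count each term once per document
    return df


def tf_idf(documents):
    """Return {(document, term): relative TF * IDF}.

    Assumption: smoothed IDF, log(1 + N / DF), natural logarithm.
    """
    n_docs = len(documents)
    df = document_frequencies(documents)
    scores = {}
    for name, tokens in documents.items():
        for term, (_, tf_rel) in term_frequencies(tokens).items():
            idf = math.log(1 + n_docs / df[term])
            scores[(name, term)] = tf_rel * idf
    return scores


tf = term_frequencies(docs["L4-TP"])
df = document_frequencies(docs)
# Terms that occur in every document (DF equals the number of documents).
shared = sorted(term for term, n in df.items() if n == len(docs))
scores = tf_idf(docs)
# Three highest-scoring (document, term) pairs.
top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
```

In this toy data, a term that appears in both documents gets a lower IDF than a term that appears in only one, which is why frequent but document-specific words rise to the top of the TF-IDF ranking.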
