05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses, L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? How many times?
   The test [NNP(POS)] occurs the most often, 4 times.
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate the output with the output of the other metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents?
   28 words
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) with the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores?
   series, time, and text

Workflow nodes: Preprocessed Document (L4-TP), Preprocessed Document (L4-TS), Bag Of Words Creator, TF, Concatenate, Bag Of Words Creator, DF, TF, IDF, Math Formula
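The same sequence of calculations can be sketched outside the workflow in plain Python. This is only an illustration under stated assumptions: the two token lists are made-up stand-ins for the preprocessed agendas (not the real course texts), the function names are hypothetical, and the IDF is assumed to be the smooth variant log(1 + N / DF), since the exercise does not say which variant the IDF node is configured with.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed token lists standing in for the two agendas.
docs = {
    "L4-TP": ["text", "mining", "text", "tagging", "analysis"],
    "L4-TS": ["time", "series", "time", "series", "analysis"],
}


def term_frequencies(tokens, relative=False):
    """Count each term; with relative=True, divide by the document length."""
    counts = Counter(tokens)
    if not relative:
        return dict(counts)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def document_frequencies(docs):
    """Number of documents in which each term occurs at least once."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # set(): count each term once per document
    return dict(df)


def inverse_document_frequencies(df, n_docs):
    """Assumed smooth IDF variant: log(1 + N / DF)."""
    return {term: math.log(1 + n_docs / d) for term, d in df.items()}


def tf_idf(docs):
    """Relative TF multiplied by IDF, keyed by (document, term)."""
    idf = inverse_document_frequencies(document_frequencies(docs), len(docs))
    scores = {}
    for name, tokens in docs.items():
        for term, tf in term_frequencies(tokens, relative=True).items():
            scores[(name, term)] = tf * idf[term]
    return scores


# Step 3: absolute term frequencies of a single document.
tf_abs = term_frequencies(docs["L4-TP"])

# Step 7: terms whose document frequency equals the number of documents.
df = document_frequencies(docs)
shared = sorted(term for term, d in df.items() if d == len(docs))

# Step 8: the three highest TF-IDF scores across both documents.
scores = tf_idf(docs)
top3 = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:3]

print("Most frequent in L4-TP:", max(tf_abs.items(), key=lambda item: item[1]))
print("Terms in both documents:", shared)
print("Top 3 TF-IDF:", top3)
```

Terms that occur in both documents get a lower IDF than terms unique to one document, which is why the words that characterize a single agenda end up with the highest TF-IDF scores.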
