
05 Bag of Words and Frequencies

Exercise: Creating a bag of words and calculating frequencies

In this exercise you'll calculate word frequencies in the agendas of two different instructor-led courses: L4-TP Introduction to Text Processing and L4-TS Introduction to Time Series Analysis.

1) Execute the Preprocessed Document (L4-TP) metanode. It accesses, tags, and cleans the agenda text of the L4-TP instructor-led course.
2) Create a bag of words representation of the preprocessed document.
3) Calculate the absolute term frequencies (TF) in the document. Which word occurs most often? Answer: text. How many times? Answer: four times.
4) Execute the Preprocessed Document (L4-TS) metanode. It accesses, tags, and cleans the agenda text of the L4-TS instructor-led course.
5) Concatenate its output with the output of the Preprocessed Document (L4-TP) metanode.
6) Create a bag of words representation of the preprocessed documents.
7) Calculate the document frequencies (DF) of the words. How many words occur in both documents? Answer: 28.
8) Calculate the TF-IDF scores by multiplying the relative term frequencies (TF rel) by the inverse document frequencies (IDF). Which three words have the highest TF-IDF scores? Answer: series, time, and text.

Workflow annotations: absolute term frequencies, document frequencies, relative term frequencies, inverse document frequencies, TF-IDF scores.
Workflow nodes: Preprocessed Document (L4-TP), Preprocessed Document (L4-TS), Bag Of Words Creator (two instances), TF (two instances), Concatenate, DF, IDF, Math Formula.
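To make steps 2, 3, 5, 6, and 7 concrete outside of KNIME, here is a minimal Python sketch of what the Bag Of Words Creator, TF (absolute), Concatenate, and DF nodes compute. It assumes each document has already been preprocessed into a list of lowercase tokens; the two token lists are hypothetical stand-ins, not the real course agendas, so the numbers will not match the exercise answers.

```python
from collections import Counter

# Hypothetical, already-preprocessed token lists (NOT the real L4-TP / L4-TS
# agenda text). Holding both in one dict plays the role of the Concatenate step.
docs = {
    "L4-TP": ["text", "processing", "text", "mining", "text", "data"],
    "L4-TS": ["time", "series", "analysis", "time", "data", "forecasting"],
}

# Bag of words: the distinct terms that occur in each document.
bags = {name: set(tokens) for name, tokens in docs.items()}

# Absolute term frequency: how often each term occurs within its document.
tf_abs = {name: Counter(tokens) for name, tokens in docs.items()}

# Document frequency: in how many documents each term occurs at least once.
df = Counter(term for bag in bags.values() for term in bag)

for name, counts in tf_abs.items():
    term, n = counts.most_common(1)[0]
    print(f"{name}: most frequent term '{term}' ({n} times)")

# Terms whose DF equals the number of documents occur in every document.
shared = sorted(term for term, d in df.items() if d == len(docs))
print("terms in both documents:", shared)
```

With the toy data above, "text" is the most frequent term in L4-TP (3 times) and only "data" occurs in both documents. Counting DF over the per-document sets, rather than over raw tokens, is what keeps a term that repeats inside one document from being counted more than once.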
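Step 8 combines relative term frequency (absolute count divided by the number of terms in the document) with inverse document frequency. The sketch below assumes the textbook definition idf(t) = log10(N / df(t)), where N is the number of documents. The KNIME IDF node offers more than one IDF variant, so its absolute scores may differ from this sketch. The token lists are hypothetical toy data, not the course agendas.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed token lists (NOT the real agenda text).
docs = {
    "L4-TP": ["text", "processing", "text", "mining", "text", "data"],
    "L4-TS": ["time", "series", "analysis", "time", "data", "forecasting"],
}

n_docs = len(docs)

# Document frequency, counted once per document via set().
df = Counter(term for tokens in docs.values() for term in set(tokens))

# Assumed IDF variant: log10(N / df). A term present in every document gets 0.
idf = {term: math.log10(n_docs / d) for term, d in df.items()}

tfidf = {}
for name, tokens in docs.items():
    counts = Counter(tokens)
    total = len(tokens)
    for term, n in counts.items():
        tf_rel = n / total  # relative term frequency (TF rel)
        tfidf[(name, term)] = tf_rel * idf[term]

# Highest scores first; ties are broken alphabetically by (document, term).
top3 = sorted(tfidf.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
for (name, term), score in top3:
    print(f"{name}\t{term}\t{score:.4f}")
```

Under this IDF variant, a word shared by both documents (here "data") scores exactly 0. Frequent words that are unique to one document, such as "text" and "time" in the toy data, rise to the top, which mirrors why topic words dominate the TF-IDF ranking in the exercise.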
