Group_​2_​Machine_​Learning

Part 1: Reading and Parsing
- Read the dataset IMDb-sample.csv.
  (Tip: Drag and drop the dataset from the explorer to the Workflow Editor.)
  (Tip 2: Change the data type of the column Index to string in the configuration window.)
- Use the Strings to Document node to create documents.
  (Hint: Use the following settings: Title Column = Index, Full Text = Text. Activate "Use categories from column" and set Document category column = Sentiment.)
- Use the Column Filter node to delete all columns except the document column.
Optional: Use the Document Viewer node to take a look at the documents.

Part 2: Enrichment
- Use the POS Tagger node to assign part-of-speech tags.
Optional: Use the Document Viewer node to visualize the tags.

Part 3: Preprocessing
- Use the Punctuation Erasure node to remove punctuation.
  (Note: With default settings, the first text preprocessing node appends a document column containing the preprocessed document. With default settings, all following nodes update that preprocessed document.)
- Use the Number Filter node to remove numbers.
- Use the N Chars Filter node to remove words with fewer than 3 characters.
- Use the Stop Word Filter node to delete words with very little meaning, such as "and", "the", "a".
- Use the Case Converter node to convert all words to lower case.
- Use the Snowball Stemmer node to reduce words to their stems.
- Use the Tag Filter node to delete all words except adjectives, adverbs, and nouns.
  (Tip: See the Penn Treebank POS tags here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)
Optional: Use the Document Viewer node to take a look at the preprocessed documents.

Part 4: Transformation and Frequencies
- Use the Bag Of Words Creator node to create a bag of words.
- Use the TF node to calculate the relative term frequencies.
- Use the Document Vector node to get a vector representation of each document.
  (Hint: Make sure that the check box "As collection cell" is NOT activated, as the Decision Tree Learner node cannot handle collections.)
Optional: Calculate the inverse document frequency (IDF).

Part 5: Classification
- Use the Category To Class node to extract the class labels from the documents.
- Use the Partitioning node to create a training and test set.
- Use the Decision Tree Learner node to train a model on the training set.
- Use the Decision Tree Predictor node to apply the trained decision tree model to the test set.
- Use the Scorer node to evaluate the model.
Optional: Use other algorithms to train a model. Use the ROC Curve to evaluate the model.
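For readers who want to see what the preprocessing and frequency steps compute, here is a minimal Python sketch of Parts 3 and 4 outside KNIME. It is an illustration under assumptions, not what the KNIME nodes do internally: the two sample sentences, the tiny stop word list, and the function names preprocess, relative_tf, and document_vectors are all hypothetical, and stemming and POS tag filtering are left out to keep it dependency-free.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for rows of IMDb-sample.csv.
DOCS = [
    "A great movie, with 2 great actors!",
    "The plot was boring and the acting was bad.",
]

# Tiny illustrative stop word list (an assumption, not KNIME's built-in list).
STOP_WORDS = {"and", "the", "a", "was", "with"}


def preprocess(text):
    """Apply the Part 3 filters in order (stemming and tag filter omitted)."""
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation erasure
    tokens = text.split()
    tokens = [t for t in tokens if not t.isdigit()]      # number filter
    tokens = [t for t in tokens if len(t) >= 3]          # N chars filter, N = 3
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]  # stop words
    return [t.lower() for t in tokens]                   # case conversion


def relative_tf(tokens):
    """Relative term frequency: term count divided by the document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}


def document_vectors(docs):
    """Return the sorted vocabulary and one fixed-length TF vector per document."""
    tf_per_doc = [relative_tf(preprocess(d)) for d in docs]
    vocab = sorted({term for tf in tf_per_doc for term in tf})
    # One plain numeric column per term, analogous to switching off
    # "As collection cell" so a tree learner sees ordinary columns.
    vectors = [[tf.get(term, 0.0) for term in vocab] for tf in tf_per_doc]
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = document_vectors(DOCS)
    print(vocab)
    for vec in vectors:
        print(vec)
```

Each document becomes a row with one numeric column per term, which is the shape a decision tree learner expects as input.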
