03_Document_Classification
Document Classification

This is a workflow for topic classification. After the Documents are converted into word vectors, the task becomes a traditional classification problem that can be solved with any supervised Machine Learning algorithm. We chose a decision tree, but any other classifier could have been used.

The metanode "Limit # keywords" artificially limits the number of extracted keywords in order to limit the number of produced columns. Since the dataset used here is quite small, we do not want to risk poor generalization by having too many columns relative to too few rows in the training set.

The Document Vector Applier node applies the word vector extracted from the training set to the test set and removes all words that are present in the test set but not in the training set.

Category To Class extracts the content of the category field of the Document and places it in a column named "class".

Workflow nodes: Enrichment, Reading Data, Category To Class, Partitioning, Document Vector, Decision Tree Learner, Decision Tree Predictor, Category To Class, Pre-processing, Document Vector Applier, Keygraph Keyword Extractor (twice), Limit # keywords, Color Manager, Color Appender, Scorer.

Node annotations: English POS; Read articles from Pubmed; Pubmed_Articles.csv; Extract label (twice); Training 80 / test set 20; cleaning, stemming, tag filtering; to word vector (twice); 3 keywords (twice); keep keywords in > 5 Documents; Color by sentiment label.
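The roles of Document Vector and Document Vector Applier can be illustrated outside KNIME with a minimal Python sketch. This is not the KNIME implementation. The function names `fit_vocabulary` and `to_vector`, the whitespace tokenization, the binary encoding, and the toy documents are all assumptions made for illustration. The `min_docs` threshold loosely mirrors the "keep keywords in > 5 Documents" filter, with a smaller value so that the toy data still produces columns.

```python
from collections import Counter


def fit_vocabulary(train_docs, min_docs=1):
    """Collect terms that occur in at least `min_docs` training documents."""
    doc_freq = Counter()
    for doc in train_docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)


def to_vector(doc, vocabulary):
    """Binary bag-of-words vector; terms outside `vocabulary` are ignored."""
    present = set(doc.lower().split())
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical toy documents, not taken from Pubmed_Articles.csv.
train_docs = [
    "protein binding assay",
    "protein folding study",
    "tumor growth study",
]
test_doc = "novel protein study"

# The vocabulary (the columns) is learned from the training set only.
vocabulary = fit_vocabulary(train_docs, min_docs=2)
train_vectors = [to_vector(d, vocabulary) for d in train_docs]

# The test document is encoded with the same columns, so "novel",
# which never appears in the training set, is dropped.
test_vector = to_vector(test_doc, vocabulary)
```

Because the test vector has exactly the columns learned during training, a classifier trained on `train_vectors` can score it directly. Raising `min_docs` shrinks the column count, which is the same trade-off the "Limit # keywords" metanode makes for a small dataset.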
