
07_Simple_Document_Classification_Using_Word_Vectors

Workflow

Simple Document Classification using Word Vectors
This example shows how to transform a document into a vector using a word vector model and how to use these vectors for classification.
Tags: deeplearning, machine learning, word2vec, doc2vec, word vectors, embeddings, classification
First, we read test and training documents, which are divided into three topics. We use the training dataset to train a Doc2Vec model with the topic as the class attribute. The learner creates a vector for each word and for each label. Next, we use a Vocabulary Extractor to extract the words and vectors from the model. On its second output port, the Vocabulary Extractor outputs the vectors for each label, which we can then use as a kind of 'cluster center' for classification.

The next step is to convert our test documents into vectors using the word vector model. This is done with the Word Vector Apply node, which takes in documents and replaces every word with its corresponding word vector, if the word is present in the word vector model. We additionally configure the node to calculate the mean of all vectors, so that each test document is represented by a single vector.

Finally, we use a K Nearest Neighbor node with our previously created 'cluster centers'. With word vectors, the cosine distance is often used.

Workflow Requirements
KNIME Analytics Platform 3.4.0
KNIME Deeplearning4J Integration
KNIME Deeplearning4J Integration Text Processing Extension

Nodes in the workflow: Read Training Documents, Read Test Documents, Column Filter, Scorer, Doc2Vec Learner, Word Vector Apply, Vocabulary Extractor, Split Collection Columns, K Nearest Neighbor (Distance Function)
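The two core steps described above (averaging word vectors into one document vector, then picking the nearest label vector by cosine distance) can be sketched outside KNIME in plain Python. This is only an illustration of the idea, not the implementation used by the Word Vector Apply or K Nearest Neighbor nodes. The function names, the two-dimensional toy vectors, and the labels "sports" and "finance" are all made up for the example.

```python
import math


def mean_vector(tokens, word_vectors):
    """Average the vectors of all tokens found in the model; unknown words are skipped."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token of the document is in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def classify(tokens, word_vectors, label_vectors):
    """Assign the label whose vector ('cluster center') is closest to the document vector."""
    doc = mean_vector(tokens, word_vectors)
    if doc is None:
        return None
    return min(label_vectors, key=lambda label: cosine_distance(doc, label_vectors[label]))


# Hypothetical toy model: real word and label vectors would come from the trained model.
word_vectors = {
    "goal": [0.9, 0.1],
    "team": [0.8, 0.2],
    "stock": [0.1, 0.9],
    "market": [0.2, 0.8],
}
label_vectors = {"sports": [1.0, 0.0], "finance": [0.0, 1.0]}

print(classify(["the", "team", "scored", "a", "goal"], word_vectors, label_vectors))  # prints "sports"
```

With one vector per label, this is a nearest neighbor search with k = 1. Words missing from the vocabulary ("the", "scored", "a" in the toy call) simply do not contribute to the mean, which mirrors the "if present in the word vector model" behaviour described above.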

Download

Get this workflow from the following link: Download

Resources

Nodes

07_Simple_Document_Classification_Using_Word_Vectors consists of the following 27 node(s):

Plugins

07_Simple_Document_Classification_Using_Word_Vectors contains nodes provided by the following 4 plugin(s):