This example shows how to transform a document into a vector using a word vector model and how to use these vectors for classification. First, we read test and training documents that are divided into three topics. We use the training set to train a Doc2Vec model with the topic as the class attribute. The Word Vector Learner node creates a vector for each word and for each label. Next, we use a Vocabulary Extractor node to extract the words and their vectors from the model. On its second output port, the Vocabulary Extractor outputs the vector for each label, which we can then use as a kind of 'cluster center' for classification.
The next step is to convert each test document into a vector using the word vector model. This is done with the Word Vector Apply node, which takes documents as input and replaces every word with its corresponding word vector, provided the word is present in the model. We additionally configure the node to calculate the mean of all word vectors in a document, so that each test document is represented by a single vector.
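The averaging step can be illustrated outside of KNIME with a minimal Python sketch. It is not the Word Vector Apply node's implementation. The function name mean_document_vector and the two-dimensional toy vectors are hypothetical, and the sketch assumes that words missing from the model are simply skipped.

```python
def mean_document_vector(tokens, word_vectors):
    """Average the vectors of all tokens that exist in the model.

    Tokens without a vector are skipped (an assumption of this sketch).
    Returns None if no token of the document is known to the model.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    # Component-wise mean over all known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy model with two-dimensional vectors.
toy_vectors = {
    "goal": [1.0, 0.0],
    "match": [0.0, 1.0],
}

# "referee" is not in the toy model, so only two vectors are averaged.
doc_vector = mean_document_vector(["goal", "match", "referee"], toy_vectors)
print(doc_vector)  # [0.5, 0.5]
```

Averaging discards word order, but it yields a fixed-length vector regardless of document length, which is what a distance-based classifier needs.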
Finally, we use a K Nearest Neighbor node to classify the test documents against the previously created 'cluster centers'. Cosine distance is a common choice of distance measure for word vectors.
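The classification step can be sketched in Python as a nearest-neighbor lookup under cosine distance. This is an illustration under assumptions, not the K Nearest Neighbor node itself: it uses k = 1 with one vector per label, and the names cosine_distance, nearest_label and the toy label vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is treated
        # as having no similarity to anything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_label(doc_vector, label_vectors):
    """Pick the label whose vector is closest to doc_vector (k = 1)."""
    return min(
        label_vectors,
        key=lambda label: cosine_distance(doc_vector, label_vectors[label]),
    )


# Hypothetical label vectors standing in for the three topic 'cluster centers'.
toy_label_vectors = {
    "sports": [1.0, 0.1],
    "politics": [0.1, 1.0],
    "science": [-1.0, -1.0],
}

predicted = nearest_label([0.9, 0.2], toy_label_vectors)
print(predicted)  # sports
```

Cosine distance compares only the direction of the vectors, so a long and a short document on the same topic can still end up close to the same label vector.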
Workflow Requirements
KNIME Analytics Platform 3.4.0
KNIME Deeplearning4J Integration
KNIME Deeplearning4J Integration Text Processing Extension