First, we read in a dataset containing sentences and assign each document a unique label. The label is used to create a document vector, which represents the whole document rather than only single words. Next, we train a Doc2Vec model using the Word Vector Learner node. The Learner node outputs a word vector model containing a vocabulary of all learned words and labels with their corresponding vectors. These can be extracted using a Vocabulary Extractor node, which has two output ports: the first contains a column with each word and a collection column with the corresponding word vector, and the second contains the same for the labels. The length of the vectors (layer size) and other learning parameters can be adjusted in the Word Vector Learner node dialog.
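Outside of KNIME, the same idea can be illustrated with a small Python sketch. It is only a toy version of one Doc2Vec variant (PV-DBOW with a full softmax) and is not the implementation used by the Word Vector Learner node. The sentences, the `DOC_` label scheme, the function name `train_pv_dbow`, and all parameter values are assumptions made for illustration.

```python
import math
import random

def train_pv_dbow(docs, layer_size=8, epochs=50, lr=0.05, seed=0):
    """Toy PV-DBOW: each document vector is trained to predict the words
    of its own document. `docs` maps a unique label to a list of tokens.
    Returns (word_vectors, doc_vectors), both dicts of name -> list of floats."""
    rng = random.Random(seed)
    vocab = sorted({w for tokens in docs.values() for w in tokens})

    def rand_vec():
        return [(rng.random() - 0.5) / layer_size for _ in range(layer_size)]

    doc_vecs = {label: rand_vec() for label in docs}
    # In this variant the word vectors are the output weights of the softmax.
    word_vecs = {w: rand_vec() for w in vocab}

    for _ in range(epochs):
        for label, tokens in docs.items():
            d = doc_vecs[label]
            for target in tokens:
                # Softmax over the whole vocabulary (fine for a tiny example).
                scores = {w: sum(a * b for a, b in zip(d, word_vecs[w])) for w in vocab}
                top = max(scores.values())
                exps = {w: math.exp(s - top) for w, s in scores.items()}
                z = sum(exps.values())
                grad_d = [0.0] * layer_size
                for w in vocab:
                    err = exps[w] / z - (1.0 if w == target else 0.0)
                    wv = word_vecs[w]
                    for i in range(layer_size):
                        grad_d[i] += err * wv[i]   # uses the weight before its update
                        wv[i] -= lr * err * d[i]
                for i in range(layer_size):
                    d[i] -= lr * grad_d[i]
    return word_vecs, doc_vecs

# Hypothetical input sentences; each gets a unique label.
sentences = [
    "the cat sat on the mat",
    "the cat lay on the mat",
    "stock prices fell sharply today",
]
docs = {f"DOC_{i}": s.lower().split() for i, s in enumerate(sentences)}

word_vectors, doc_vectors = train_pv_dbow(docs, layer_size=8)
```

The two returned dictionaries play the role of the two output tables described above: one maps each word to its vector, the other maps each label to its document vector, and `layer_size` sets the vector length.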
To visualize the result of the Learner, we select six sentences from the training set: five that are very similar to each other and one that is dissimilar to the other five. Next, we use principal component analysis (PCA) to reduce the dimensionality of the document vectors to two so that we can show them in a scatter plot. In the plot, the sentences are easy to tell apart: the dissimilar sentence lies far from all the others, whereas the similar sentences lie close together.
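The dimensionality reduction step can be sketched in plain Python as follows. This is a minimal PCA via power iteration on the covariance matrix, not the KNIME PCA node. The six four-dimensional vectors are made-up stand-ins for learned document vectors (five similar, one dissimilar), and the function name `pca_2d` is an assumption.

```python
import math
import random

def pca_2d(vectors, iterations=200, seed=0):
    """Project the row vectors onto their first two principal components."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(dim)] for i in range(dim)]

    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.random() - 0.5 for _ in range(dim)]
        for _ in range(iterations):
            # Power iteration step: multiply by the covariance matrix.
            vec = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            # Keep the vector orthogonal to components already found.
            for c in components:
                dot = sum(a * b for a, b in zip(vec, c))
                vec = [a - dot * b for a, b in zip(vec, c)]
            norm = math.sqrt(sum(a * a for a in vec)) or 1.0
            vec = [a / norm for a in vec]
        components.append(vec)

    return [[sum(a * b for a, b in zip(row, c)) for c in components]
            for row in centered]

# Made-up document vectors: the first five are similar, the last one is not.
doc_vectors = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [1.1, 1.0, 0.1, 0.1],
    [1.0, 1.1, 0.0, 0.0],
    [0.9, 0.9, 0.1, 0.1],
    [-3.0, -2.5, 4.0, 3.5],
]

points = pca_2d(doc_vectors)
for i, (x, y) in enumerate(points):
    print(f"sentence {i}: ({x:.3f}, {y:.3f})")
```

Plotting `points` as a scatter plot should show the first five points clustered together and the sixth far away, which mirrors the separation described above.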
Workflow Requirements
KNIME Analytics Platform 3.4.0
KNIME Deeplearning4J Integration
KNIME Deeplearning4J Integration Text Processing Extension