Here we use word embeddings instead of one-hot encoding, trained with a Word2Vec Learner node. The hidden layer size is set to 10, which produces embeddings of very low dimensionality.
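To make the contrast concrete, the following Python sketch compares the length of a one-hot vector with the length of a 10-dimensional embedding. It is only an illustration outside KNIME: the toy vocabulary is hypothetical and the embedding values are random placeholders, not the output of a trained Word2Vec model.

```python
import random

# Hypothetical toy vocabulary; the workflow's real vocabulary comes from its text corpus.
vocabulary = ["king", "queen", "apple", "banana"]

def one_hot(word, vocab):
    """Return a vector with one position per vocabulary word."""
    return [1.0 if w == word else 0.0 for w in vocab]

EMBEDDING_SIZE = 10  # same value as the hidden layer size in the workflow

rng = random.Random(0)
# Random placeholder values; a trained model would learn these from text.
embedding_table = {
    w: [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_SIZE)]
    for w in vocabulary
}

# The one-hot length grows with the vocabulary; the embedding length stays fixed.
print(len(one_hot("king", vocabulary)))
print(len(embedding_table["king"]))
```

With a realistic vocabulary of tens of thousands of words, the one-hot vector would have as many positions as there are words, while the embedding would still have 10.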
The output of the Word2Vec Learner node is a model. The Vocabulary Extractor node extracts the words from the model vocabulary and provides their embeddings as collections. The collection items are separated into individual columns with a Split Collection Column node, and the distances between the word embedding vectors are then calculated.
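The distance step can be sketched in plain Python as follows. The description does not state which distance measure the workflow uses, so both Euclidean and cosine distance are shown here as assumptions; the three-dimensional vectors are hypothetical stand-ins for the 10-dimensional embeddings.

```python
import math
from itertools import combinations

# Hypothetical embeddings, one list per word (one value per split column).
embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [1.0, 0.0, 2.0],
}

def euclidean(a, b):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Distance for every unordered pair of words.
pairwise = {
    (w1, w2): euclidean(embeddings[w1], embeddings[w2])
    for w1, w2 in combinations(embeddings, 2)
}

for pair, distance in pairwise.items():
    print(pair, round(distance, 3))
```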
At the end, n selected words are visualized on a scatter plot to show how semantically similar words lie close to each other across different embedding coordinates. The String Input node lets you enter one selected word and retrieve the distances from that word to all other words. Smaller distances should correspond to words that are closer in context or in meaning.
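The lookup for a single selected word might look like the sketch below. The function name `distances_from`, the toy two-dimensional vectors, and the choice of Euclidean distance are all assumptions made for illustration; they are not taken from the workflow itself.

```python
import math

# Hypothetical two-dimensional embeddings for illustration only.
embeddings = {
    "paris": [0.0, 0.0],
    "london": [1.0, 0.0],
    "banana": [5.0, 5.0],
}

def distances_from(query, table):
    """Return (word, distance) pairs for all other words, nearest first."""
    if query not in table:
        raise KeyError(f"'{query}' is not in the vocabulary")
    query_vector = table[query]
    results = [
        (word, math.dist(query_vector, vector))
        for word, vector in table.items()
        if word != query
    ]
    return sorted(results, key=lambda pair: pair[1])

# The nearest words are listed first.
for word, distance in distances_from("paris", embeddings):
    print(word, round(distance, 3))
```

Sorting by distance puts the words that are presumably closest in context or meaning at the top of the list.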
To use this workflow, download it and open it in KNIME.