To perform the actual training, both hierarchical softmax and negative sampling are available. The node uses TensorFlow as its engine to speed up preprocessing and to fit the model. If a CUDA-compatible NVIDIA GPU is present, training can be performed on the GPU.
Select the Document-type column you want to use to train the model.
Set seeds for the whole node.
Choose the seed value if you do not want the default one.
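For illustration, seeding covers every source of randomness the training touches. A minimal sketch in Python, assuming TensorFlow 2 and a hypothetical seed value:

```python
import random

import numpy as np
import tensorflow as tf

SEED = 42  # hypothetical value; the node exposes the seed as a dialog option

# Seed every source of randomness so preprocessing and training are repeatable.
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
```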
Choose the device on which to run the fit of the Word2Vec model; only visible devices are available. Note that the index next to a device name is just an identifier for that device.
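As a sketch of how device selection looks on the TensorFlow side (not the node's actual code; the device string is an assumption):

```python
import tensorflow as tf

# List the devices TensorFlow can see; the trailing index (e.g. GPU:0)
# is just an identifier for the device, not a ranking.
print(tf.config.list_physical_devices())

# Pin work to one device, assuming a CUDA-compatible GPU is visible.
with tf.device("/GPU:0"):
    x = tf.random.uniform((4, 8))  # ops created in this scope run on GPU:0
```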
Change the embedding size of the two Word2Vec embedding layers (for target and context words, respectively) to trade speed (smaller size) against embedding quality (larger size).
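The two layers can be pictured as two lookup tables of shape vocabulary size × embedding size. A minimal sketch with hypothetical sizes:

```python
import tensorflow as tf

VOCAB_SIZE = 10_000   # hypothetical vocabulary size
EMBEDDING_DIM = 128   # the dialog option: smaller trains faster, larger is richer

# Word2Vec keeps one embedding matrix for target words and one for context words.
target_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, name="target")
context_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, name="context")
```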
Choose the radius of the context window, which determines how far from the target word Word2Vec looks. The window always has the target word at its center, so the value set here is a radius: the actual number of context words considered is twice the value entered, as the sketch below illustrates.
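A small hypothetical helper (not the node's code) that enumerates (target, context) pairs for a given radius:

```python
def context_windows(tokens, radius):
    """Yield (target, context) pairs for a window of the given radius."""
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]

# With radius=2, up to 2 * 2 = 4 context words are considered per target.
pairs = list(context_windows(["the", "quick", "brown", "fox", "jumps"], radius=2))
```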
The negative sampling approach reduces the computational complexity of vanilla Word2Vec while introducing noise into the model in order to regularize it. You can choose the number of negative samples.
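As a sketch of how negative samples can be drawn in TensorFlow (the log-uniform sampler used here is a common choice and an assumption, not necessarily what the node uses):

```python
import tensorflow as tf

VOCAB_SIZE = 10_000  # hypothetical vocabulary size
NUM_NEGATIVE = 5     # the dialog option

# For each true (target, context) pair, draw NUM_NEGATIVE "noise" words the
# model must learn to distinguish from the real context word.
true_context = tf.constant([[42]], dtype=tf.int64)  # one positive example
negatives, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=true_context,
    num_true=1,
    num_sampled=NUM_NEGATIVE,
    unique=True,
    range_max=VOCAB_SIZE,
)
```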
Activate hierarchical softmax in place of negative sampling; enabling this option deactivates negative sampling. Hierarchical softmax arranges the vocabulary in a binary tree, reducing the cost of each output evaluation from O(V) to O(log V).
Choose between the CBOW (target word as output) and skip-gram (context words as output) Word2Vec implementations.
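The difference is easiest to see in how training examples are built. A hypothetical sketch:

```python
def cbow_example(tokens, i, radius):
    # CBOW: the context window is the input, the target word is the output.
    context = [tokens[j]
               for j in range(max(0, i - radius), min(len(tokens), i + radius + 1))
               if j != i]
    return context, tokens[i]

def skipgram_examples(tokens, i, radius):
    # Skip-gram: the target word is the input, each context word is an output.
    context, target = cbow_example(tokens, i, radius)
    return [(target, c) for c in context]
```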
Whether to use a word survival function to reduce the vocabulary size by prioritizing rarer words over very frequent ones.
Set the sampling rate for the word survival function; the higher it is, the more words are included in the dictionary. The default value is 10^-3 and the maximum is 0.1.
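A common form of the word survival function is the subsampling rule from the original word2vec paper (Mikolov et al., 2013); that the node uses exactly this variant is an assumption:

```python
import math

SAMPLE_RATE = 1e-3  # the dialog option: default 10^-3, maximum 0.1

def survival_probability(word_frequency, sample_rate=SAMPLE_RATE):
    """Probability of keeping a word, given its relative corpus frequency.
    Frequent words are dropped more aggressively than rare ones."""
    ratio = sample_rate / word_frequency
    return min(1.0, math.sqrt(ratio) + ratio)

print(survival_probability(0.01))    # a very frequent word: ~0.42
print(survival_probability(0.0001))  # a rare word: 1.0 (always kept)
```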
Minimum corpus frequency below which a word is excluded from the dictionary. Set it to 0 if filtering by minimum frequency is not needed.
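A minimal sketch of such a frequency filter (hypothetical threshold and toy corpus):

```python
from collections import Counter

MIN_FREQUENCY = 2  # hypothetical threshold; 0 disables the filter

tokens = ["the", "cat", "sat", "on", "the", "mat"]  # toy corpus
counts = Counter(tokens)

# Keep only words occurring at least MIN_FREQUENCY times.
vocabulary = {word for word, count in counts.items() if count >= MIN_FREQUENCY}
print(vocabulary)  # {'the'}
```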
Number of epochs for model training. Training time grows linearly with the number of epochs.
The batch size used to train the Word2Vec model.
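A sketch of how epochs and batch size enter a Keras-style training loop (placeholder tensors stand in for the pairs built during preprocessing; the values are assumptions):

```python
import tensorflow as tf

BATCH_SIZE = 1024  # hypothetical values for the two dialog options
EPOCHS = 5

# Placeholder data standing in for the (target, context) pairs.
targets = tf.zeros((10_000,), dtype=tf.int64)
contexts = tf.zeros((10_000,), dtype=tf.int64)

dataset = (tf.data.Dataset.from_tensor_slices((targets, contexts))
           .shuffle(10_000)
           .batch(BATCH_SIZE))

# model.fit(dataset, epochs=EPOCHS)  # training time scales linearly with EPOCHS
```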
Set the learning rate for the Adam optimizer. Note that the actual step taken in parameter space adapts dynamically during training.
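A minimal sketch of configuring the optimizer (the value shown is an assumption):

```python
import tensorflow as tf

LEARNING_RATE = 0.001  # the dialog option

# Adam rescales each parameter's step from running gradient statistics,
# so LEARNING_RATE only sets the initial scale, not a fixed step size.
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)
```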
To use this node in KNIME, install the Word2Vec with Tensorflow extension from the update site, following the NodePit Product and Node Installation Guide.