Word2Vec Learner (Tensorflow)

To perform the actual training, both hierarchical softmax and negative sampling are available. The node uses TensorFlow as its engine to speed up both the pre-processing and the model fitting. If a CUDA-compatible NVIDIA GPU is present, training can be performed on the GPU.

Options

Column selection (String type)

Select the String column containing the documents you want to use to train the model.

Set seed

Set a seed for the whole node, so that results are reproducible.

Seed

Choose the seed value if you do not want to use the default one.

Device for Tensorflow model fit

Choose the device on which to fit the Word2Vec model; only the visible devices are available. Note that the indexes next to the device names are just identifiers for the devices themselves.
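
For orientation, a minimal sketch (not the node's internal code) of how TensorFlow itself enumerates visible devices and pins work to one of them; the device string "/GPU:0" is a standard TensorFlow identifier, used here as an example:

```python
import tensorflow as tf

# List every device TensorFlow can see; the trailing index in a name such
# as "/physical_device:GPU:0" is just an identifier for that device.
for device in tf.config.list_physical_devices():
    print(device.name, device.device_type)

# Hypothetical placement: run ops (e.g. the model fit) on the first GPU.
with tf.device("/GPU:0"):
    x = tf.random.uniform((4, 4))  # created on GPU:0 if one is present
```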

Word2Vec parameters

Embedding size

Change the embedding size of the two Word2Vec embedding layers (for target and context words, respectively) to trade speed (smaller values) against quality (larger values).
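
As a rough sketch of what this option controls (the layer names and sizes below are illustrative assumptions, not the node's internals), Word2Vec keeps two Keras embedding tables of the chosen size:

```python
import tensorflow as tf

vocab_size = 10_000   # hypothetical vocabulary size
embed_dim = 128       # the "Embedding size" option

# One embedding table for target words, one for context words.
target_embedding = tf.keras.layers.Embedding(vocab_size, embed_dim,
                                             name="w2v_target")
context_embedding = tf.keras.layers.Embedding(vocab_size, embed_dim,
                                              name="w2v_context")
```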

Window size (radius)

Choose the radius of the context window, which determines how far from the target word Word2Vec looks. The window always has the target word at its center, so the actual number of context words considered is twice the value entered.
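
To illustrate the radius semantics, a sketch using a Keras utility rather than the node's own pre-processing:

```python
from tensorflow.keras.preprocessing.sequence import skipgrams

tokens = [1, 2, 3, 4, 5, 6, 7]  # token ids of one hypothetical document
radius = 2                      # the "Window size (radius)" option

# window_size is the radius: up to `radius` words on each side of the
# target are paired with it, i.e. at most 2 * radius context words.
pairs, labels = skipgrams(tokens, vocabulary_size=8,
                          window_size=radius, negative_samples=0.0)
print(pairs[:4])  # (target, context) id pairs
```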

Number of negative samples

Negative sampling reduces the computational complexity of vanilla Word2Vec while introducing noise into the model in order to regularize it. You can choose the number of negative samples drawn for each positive example.
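
As a sketch of how negative samples can be drawn in TensorFlow (the node's exact sampler is not exposed; the log-uniform distribution is an assumption borrowed from the original word2vec implementation):

```python
import tensorflow as tf

vocab_size = 10_000
num_ns = 5  # the "Number of negative samples" option

# The observed (positive) context word for one target, shape [1, 1].
true_context = tf.constant([[42]], dtype=tf.int64)

# Draw num_ns "noise" word ids from a log-uniform (Zipf-like) distribution;
# the model is then trained to tell the true context apart from these.
negatives, _, _ = tf.random.log_uniform_candidate_sampler(
    true_classes=true_context, num_true=1,
    num_sampled=num_ns, unique=True, range_max=vocab_size)
```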

Hierarchical Softmax

Activate hierarchical softmax in place of negative sampling. Enabling this option deactivates negative sampling.
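
For reference, a sketch of the scoring rule hierarchical softmax uses in the original word2vec formulation (Mikolov et al., 2013), assumed here to match the node's implementation: the V-way softmax is replaced by roughly log2(V) binary decisions along the path from the root of a binary Huffman tree to the leaf of word w, where n(w, j) is the j-th node on that path, L(w) its length, ch(n) an arbitrary fixed child of n, and [[x]] is +1 if x is true and -1 otherwise:

```latex
p(w \mid w_I) = \prod_{j=1}^{L(w)-1}
  \sigma\!\left( [\![\, n(w, j{+}1) = \mathrm{ch}(n(w, j)) \,]\!]
  \cdot {v'}_{n(w, j)}^{\top} \, v_{w_I} \right)
```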

Word2Vec algorithm selection

Choose between the CBOW (target as output) and skip-gram (context as output) Word2Vec implementations.
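
A plain-Python sketch of how the two variants differ in the training pairs they produce (illustrative only, not the node's pre-processing):

```python
def training_pairs(tokens, radius, mode):
    """CBOW maps context -> target; skip-gram maps target -> context."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - radius):i] + tokens[i + 1:i + 1 + radius]
        if mode == "cbow":
            pairs.append((context, target))          # target as output
        else:
            pairs += [(target, c) for c in context]  # context as output
    return pairs

print(training_pairs(["the", "cat", "sat"], 1, "cbow"))
# [(['cat'], 'the'), (['the', 'sat'], 'cat'), (['cat'], 'sat')]
```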

Word Survival Function

Whether to use a word survival function to reduce the size of the vocabulary by downsampling very frequent words, thus prioritizing rarer ones.

Sampling rate for Word Survival Function (if flagged)

Set the sampling rate for the word survival function: the higher the rate, the more words are included in the dictionary. The default value is 10^-3; the maximum is 0.1. A sketch of this filter, combined with the minimum-frequency filter, is shown after the next option.

Minimum Frequency

Minimum corpus frequency below which a word is excluded from the dictionary. Set it to 0 if filtering by minimum frequency is not needed.
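
A sketch of both vocabulary filters; the survival formula below is the subsampling rule from the original word2vec implementation and is an assumption, since the node does not document its exact formula:

```python
import math
import random

def survival_prob(freq_fraction, sample_rate=1e-3):
    """Keep-probability for a word with relative corpus frequency
    `freq_fraction`: rare words are kept almost surely, very frequent
    words are downsampled (word2vec subsampling formula, assumed)."""
    return (math.sqrt(freq_fraction / sample_rate) + 1) * (sample_rate / freq_fraction)

def build_vocab(counts, total_tokens, sample_rate=1e-3, min_freq=5):
    # Drop words below the minimum corpus frequency (min_freq=0 disables
    # this filter), then subsample the survivors by survival_prob.
    return {w for w, c in counts.items()
            if c >= min_freq
            and random.random() < survival_prob(c / total_tokens, sample_rate)}
```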

Training parameters

Epochs

Number of epochs for model training. Training time grows linearly with the number of epochs.

Batch size

The batch size used to train the Word2Vec model.

Adam learning rate

Set the initial learning rate for the Adam optimizer. The actual step size in parameter space varies dynamically during training.
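
Putting the three training options together, a minimal Keras sketch; the toy model and random data below are hypothetical stand-ins for the node's internal model and the training pairs built during pre-processing:

```python
import numpy as np
import tensorflow as tf

# Toy binary classifier over (target, context) id pairs, random data.
vocab_size, embed_dim = 1_000, 32
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
pairs = np.random.randint(0, vocab_size, size=(4096, 2))
labels = np.random.randint(0, 2, size=(4096, 1))

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # "Adam learning rate"
    loss="binary_crossentropy",
)
model.fit(pairs, labels,
          epochs=5,         # "Epochs": time grows linearly with this
          batch_size=1024)  # "Batch size"
```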

Input Ports

A KNIME table with a string column to use for Word2Vec training.

Output Ports

A KNIME table with three columns: the index of the token, the token itself, and the embedding of the token as a collection (KNIME-native list).

Views

This node has no views
