Word Vector Learner (legacy)

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.
This node learns a Word Vector Model from either labelled (Doc2Vec) or unlabelled (Word2Vec) documents or Strings. The result is a Word Vector Model containing neural word embeddings produced by the chosen learning method. For more information on Word Vectors see: http://deeplearning4j.org/word2vec

The KNIME Deeplearning4J Integration has been marked as legacy with KNIME Analytics Platform 5.0 and will be deprecated in a future version. If you are using this extension in a production workflow, consider switching to one of the other deep learning integrations available in KNIME Analytics Platform.

Options

WordVector Training Mode
Training Mode for Word Vector training:
  • Word2Vec: Training without labels. Learns word vectors based on words.
  • Doc2Vec: Training with labels. Learns word vectors based on labels associated with documents.
Use Basic Token Preprocessing?
Whether to apply basic preprocessing to the tokens, that is, convert them to lower case and remove punctuation.
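As an illustration only, the preprocessing described above could look roughly like the following Python sketch. The function name basic_preprocess is hypothetical, and the exact set of characters the node strips is not stated here, so ASCII punctuation is assumed.

```python
import string

# Assumption: "punctuation" means the ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def basic_preprocess(token: str) -> str:
    """Lower-case a token and drop ASCII punctuation characters."""
    return token.lower().translate(_PUNCT_TABLE)

tokens = ["Hello,", "World!", "...", "Word2Vec"]
# Tokens that become empty after cleaning are dropped.
cleaned = [t for t in (basic_preprocess(tok) for tok in tokens) if t]
```

With this sketch, "Hello," and "hello" map to the same vocabulary entry, which is the usual motivation for enabling the option.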
Seed
A seed value to use for training.
Learning Rate
The learning rate that should be used for training.
Minimum Learning Rate
Lower bound for the learning rate. The learning rate will not fall below this value during training.
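The interplay of Learning Rate and Minimum Learning Rate can be pictured with a small sketch. It assumes a linear decay over training progress, which is common in word2vec-style training but is not confirmed by this documentation; only the floor at the minimum value is taken from the description above. The function name is hypothetical.

```python
def decayed_learning_rate(initial: float, minimum: float, progress: float) -> float:
    """Return the learning rate at a given training progress in [0, 1].

    Assumption: linear decay from `initial`; the result never drops
    below `minimum`.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    return max(minimum, initial * (1.0 - progress))

start = decayed_learning_rate(0.025, 0.0001, 0.0)  # full initial rate
end = decayed_learning_rate(0.025, 0.0001, 1.0)    # clamped to the minimum
```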
Batch Size
The number of words to use for each batch.
Epochs
The number of epochs to train.
Number of Training Iterations
The number of updates done for each batch.
Layer Size
The size of the output layer. This determines the length of the resulting word vectors.
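To make the relation between Layer Size and vector length concrete, here is a minimal sketch of an embedding table in plain Python. The initialisation range and the helper name init_vectors are assumptions for illustration; the sketch also shows how a fixed seed makes the starting vectors reproducible.

```python
import random

def init_vectors(vocabulary, layer_size, seed):
    """Create one random vector of length `layer_size` per word.

    Assumption: small uniform initial values; the node's actual
    initialisation scheme is not described in this documentation.
    """
    rng = random.Random(seed)
    return {
        word: [rng.uniform(-0.5, 0.5) / layer_size for _ in range(layer_size)]
        for word in vocabulary
    }

vectors = init_vectors(["cat", "mat"], layer_size=5, seed=42)
```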
Minimum Word Frequency
Minimum number of times a word must occur in the corpus to be considered for learning. Words with a lower frequency will not appear in the vocabulary contained in the Word Vector Model.
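The effect of this threshold can be sketched as a simple frequency filter. This is an illustrative Python example, not the node's implementation; build_vocabulary is a hypothetical helper, and the threshold is assumed to be inclusive.

```python
from collections import Counter

def build_vocabulary(tokens, min_word_frequency):
    """Keep only words that occur at least `min_word_frequency` times."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count >= min_word_frequency}

corpus = "the cat sat on the mat the cat".split()
vocab = build_vocabulary(corpus, min_word_frequency=2)
```

Here "sat", "on" and "mat" each occur once and are dropped, so no vectors would be learned for them.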
Window Size
Size of the context window, that is, the number of words to consider for learning.
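As a rough illustration of what a context window means, the sketch below pairs each word with its neighbours. It assumes a symmetric window that reaches Window Size words to each side, which is the usual word2vec convention but is not spelled out in this documentation; context_pairs is a hypothetical helper.

```python
def context_pairs(tokens, window_size):
    """Return (center, context) pairs within `window_size` words on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = context_pairs(["a", "b", "c", "d"], window_size=1)
```

A larger window yields more pairs per word and tends to capture broader, more topical relationships between words.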

Column Selection

Label Column
String column containing the labels for the documents, if available. Labels are used when training in Doc2Vec mode.
Document Column
The column containing the Document or String to train on.

Input Ports

Table containing Document or String Column.

Output Ports

Trained Word Vector Model

Views

This node has no views
