Doc2Vec Learner

This node trains a Doc2Vec (Paragraph Vectors) model on labelled documents. The model learns sequence representations (a vector for each label) as well as word representations (a vector for each word), which can be extracted using the Vocabulary Extractor Node.


Data Options

Document Column
Column containing the Document or String to train on.
Label Column
String column containing labels for documents.

Learning Options

Learning Rate
The starting learning rate.
Minimum Learning Rate
The minimum learning rate threshold. The learning rate will decay automatically over time.
Layer Size
The length of the resulting word vectors.
Batch Size
The number of words to use for each batch.
Seed
A seed value to use for training.
Number of Epochs
The number of epochs to train.
Number of Training Iterations
The number of updates done for each batch.
Context Window Size
Size of the context window, i.e. the number of surrounding words considered when learning each word.
Minimum Word Frequency
Minimum frequency of a word to appear in the corpus to be considered for learning. Words with a lower frequency will not appear in the vocabulary contained in the Word Vector Model.
Sampling Rate
Threshold for configuring which higher-frequency words are randomly downsampled; a useful range is (0, 1e-5).
Use Hierarchical Softmax?
Whether to use hierarchical softmax. If checked, negative sampling will be disabled.
Negative Sampling Rate
The number of “noise words” that should be drawn.
Sequence Learning Algorithm
The algorithm to use to learn document representations.
Skip missing cells?
Whether rows containing missing cells should be skipped. If this option is not set and the table contains missing cells, the node will fail.

Input Ports

Table containing Document or String columns.

Output Ports

Trained Doc2Vec Model.


This node has no views

