Doc2Vec Learner (legacy)

This node trains a Doc2Vec (Paragraph Vectors) model on labelled documents. The model learns sequence representations (one vector per label) as well as word representations (one vector per word), both of which can be extracted using the Vocabulary Extractor node. For more information on word vectors in general, see: http://deeplearning4j.org/word2vec

The KNIME Deeplearning4J Integration has been marked as legacy with KNIME Analytics Platform 5.0 and will be deprecated in a future version. If you are using this extension in a production workflow, consider switching to one of the other deep learning integrations available in KNIME Analytics Platform.

Options

Data Options

Document Column: Column containing the Document or String to train on.
Label Column: String column containing labels for documents.

Learning Options

Learning Rate: The starting learning rate.
Minimum Learning Rate: The minimum learning rate threshold. The learning rate will decay automatically over time.
Layer Size: The length of the resulting word vectors.
Batch Size: The number of words to use for each batch.
Seed: A seed value to use for training.
Number of Epochs: The number of epochs to train.
Number of Training Iterations: The number of updates done for each batch.
Context Window Size: The size of the context, i.e. the window of words around each word that is considered during learning.
Minimum Word Frequency: The minimum number of times a word must appear in the corpus to be considered for learning. Words with a lower frequency will not appear in the vocabulary contained in the Word Vector Model.
Sampling Rate: Threshold for configuring which higher-frequency words are randomly downsampled; a useful range is (0, 1e-5).
Use Hierarchical Softmax?: Whether to use hierarchical softmax. If checked, negative sampling will be disabled.
Negative Sampling Rate: The number of “noise words” that should be drawn.
Sequence Learning Algorithm: The algorithm to use to learn document representations.
Skip missing cells?: Whether rows containing missing cells should be skipped. If this option is unchecked and the table contains missing cells, the node will fail.
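
Several of the learning options above are easier to follow with a small illustration. The following Python sketch is not the Deeplearning4J implementation used by this node; all function names are hypothetical. It assumes a linear learning rate decay and uses the subsampling formula from the original word2vec reference code, which the node may or may not follow exactly.

```python
import math
from collections import Counter


def decayed_learning_rate(start_rate, min_rate, progress):
    """Assumed linear decay from start_rate, never below min_rate.

    progress is the fraction of training completed, in [0, 1].
    """
    return max(min_rate, start_rate * (1.0 - progress))


def build_vocabulary(documents, min_word_frequency):
    """Count words and keep those seen at least min_word_frequency times."""
    counts = Counter(word for doc in documents for word in doc)
    return {w: c for w, c in counts.items() if c >= min_word_frequency}


def keep_probability(word_count, total_count, sampling_rate):
    """Probability of keeping one occurrence of a word when subsampling.

    Uses the formula from the original word2vec reference code (an
    assumption here): frequent words are dropped more often.
    """
    if sampling_rate <= 0:
        return 1.0  # subsampling disabled
    ratio = word_count / (sampling_rate * total_count)
    return min(1.0, (math.sqrt(ratio) + 1.0) / ratio)


def context_pairs(tokens, window_size):
    """Return (center, context) pairs within window_size positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, build_vocabulary(docs, 5) would drop every word seen fewer than five times, and context_pairs(tokens, 2) would pair each word with up to two neighbours on either side.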

Input Ports

Table containing Document or String columns.

Output Ports

Trained Doc2Vec Model.

Views

This node has no views.

Installation

To use this node in KNIME, install the extension KNIME Textprocessing - Deeplearning4J Integration (64bit only) (legacy) from the update site below, following the NodePit Product and Node Installation Guide:

v5.6

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.6.0.v202507151410

On NodePit since: 2025-08-15

Last update: 2025-08-15

KNIME versions: Since v3.6
