Document Vector Adapter

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

This node creates a document vector for each document, representing it in the term space, exactly as the regular Document Vector node does. The difference is that this node takes two data tables as input:
1. Table containing the bag-of-words terms
2. Table containing the reference document vector

The terms from the first input are converted into document vectors, using the vector from the second input as the reference. Features that appear in the first table but not in the reference table are filtered out. Features that appear in the reference table but not in the first table are added to the output vector with their values set to 0.
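
The alignment rule described above can be illustrated with a small Python sketch. This is not the node's implementation: the function name `adapt_to_reference` and the dictionary-based data layout are assumptions made purely for illustration.

```python
def adapt_to_reference(doc_terms, reference_features):
    """Align one document's term values to the reference feature order.

    doc_terms: dict mapping term -> value for one document (hypothetical layout).
    reference_features: ordered list of feature names taken from the reference table.
    """
    # Features missing from the document are filled with 0.0.
    # Terms absent from the reference are never looked up, so they are dropped.
    return [float(doc_terms.get(feature, 0.0)) for feature in reference_features]


reference = ["data", "mining", "text"]
document = {"text": 2.0, "mining": 1.0, "graph": 5.0}  # "graph" is not in the reference

vector = adapt_to_reference(document, reference)
print(vector)  # [0.0, 1.0, 2.0]
```

The term "graph" is filtered out because the reference does not contain it, while "data" is added with the value 0.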

The values of the feature vectors can be specified as boolean values or as the values of a specified column, e.g. a tf*idf column. Because the output follows the reference, the dimension of the vectors is the number of features selected from the reference table.
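
The two ways of choosing vector values can be sketched in Python as follows. The row layout, the column name `tfidf`, and the function `term_values` are hypothetical and only serve to show the difference between boolean values and values taken from a precomputed column.

```python
def term_values(bow_rows, value_column=None):
    """Collect one value per term for a single document.

    bow_rows: list of dicts, one per bag-of-words row (hypothetical layout).
    value_column: name of a numeric column to read values from;
                  None means boolean (bit vector) values.
    """
    values = {}
    for row in bow_rows:
        if value_column is None:
            # Boolean mode: the term occurs in the document, so mark it with 1.0.
            values[row["term"]] = 1.0
        else:
            # Value mode: use the precomputed number, e.g. a tf*idf score.
            values[row["term"]] = float(row[value_column])
    return values


rows = [
    {"term": "text", "tfidf": 0.42},
    {"term": "mining", "tfidf": 0.17},
]

print(term_values(rows))           # {'text': 1.0, 'mining': 1.0}
print(term_values(rows, "tfidf"))  # {'text': 0.42, 'mining': 0.17}
```

In value mode the numbers must already exist in the input, which is why the frequency values have to be computed before the vectors are built.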

Options

Document column: The column containing the documents to use.
Ignore tags: If checked, tags are ignored when comparing terms.
Bitvector: If checked, a bitvector is created that indicates whether a certain term is contained in a document or not.
Vector value: If the Bitvector setting is not checked, you can specify the column to use for the feature vector values. The column can, for example, contain tf*idf values, which are then used as the values of the feature vector. Be aware that you have to compute these values before using this node, e.g. with the frequency calculation nodes.
As collection cell: If checked, all vector entries are stored in a single collection cell consisting of double cells. The cells are ordered, and the ordering is specified in the data table spec. If not checked, each double value is stored in its own column. The advantage of the column representation is that most of the regular algorithms in KNIME can be applied to it. Its disadvantage, which is in turn the advantage of the collection representation, is that subsequent nodes are slowed down by the many columns that are created (depending on the input data).
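
The trade-off between the two output layouts can be pictured with a short Python sketch. It models table rows as plain dictionaries, which is an assumption for illustration and not how KNIME stores data; the names `as_columns`, `as_collection`, and the column title "Document vector" are hypothetical.

```python
def as_columns(doc_id, vector, feature_names):
    """Column layout: one numeric column per feature."""
    row = {"Document": doc_id}
    row.update(zip(feature_names, vector))
    return row


def as_collection(doc_id, vector):
    """Collection layout: the whole vector sits in a single ordered cell."""
    return {"Document": doc_id, "Document vector": list(vector)}


features = ["data", "mining", "text"]
vector = [0.0, 1.0, 2.0]

wide = as_columns("doc-1", vector, features)
compact = as_collection("doc-1", vector)

print(len(wide))     # 4: the document plus one column per feature
print(len(compact))  # 2: the document plus one collection cell
```

The column layout grows with the number of features, whereas the collection layout always has the same width. In the collection layout, the feature order has to be kept elsewhere, which corresponds to the ordering held in the data table spec.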

Feature Column Selection

Feature Column Selection: Selects all columns from the reference table containing features that should appear in the output document vector.

Input Ports

1. The input table containing the bag of words.
2. The input reference table containing the reference document vector.

Output Ports

An output table containing the documents with the corresponding document vectors, whose terms are identical to the ones in the reference document vector.

Views

This node has no views

Installation

To use this node in KNIME, install the KNIME Textprocessing extension from the update site below, following the NodePit Product and Node Installation Guide:

v5.6

A zipped version of the software site can be downloaded here.

Plugin provider: KNIME AG, Zurich, Switzerland

Plugin version: 5.6.0.v202507151412

On NodePit since: 2025-08-15

Last update: 2025-08-19

Tags: Deprecated

KNIME versions: Since v3.6
