
Document Vector Hashing

Streamable
KNIME Textprocessing Plug-in version 4.3.0.v202011212014 by KNIME AG, Zurich, Switzerland

This node creates a document vector for each document, representing it in the term space. The values of the feature vectors can be boolean values, relative term frequencies, or absolute term frequencies. The advantage of this node over the normal document vector node is that the dimension of the vectors is always fixed, which makes this node streamable.
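The fixed dimension follows from hashing each term to an index in a vector of predetermined length, so no vocabulary has to be collected before the first document is processed. The following Python sketch illustrates the idea only; `hash_index` and `hashed_vector` are hypothetical names, and CRC32 is merely a stand-in for whichever hashing function the node is configured with.

```python
import zlib


def hash_index(term: str, dim: int, seed: int = 0) -> int:
    """Map a term to an index in [0, dim).

    Assumption: CRC32 stands in for the node's hashing function,
    and the seed is used as the CRC32 starting value.
    """
    return zlib.crc32(term.encode("utf-8"), seed & 0xFFFFFFFF) % dim


def hashed_vector(terms: list[str], dim: int, seed: int = 0) -> list[float]:
    """Count terms into a vector whose length is always `dim`."""
    vector = [0.0] * dim
    for term in terms:
        vector[hash_index(term, dim, seed)] += 1.0
    return vector


# Documents with different vocabularies yield vectors of equal length,
# so each row can be processed without looking at any other row.
doc_a = hashed_vector(["text", "mining", "with", "knime"], dim=16)
doc_b = hashed_vector(["a", "much", "longer", "document", "about", "hashing"], dim=16)
```

Because the index of a term depends only on the term, the dimension, and the seed, the same term lands at the same position in every document vector.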


Document column
The column containing the documents to use.
Dimension
The dimension of the output document vector. The larger the dimension, the less likely hash collisions become. However, be aware of the curse of dimensionality.
Seed
The seed for the hashing function.
Hashing function
Choose which hashing function should be used to hash the document terms.
Vector type
There are three ways to fill the values in the document vector.
Binary : The vector will be a bit vector.
TF-Absolute : At each index that a term is hashed to, the absolute term frequency of that term is calculated and stored.
TF-Relative : At each index that a term is hashed to, the relative term frequency of that term is calculated and stored.
As collection cell
If checked, all vector entries are stored in a single collection cell consisting of double cells. If not checked, each double cell is stored in its own column. The advantage of the column representation is that most of the regular algorithms in KNIME can be applied. Its disadvantage, which is conversely the advantage of the collection representation, is that the many columns created (depending on the input data) slow down the processing of subsequent nodes.
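The interplay of the dimension, seed, and vector type options above can be sketched in Python. This is an illustration under stated assumptions, not the node's implementation: `document_vector` and `collisions` are hypothetical helpers, CRC32 stands in for the configured hashing function, the relative frequency is assumed to be the term count divided by the number of terms in the document, and terms that collide on the same index are assumed to accumulate.

```python
import zlib
from collections import Counter


def hash_index(term, dim, seed=0):
    # Assumption: CRC32 as a stand-in hash, seeded via its starting value.
    return zlib.crc32(term.encode("utf-8"), seed & 0xFFFFFFFF) % dim


def document_vector(terms, dim, vector_type="Binary", seed=0):
    """Fill a vector of length `dim` according to the chosen vector type."""
    counts = Counter(terms)
    total = len(terms)
    vector = [0.0] * dim
    for term, count in counts.items():
        i = hash_index(term, dim, seed)
        if vector_type == "Binary":
            vector[i] = 1.0              # term present at this index
        elif vector_type == "TF-Absolute":
            vector[i] += count           # raw number of occurrences
        elif vector_type == "TF-Relative":
            vector[i] += count / total   # assumed: share of all terms
        else:
            raise ValueError(f"unknown vector type: {vector_type}")
    return vector


def collisions(terms, dim, seed=0):
    """Number of distinct terms that share an index with another term."""
    distinct = set(terms)
    occupied = {hash_index(t, dim, seed) for t in distinct}
    return len(distinct) - len(occupied)


terms = ["data", "mining", "data", "text"]
binary = document_vector(terms, dim=32, vector_type="Binary")
absolute = document_vector(terms, dim=32, vector_type="TF-Absolute")
relative = document_vector(terms, dim=32, vector_type="TF-Relative")
```

Calling `collisions(terms, dim)` with different dimensions shows the trade-off described for the dimension option: a larger vector leaves more room for each term to get its own index, at the cost of more entries per document.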

Input Ports

The input table containing the documents.

Output Ports

An output table containing the input documents with the corresponding document vectors.
The model output containing the specifications that have been used for document vector creation.

To use this node in KNIME, install KNIME Textprocessing from its update site.

