0 ×

Term Vector

KNIME Textprocessing Plug-in version 3.6.1.v201808311938 by KNIME AG, Zurich, Switzerland

This node creates a term vector for each term to represent the terms in the document space. The values of the feature vectors can be specified as boolean values or as values of a specified column i.e. an tf*idf column. The dimension of the vectors will be equal to the number of distinct documents in the BoW.

Options

Document column
The column containing the documents to use.
Ignore tags
If checked tags are ignored when comparing terms.
Bitvector
If checked a bitvector will be created indicating whether a certain document contains a term or not.
Vector value
If Bitvector setting is not checked it is possible to specify the column to use as feature vector values. The column can i.e. contain tf*idf values which are than used as values of the feature vector. Be aware that you have to compute these values before using this node. To do so i.e. the frequency calculation nodes can be used.
As collection cell
If checked all vector entries will be stored in a collection cell consisting of double cells. The cells are ordered, the ordering is specified in the data table spec. If not checked all double cells will be stored in corresponding columns. The advantage of the column representation is that most of the regular algorithms in KNIME can be applied. The disadvantage is (which is on the other hand the advantage of the collection representation) that processing of subsequent nodes will be slowed down, due to the many columns that will be created (dependent on the input data of course).

Input Ports

The input table containing the bag of words.

Output Ports

An output table containing the terms with the related term vectors.

Best Friends (Incoming)

Best Friends (Outgoing)

Update Site

To use this node in KNIME, install KNIME Textprocessing Plug-in from the following update site:

Wait a sec! You want to explore and install nodes even faster? We highly recommend our NodePit for KNIME extension for your KNIME Analytics Platform.