BERT Embedder

Maps a String or Document column to a numerical vector using the provided BERT model. The node accepts either a non-fine-tuned BERT model (magenta port) or a fine-tuned BERT model (grey port) and uses it to compute embeddings for the input texts. Embeddings are numerical vector representations of texts that can be used for visualization, clustering, classification, etc.
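
For intuition, the sketch below shows roughly how a text embedding can be obtained from a BERT model with the Hugging Face transformers library. This is only an illustration: the node manages its own Python environment and model handling, and the model name "bert-base-uncased" is an assumption, not necessarily what the node uses internally.

    # Illustrative sketch (assumed Hugging Face transformers API, not the node's internal code)
    from transformers import AutoTokenizer, AutoModel
    import torch

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed model name
    model = AutoModel.from_pretrained("bert-base-uncased")

    texts = ["KNIME makes data science accessible.", "BERT produces contextual embeddings."]
    batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**batch)

    # One fixed-size vector per input text, here the [CLS] token representation.
    text_embeddings = outputs.last_hidden_state[:, 0, :]   # shape: (2, 768)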

Options

Settings

Sentence column
The column containing the texts to be vectorized.
Two-sentence mode
The mode for cases where the input text consists of two distinct parts ("sentences").
Second sentence column
The column with the second sentence, used only when the two-sentence mode is enabled.
Max sequence length
The maximum length of a sequence after tokenization; the limit is 512 tokens (see the tokenization sketch below).
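
The sketch below illustrates how a BERT tokenizer combines two sentences and truncates the result to the maximum sequence length. It assumes the Hugging Face transformers tokenizer; the node's internal tokenization may differ in detail.

    # Illustrative only: two-sentence mode and max sequence length during tokenization
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed model name

    first_sentence = "The movie was surprisingly good."
    second_sentence = "I would watch it again."    # only relevant in two-sentence mode

    encoded = tokenizer(
        first_sentence,
        second_sentence,        # omit this argument for single-sentence input
        truncation=True,        # longer inputs are cut off ...
        max_length=128,         # ... at the configured max sequence length (at most 512)
    )

    # token_type_ids mark which tokens belong to the first vs. the second "sentence".
    print(encoded["input_ids"])
    print(encoded["token_type_ids"])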

Advanced

Batch size
The size of the chunk of input data that is processed at a time.
Include sequence embeddings
Include individual word (token) embeddings in addition to the whole-text embeddings (see the sketch below).
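
As a rough sketch of how these two options relate, the snippet below processes the input texts in chunks of the configured batch size and extracts both a whole-text vector and the per-token ("sequence") vectors. The Hugging Face transformers API and the model name are assumptions; the node's actual implementation is not shown here.

    # Illustrative sketch: batched inference with whole-text and per-token embeddings
    from transformers import AutoTokenizer, AutoModel
    import torch

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed model name
    model = AutoModel.from_pretrained("bert-base-uncased")

    texts = ["first document", "second document", "third document", "fourth document"]
    batch_size = 2                      # process the input table in chunks of this size

    text_vectors, token_vectors = [], []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        inputs = tokenizer(chunk, padding=True, truncation=True, max_length=512, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)
        text_vectors.append(outputs.last_hidden_state[:, 0, :])   # one vector per text
        token_vectors.append(outputs.last_hidden_state)           # one vector per token ("sequence embeddings")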

Python

Python
Select one of the Python execution environment options:
  • use the default Python environment for the Redfield BERT Nodes (can be configured on the preference page)
  • use a Conda environment from a Conda flow variable (only selectable if such a flow variable is available)

Input Ports

BERT Model
Data Table

Output Ports

Table with the computed embeddings

Views

This node has no views
