One-Hot Encoder (Biological Sequences)

This component takes a column of biological sequences (DNA, RNA, or protein) and creates a one-hot encoded version of each sequence. The component's configuration lets you select a suitable alphabet and choose how to handle letters that are not in the selected alphabet. Depending on the configuration, the input column is either replaced by the encoded column or kept, with the encoded column appended.
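
As a rough illustration of what one-hot encoding a sequence means, the Python sketch below maps each letter to a vector with a single 1 at that letter's position in the alphabet. The function name `one_hot_encode`, the alphabet ordering `ACGT`, and the one-vector-per-letter layout are assumptions made for illustration only. The component's internal implementation and output layout may differ.

```python
def one_hot_encode(sequence, alphabet="ACGT"):
    """Return one vector per letter, with a single 1.0 at the letter's alphabet index.

    Hypothetical helper for illustration; not the component's actual code.
    """
    index = {letter: i for i, letter in enumerate(alphabet)}
    encoded = []
    for letter in sequence.upper():
        vector = [0.0] * len(alphabet)
        # Raises KeyError for letters outside the alphabet.
        vector[index[letter]] = 1.0
        encoded.append(vector)
    return encoded


print(one_hot_encode("GAT"))
# [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```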

Options

Replace Input Column
Whether to replace the input sequence column with its one-hot encoded version or keep it.
Bit Vector Output
If set, the result column will have a bit vector data type. Otherwise, the column type will be a vector of doubles.
Select the biological sequence column to encode
Select a column to be one-hot encoded. The column must be of type string and its values must be biological (DNA, RNA or Protein) sequences.
Alphabet
Select the appropriate alphabet for the sequences in the selected column.
Unknown Letter Handling (not applicable if alphabet is auto-detected)
Select a way to handle a letter that is not part of the selected alphabet.
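
The options above can be pictured with a small Python sketch. It is not the component's implementation. The function name `encode_with_options`, its parameters, and the three unknown-letter strategies (`"zeros"`, `"skip"`, `"error"`) are hypothetical and chosen only to show what such a setting might control. The strategies the component actually offers are listed in its configuration dialog.

```python
def encode_with_options(sequence, alphabet="ACGT", unknown="zeros", bit_vector=False):
    """Flattened one-hot encoding with illustrative (assumed) option handling.

    unknown:    "zeros" emits an all-off block, "skip" drops the letter,
                "error" raises ValueError. These names are hypothetical.
    bit_vector: True yields booleans, False yields floats (doubles).
    """
    if unknown not in ("zeros", "skip", "error"):
        raise ValueError(f"unsupported unknown-letter strategy: {unknown!r}")

    index = {letter: i for i, letter in enumerate(alphabet)}
    on, off = (True, False) if bit_vector else (1.0, 0.0)

    flat = []
    for letter in sequence.upper():
        known = letter in index
        if not known:
            if unknown == "error":
                raise ValueError(f"letter {letter!r} is not in alphabet {alphabet!r}")
            if unknown == "skip":
                continue
        # For "zeros", an unknown letter keeps every position off.
        block = [off] * len(alphabet)
        if known:
            block[index[letter]] = on
        flat.extend(block)
    return flat


# "N" is not in the ACGT alphabet, so the strategies give different results.
print(encode_with_options("AN"))                   # second block is all zeros
print(encode_with_options("AN", unknown="skip"))   # only the block for "A"
print(encode_with_options("AC", bit_vector=True))  # booleans instead of floats
```

In this sketch the alphabet fixes the width of each per-letter block, which is why the choice of alphabet and the handling of unknown letters are configured together.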

Input Ports

A table with sequences to be one-hot encoded

Output Ports

A table with a one-hot encoded column
