PDB Sequence Extractor

The PDB Sequence Extractor node extracts all chain sequences from a PDB cell. A new row is added to the output table for each chain, and the chain ID is always added. The sequences can be enumerated in any of 4 ways:

‘Raw’ 3-letter sequence(s) from the SEQRES records
‘Sanitized’ 1-letter sequence(s) from the SEQRES records (This option should give identical results to those obtained from the PDB FASTA file download and FASTA Sequence Extractor node)
‘Raw’ 3-letter sequence(s) from the co-ordinates block
‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
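The 'raw' SEQRES option can be illustrated outside KNIME with a minimal Python sketch. This is not the node's implementation; the function name seqres_by_chain is hypothetical, and the column positions assume standard fixed-width PDB-format SEQRES records (chain ID in column 12, residue names in columns 20-70).

```python
from collections import defaultdict


def seqres_by_chain(pdb_text):
    """Collect raw 3-letter residue names per chain from SEQRES records."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]  # column 12 of the fixed-width record
            # Columns 20-70 hold up to 13 whitespace-separated residue names.
            chains[chain_id].extend(line[19:70].split())
    return dict(chains)
```

Each key of the returned dictionary corresponds to one output row (one chain), and the list holds that chain's residues in SEQRES order.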

If co-ordinate sequences are extracted, then a Model ID column will also be included in the output. Optionally, HETATM records can be included in the co-ordinate-derived sequence(s). If no sequences are selected, then only a list of chains will be returned. The list of chains will consist of all chains found in the SEQRES or co-ordinate blocks (the latter respecting the Include HETATM option setting), regardless of which sequences are extracted.
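The co-ordinate-derived sequences, with their Model ID and optional HETATM handling, might be sketched in Python as follows. This is an illustration under stated assumptions, not the node's code: the function name coordinate_sequences and the include_hetatm parameter are hypothetical, files without MODEL records are assumed to belong to model 1, and the column positions assume standard fixed-width ATOM/HETATM records.

```python
def coordinate_sequences(pdb_text, include_hetatm=False):
    """Return {(model_id, chain_id): [3-letter residue names]} from the co-ordinates block."""
    records = ("ATOM  ", "HETATM") if include_hetatm else ("ATOM  ",)
    sequences = {}  # (model_id, chain_id) -> residue names in file order
    last_seen = {}  # (model_id, chain_id) -> last residue number + insertion code
    model_id = 1    # assumption: no MODEL record means a single model numbered 1
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            model_id = int(line.split()[1])
        elif line.startswith(records):
            key = (model_id, line[21])   # chain ID is column 22
            residue_id = line[22:27]     # residue number (cols 23-26) + insertion code (col 27)
            if last_seen.get(key) != residue_id:
                # First atom of a new residue: record its name (columns 18-20).
                last_seen[key] = residue_id
                sequences.setdefault(key, []).append(line[17:20].strip())
    return sequences
```

The sketch does no filtering of HETATM records, so with include_hetatm=True any heterogen in the file (including solvent) would appear in the result; whether the node filters these is not stated above.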

'Sanitization' follows the process implemented by the PDB as closely as possible, and is as follows:

Phosphorylated, Sulfated, Acylated and Side-chain Methylated amino acids are converted to their unmodified parents
D-Amino acids are converted to their L-Amino acid counterparts
DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)

For SEQRES residues, the mappings are taken from the MODRES record in the PDB file. For co-ordinate sequences, the mappings are taken from a built-in dictionary, in case the MODRES record is incomplete. 'X' is used for non-deciphered residues, and '?' for sequence gaps in the co-ordinate sequences.
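The sanitization rules above can be sketched in Python. This is a minimal illustration, not the node's implementation: the names ONE_LETTER, PARENTS, modres_parents and sanitize are hypothetical, PARENTS is a small illustrative subset rather than the node's built-in dictionary, and gap handling with '?' is not covered. The MODRES column positions assume standard fixed-width PDB-format records.

```python
# Standard residue names mapped to 1-letter codes (amino acids and RNA bases).
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
}

# Illustrative subset only (an assumption, not the node's dictionary).
PARENTS = {
    "SEP": "SER", "TPO": "THR", "PTR": "TYR",  # phosphorylated
    "TYS": "TYR",                              # sulfated
    "DAL": "ALA",                              # D-amino acid
    "DA": "A", "DC": "C", "DG": "G",           # DNA -> RNA
}


def modres_parents(pdb_text):
    """Build a modified -> standard residue mapping from MODRES records."""
    parents = {}
    for line in pdb_text.splitlines():
        if line.startswith("MODRES"):
            modified = line[12:15].strip()  # columns 13-15: modified residue name
            standard = line[24:27].strip()  # columns 25-27: standard residue name
            if modified and standard:
                parents[modified] = standard
    return parents


def sanitize(residues, parents):
    """Map 3-letter residue names to a 1-letter string, using 'X' when unknown."""
    return "".join(ONE_LETTER.get(parents.get(name, name), "X") for name in residues)
```

For a SEQRES-derived sequence, the parents argument would come from modres_parents; for a co-ordinate-derived sequence, from a dictionary such as PARENTS. A residue missing from both the mapping and ONE_LETTER falls through to 'X'.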

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Options

Select a column containing the PDB Cells: The column containing the PDB Cells
Remove PDB Column: Whether the PDB cell column is to be removed from the output table
'Raw' 3-letter sequence(s) from SEQRES records: Extract the sequence in the unprocessed 3-letter form present in the SEQRES records
'Sanitized' 1-letter sequence(s) from SEQRES records: Extract the sequence in the sanitized form (see above) from the SEQRES records
'Raw' 3-letter sequence(s) from Co-ordinate records: Extract the sequence in the unprocessed 3-letter form from the co-ordinates block
'Sanitized' 1-letter sequence(s) from Co-ordinate records: Extract the sequence in the sanitized 1-letter form (see above) from the co-ordinates block
Include HETATM Co-ordinate records in sequence(s): Include the heterogen ('HETATM') records in the co-ordinates block

Input Ports

Input table containing a column of PDB Cells

Output Ports

Table with one or more sequence columns appended

Views

This node has no views

Workflows

01_PDB_Query_Download_and_Save_Locally (KNIME Hub)

Installation

To use this node in KNIME, install the extension Vernalis KNIME Nodes from the below update site following our NodePit Product and Node Installation Guide:

v5.10

Plugin provider: Vernalis Research, UK

Plugin version: 1.38.2.v202512021636

On NodePit since: 2026-02-18

Last update: 2026-03-04

Tags: Streamable

KNIME versions: v5.10, v5.9, v5.8, v5.7, v5.6, v5.5, v5.4, v5.3, v5.2, v5.1, v4.7, v4.6, v4.5, v4.4, v4.3, v4.2, v4.1, v4.0, v3.7, v3.6
