PDB Sequence Extractor

The PDB Sequence Extractor node extracts all chain sequences from a PDB cell. A new row is added to the output table for each chain, and the chain ID is always included. Sequences can be extracted in any combination of the following four ways:
  • ‘Raw’ 3-letter sequence(s) from the SEQRES records
  • ‘Sanitized’ 1-letter sequence(s) from the SEQRES records (This option should give identical results to those obtained from the PDB FASTA file download and FASTA Sequence Extractor node)
  • ‘Raw’ 3-letter sequence(s) from the co-ordinates block
  • ‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
If co-ordinate sequences are extracted, a Model ID column is also included in the output. Optionally, HETATM records can be included in the co-ordinate-derived sequence(s). If no sequence types are selected, only a list of chains is returned. This list consists of all chains found in the SEQRES records or the co-ordinates block (the latter respecting the Include HETATM option), regardless of which sequences are extracted.
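As a rough illustration of what a 'raw' SEQRES sequence is, the Python sketch below groups SEQRES residue names by chain, giving one entry per chain in first-seen order. It is not the node's implementation: it assumes the standard fixed-column PDB layout (chain ID in column 12, residue names from column 20), and the function name `seqres_by_chain` and the space-separated output are illustrative choices only.

```python
from typing import Dict, List


def seqres_by_chain(pdb_text: str) -> Dict[str, List[str]]:
    """Collect raw 3-letter SEQRES residue names, keyed by chain ID.

    Assumes the fixed-column PDB layout: chain ID in column 12 and residue
    names from column 20 onwards (0-based indices 11 and 19).
    """
    chains: Dict[str, List[str]] = {}  # dicts keep first-seen chain order
    for line in pdb_text.splitlines():
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]
        chains.setdefault(chain_id, []).extend(line[19:].split())
    return chains


example = "\n".join([
    "SEQRES   1 A    3  MET SEP GLY",
    "SEQRES   1 B    2   DA  DT",
])
for chain, residues in seqres_by_chain(example).items():
    # One output row per chain; joining with spaces is an assumption.
    print(chain, " ".join(residues))
# A MET SEP GLY
# B DA DT
```

Keying by chain ID mirrors the one-row-per-chain output described above; multi-line SEQRES blocks for the same chain are simply concatenated.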

'Sanitization' is performed as follows, matching the process used by the PDB as closely as possible:

  • Phosphorylated, sulfated, acylated and side-chain methylated amino acids are converted to their unmodified parent residues
  • D-amino acids are converted to their L-amino acid counterparts
  • DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)
For SEQRES residues, the mappings are taken from the MODRES records in the PDB file. For co-ordinate sequences, the mappings come from a built-in dictionary, in case the MODRES records are incomplete. 'X' is used for residues that cannot be deciphered, and '?' marks sequence gaps in the co-ordinate sequences.
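The sanitization rules above can be sketched in Python as a two-step lookup: map a modified residue to its parent, then map the parent to a 1-letter code, falling back to 'X'. This is a hypothetical illustration, not the node's code. `BUILTIN_PARENTS` is a tiny made-up stand-in for the node's built-in dictionary, `parse_modres` and `sanitize` are illustrative names, MODRES columns follow the standard PDB layout, and gaps are assumed to be detected upstream and passed in as `None`.

```python
from typing import Dict, Iterable, Optional

# 1-letter codes for standard residues; nucleotides drop the deoxy "D" prefix.
ONE_LETTER: Dict[str, str] = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
    "DA": "A", "DC": "C", "DG": "G", "DT": "T",
}

# Hypothetical, tiny stand-in for the node's built-in modified-residue dictionary.
BUILTIN_PARENTS: Dict[str, str] = {
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
    "TYS": "TYR",  # sulfotyrosine
    "ALY": "LYS",  # N6-acetyllysine
    "MLY": "LYS",  # N6,N6-dimethyllysine
    "DAL": "ALA",  # D-alanine
}


def parse_modres(pdb_text: str) -> Dict[str, str]:
    """Map modified residue name -> standard parent, from MODRES records."""
    parents: Dict[str, str] = {}
    for line in pdb_text.splitlines():
        if line.startswith("MODRES") and len(line) >= 27:
            # Columns 13-15 hold the modified name, 25-27 the standard parent.
            modified, parent = line[12:15].strip(), line[24:27].strip()
            if modified and parent:
                parents[modified] = parent
    return parents


def sanitize(residues: Iterable[Optional[str]], parents: Dict[str, str]) -> str:
    """Convert 3-letter names to a 1-letter string; None marks a gap."""
    out = []
    for name in residues:
        if name is None:
            out.append("?")  # sequence gap (co-ordinate sequences only)
            continue
        parent = parents.get(name, name)  # modified -> unmodified parent
        out.append(ONE_LETTER.get(parent, "X"))  # 'X' if not deciphered
    return "".join(out)


modres = parse_modres("MODRES 1ABC SEP A    2  SER  PHOSPHOSERINE")
print(sanitize(["MET", "SEP", "GLY"], modres))          # SEQRES-style: MSG
print(sanitize(["DAL", None, "ZZZ"], BUILTIN_PARENTS))  # co-ordinate-style: A?X
```

Passing the mapping in as a parameter keeps the two sources separate: MODRES-derived mappings for SEQRES sequences, the built-in dictionary for co-ordinate sequences.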

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Options

Select a column containing the PDB Cells
The column containing the PDB Cells
Remove PDB Column
Whether the PDB cell column is to be removed from the output table
'Raw' 3-letter sequence(s) from SEQRES records
Extract the sequence in the unprocessed 3-letter form present in the SEQRES records
'Sanitized' 1-letter sequence(s) from SEQRES records
Extract the sequence in the sanitized form (see above) from the SEQRES records
'Raw' 3-letter sequence(s) from Co-ordinate records
Extract the sequence in the unprocessed 3-letter form from the co-ordinates block
'Sanitized' 1-letter sequence(s) from Co-ordinate records
Extract the sequence in the sanitized 1-letter form (see above) from the co-ordinates block
Include HETATM Co-ordinate records in sequence(s)
Include the heterogen ('HETATM') records from the co-ordinates block in the co-ordinate-derived sequence(s)
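To make the co-ordinate options concrete, here is a hypothetical Python sketch of raw 3-letter extraction from the co-ordinates block, keyed by (model ID, chain ID) and with an `include_hetatm` flag standing in for the Include HETATM option. It is not the node's implementation. It assumes standard PDB columns, numbers the single model 1 when no MODEL records exist, and does no gap detection or filtering; with HETATM included, waters and ligands also appear in this sketch, and whether the node filters them is not stated here.

```python
from typing import Dict, List, Tuple


def coordinate_sequences(
    pdb_text: str, include_hetatm: bool = False
) -> Dict[Tuple[int, str], List[str]]:
    """Raw 3-letter residue names from ATOM (and optionally HETATM) records,
    keyed by (model ID, chain ID). Files without MODEL records are assumed
    to hold a single model, numbered 1 here."""
    seqs: Dict[Tuple[int, str], List[str]] = {}
    last_residue: Dict[Tuple[int, str], Tuple[str, str, str]] = {}
    model = 1
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            model = int(line.split()[1])  # model serial number
            continue
        if record != "ATOM" and not (include_hetatm and record == "HETATM"):
            continue
        if len(line) < 27:
            continue
        key = (model, line[21])  # (model ID, chain ID)
        # A residue is identified by resSeq, insertion code and name; a new
        # entry is appended only when this changes within a chain.
        residue = (line[22:26], line[26], line[17:20].strip())
        if last_residue.get(key) != residue:
            seqs.setdefault(key, []).append(residue[2])
            last_residue[key] = residue
    return seqs
```

Each key corresponds to one output row, which is why a Model ID column accompanies co-ordinate-derived sequences; feeding the names into a 1-letter mapping would give the sanitized variant.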

Input Ports

Input table containing a column of PDB Cells

Output Ports

Table with one or more sequence columns appended

Views

This node has no views
