The PDB Sequence Extractor node extracts all chain sequences from a PDB cell.
A new row is added to the output table for each chain, and the chain ID is always added.
The sequences can be enumerated in any of 4 ways:
- ‘Raw’ 3-letter sequence(s) from the SEQRES records
- ‘Sanitized’ 1-letter sequence(s) from the SEQRES records (This option should give
identical results to those obtained from the PDB FASTA file download and FASTA Sequence Extractor node)
- ‘Raw’ 3-letter sequence(s) from the co-ordinates block
- ‘Sanitized’ 1-letter sequence(s) from the co-ordinates block
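As an illustration of the 'Raw' SEQRES option, the following is a minimal Python sketch, not the node's actual implementation. It assumes well-formed, fixed-column SEQRES records as defined in the PDB file format (chain ID in column 12, residue names from column 20 onwards). The function name seqres_sequences is hypothetical.

```python
def seqres_sequences(pdb_text):
    """Return {chain_id: [3-letter residue names]} built from SEQRES records.

    Assumes fixed-column PDB formatting: chain ID in column 12,
    residue names in columns 20-70 (up to 13 per record).
    """
    chains = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]             # column 12 (0-based index 11)
            residues = line[19:70].split()  # columns 20-70
            # A chain usually spans several SEQRES records, so keep appending
            chains.setdefault(chain_id, []).extend(residues)
    return chains
```

Each key of the returned dictionary would correspond to one output row, with the chain ID alongside its raw 3-letter sequence.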
If co-ordinate sequences are extracted, then a Model ID column will also be included in the output.
Optionally, HETATM records can be included in the co-ordinates-derived sequence(s).
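The co-ordinate-derived sequences can be pictured with the sketch below. It is an assumption-laden illustration rather than the node's code: the function name coordinate_sequences and the include_hetatm flag are hypothetical, files without MODEL records are treated as model 1, and no filtering of waters or ligands is attempted when HETATM records are included.

```python
def coordinate_sequences(pdb_text, include_hetatm=False):
    """Return {(model_id, chain_id): [3-letter residue names]} from ATOM records.

    HETATM records are only considered when include_hetatm is True.
    """
    records = ("ATOM  ", "HETATM") if include_hetatm else ("ATOM  ",)
    sequences = {}
    seen = set()   # residues already counted, so each one is added only once
    model_id = 1   # assumption: no MODEL record means a single model, numbered 1
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            model_id = int(line[5:].split()[0])
        elif line[:6] in records:
            res_name = line[17:20].strip()  # columns 18-20
            chain_id = line[21]             # column 22
            res_key = (model_id, chain_id, line[22:27])  # resSeq + insertion code
            if res_key not in seen:
                seen.add(res_key)
                sequences.setdefault((model_id, chain_id), []).append(res_name)
    return sequences
```

Keying the result on both model and chain mirrors the extra Model ID column that appears when co-ordinate sequences are requested.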
If no sequences are selected, then only a list of chains will be returned. The list of chains
will consist of all chains found in the SEQRES or co-ordinate blocks (the latter respecting the
Include HETATM option setting), regardless of which sequences are extracted.
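The chain list described above amounts to a union of the chain IDs seen in either block. A minimal sketch, assuming fixed-column PDB records and using the hypothetical names list_chains and include_hetatm:

```python
def list_chains(pdb_text, include_hetatm=False):
    """Return chain IDs found in SEQRES or co-ordinate records, in first-seen order."""
    coord_records = ("ATOM  ", "HETATM") if include_hetatm else ("ATOM  ",)
    chains = []
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain_id = line[11]   # column 12 of a SEQRES record
        elif line[:6] in coord_records:
            chain_id = line[21]   # column 22 of an ATOM/HETATM record
        else:
            continue
        if chain_id not in chains:
            chains.append(chain_id)
    return chains
```

In this sketch, a chain that appears only in HETATM records is listed only when include_hetatm is set.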
'Sanitization' follows the process implemented by the PDB as closely as possible,
and works as follows:
- Phosphorylated, Sulfated, Acylated and Side-chain Methylated amino acids are converted
to their unmodified parents
- D-Amino acids are converted to their L-Amino acid counterparts
- DNA residues (e.g. DA) are converted to the corresponding RNA residue (e.g. A)
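The rules above can be sketched as a two-step lookup: map a modified residue to its parent, then map the parent to a 1-letter code. The dictionary below is a small hypothetical sample chosen for illustration and is not the node's built-in dictionary; the names PARENT, ONE_LETTER and sanitize are likewise assumptions.

```python
# Hypothetical sample of modified-residue -> parent mappings (illustrative only)
PARENT = {
    "SEP": "SER", "TPO": "THR", "PTR": "TYR",  # phosphorylated
    "TYS": "TYR",                              # sulfated
    "ALY": "LYS",                              # acylated (N6-acetyllysine)
    "MLY": "LYS",                              # side-chain methylated
    "DAL": "ALA",                              # D-amino acid
    "DA": "A", "DC": "C", "DG": "G",           # DNA -> RNA-style names
}

# Standard amino acids, plus nucleotides that are already single letters
ONE_LETTER = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "A": "A", "C": "C", "G": "G", "U": "U",
}

def sanitize(residues, parents=PARENT):
    """Convert a list of 3-letter residue names to a 1-letter sequence string."""
    letters = []
    for res in residues:
        parent = parents.get(res, res)               # step 1: modified -> parent
        letters.append(ONE_LETTER.get(parent, "X"))  # step 2: 'X' if unrecognised
    return "".join(letters)
```

For example, sanitize(["MET", "SEP", "DAL"]) would give "MSA" under these sample mappings.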
For SEQRES residues, the mappings are taken from the MODRES records in the PDB file. For co-ordinate
sequences, the mappings are taken from a built-in dictionary, in case the MODRES records are incomplete.
'X' is used for non-deciphered residues, and '?' for sequence gaps in the co-ordinate sequences.
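To show where the SEQRES mappings come from, here is a minimal sketch of reading MODRES records into a lookup table. It assumes the fixed-column layout of the PDB file format (modified residue name in columns 13-15, standard residue name in columns 25-27); the function name modres_mapping is hypothetical and this is not the node's own parser.

```python
def modres_mapping(pdb_text):
    """Return {modified residue name: standard residue name} from MODRES records."""
    mapping = {}
    for line in pdb_text.splitlines():
        if line.startswith("MODRES"):
            modified = line[12:15].strip()  # columns 13-15
            standard = line[24:27].strip()  # columns 25-27
            if modified and standard:
                mapping[modified] = standard
    return mapping
```

A residue name missing from this table could not be resolved from the file alone, which is the situation the built-in dictionary is described as covering for co-ordinate sequences.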
This node was developed by Vernalis (Cambridge, UK).
For feedback and more information, please contact knime@vernalis.com