FASTA Sequence Extractor

This node extracts the sequences for all chains listed in a FASTA file. For multi-chain FASTA files, a new row is added for each chain. Columns are added according to the source type selected in the drop-down, as listed below. Extracted properties are shown as {property}:
  • GenBank >gi|{gi-number}|gb|{accession}|{locus}
  • EMBL Data Library >gi|{gi-number}|emb|{accession}|{locus}
  • DDBJ, DNA Database of Japan >gi|{gi-number}|dbj|{accession}|{locus}
  • NBRF PIR >pir||{entry}
  • Protein Research Foundation >prf||{name}
  • SWISS-PROT >sp|{accession}|{name}
  • PDB >pdb|{entry}|{chain} or >{entry}:{chain}|PDBID|CHAIN|SEQUENCE
  • Patents >pat|{country}|{number}
  • GenInfo Backbone Id >bbs|{number}
  • General database identifier >gnl|{database}|{identifier}
  • NCBI Reference Sequence >ref|{accession}|{locus}
  • Local Sequence identifier >lcl|{identifier}
  • Other (No properties extracted)
FASTA files can be retrieved for PDB entries using the PDB Downloader nodes.
NOTE: No checking of the FASTA header format is implemented, so selecting the wrong format may give unpredictable results, although the node should still execute in these circumstances. No sequence parsing is implemented, and the processing is type-agnostic (protein, nucleotide, etc.).
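The header layouts listed above can be read with simple pattern matching. The following Python sketch is an illustration only and is not the node's actual implementation: the names HEADER_PATTERNS and extract_properties are hypothetical, only three of the source types are covered, and returning an empty result for a header that does not match the selected type is an assumption made for this example (the node itself performs no format checking).

```python
import re

# Hypothetical patterns for a few of the header layouts listed above.
# Each {property} in the list becomes a named group.
HEADER_PATTERNS = {
    "GenBank": [
        r">gi\|(?P<gi_number>[^|]*)\|gb\|(?P<accession>[^|]*)\|(?P<locus>.*)",
    ],
    "SWISS-PROT": [
        r">sp\|(?P<accession>[^|]*)\|(?P<name>.*)",
    ],
    "PDB": [
        r">pdb\|(?P<entry>[^|]*)\|(?P<chain>.*)",
        r">(?P<entry>[^:|]+):(?P<chain>[^|]+)\|PDBID\|CHAIN\|SEQUENCE",
    ],
    "Other": [],  # no properties extracted
}


def extract_properties(header, source_type):
    """Return a dict of properties for one header line.

    Assumption for this sketch: a header that does not match the
    selected source type yields an empty dict.
    """
    for pattern in HEADER_PATTERNS[source_type]:
        match = re.fullmatch(pattern, header.strip())
        if match:
            return match.groupdict()
    return {}


if __name__ == "__main__":
    print(extract_properties(">sp|P12345|AATM_RABIT", "SWISS-PROT"))
    print(extract_properties(">1ABC:A|PDBID|CHAIN|SEQUENCE", "PDB"))
```

In this sketch the last property of each pattern captures the remainder of the line, so any free-text description following the identifier would end up in that property.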

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com

Options

Select a column containing the FASTA Sequence Cells
Select the string column containing the FASTA format files
Delete FASTA Sequence column
The FASTA Format column is deleted from the output table if this option is selected
Select FASTA Sequence source or type
Select the FASTA header format. See the list above for the available options. Please contact us if you would like other formats added.
Extract complete header
If this option is selected, then the complete header is extracted as a separate column. This option can be used for further downstream parsing of unsupported header types.
Extract sequence
The sequence is extracted into a new string column
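As an illustration of how these options combine, the Python sketch below splits a multi-chain FASTA string into one row per chain. It is a sketch under stated assumptions and not the node's code: the function names split_fasta and expand_rows, the keyword arguments, and the column names "FASTA", "Header" and "Sequence" are all hypothetical.

```python
def split_fasta(fasta_text):
    """Return a list of (header, sequence) tuples, one per FASTA record."""
    records = []
    header = None
    chunks = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            # Sequence lines are joined as-is; no sequence parsing is done.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def expand_rows(fasta_text, keep_fasta=True, extract_header=True,
                extract_sequence=True):
    """Build one row (a dict) per chain, honouring the three options."""
    rows = []
    for header, sequence in split_fasta(fasta_text):
        row = {}
        if keep_fasta:  # the inverse of "Delete FASTA Sequence column"
            row["FASTA"] = fasta_text
        if extract_header:
            row["Header"] = header
        if extract_sequence:
            row["Sequence"] = sequence
        rows.append(row)
    return rows


if __name__ == "__main__":
    text = (">1ABC:A|PDBID|CHAIN|SEQUENCE\nMKT\nAYI\n"
            ">1ABC:B|PDBID|CHAIN|SEQUENCE\nGSH\n")
    for row in expand_rows(text, keep_fasta=False):
        print(row)
```

With keep_fasta=False the original FASTA text is dropped from every row, which corresponds to selecting the delete option above.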

Input Ports

Input table containing a column of FASTA sequence files downloaded from the RCSB PDB

Output Ports

Output table with the chains and sequences extracted into separate columns according to the options specified

Views

This node has no views
