FASTA Sequence Extractor

Streamable

Vernalis custom KNIME nodes package version 1.27.0.v202008190633 by Vernalis (R&D), UK

This node extracts the sequences for all chains listed in the FASTA file. For multi-chain FASTA files, a new row is added for each chain. Columns are added according to the source type selected in the drop-down, as listed below (extracted properties are shown as {property}):
  • GenBank >gi|{gi-number}|gb|{accession}|{locus}
  • EMBL Data Library >gi|{gi-number}|emb|{accession}|{locus}
  • DDBJ, DNA Database of Japan >gi|{gi-number}|dbj|{accession}|{locus}
  • NBRF PIR >pir||{entry}
  • Protein Research Foundation >prf||{name}
  • SWISS-PROT >sp|{accession}|{name}
  • PDB >pdb|{entry}|{chain} or >{entry}:{chain}|PDBID|CHAIN|SEQUENCE
  • Patents >pat|{country}|{number}
  • GenInfo Backbone Id >bbs|{number}
  • General database identifier >gnl|{database}|{identifier}
  • NCBI Reference Sequence >ref|{accession}|{locus}
  • Local Sequence identifier >lcl|{identifier}
  • Other (No properties extracted)
FASTA files can be retrieved for PDB entries using the PDB Downloader nodes. NOTE: No checking of the FASTA header format is implemented, so selecting the wrong format may give unpredictable results, although the node should still execute in these circumstances. No sequence parsing is implemented, and the processing is type-agnostic (protein, nucleotide, etc.).
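
As an illustration of how the header formats listed above map to extracted properties, here is a minimal Python sketch. It is not the node's implementation: the names HEADER_PATTERNS and extract_header_fields are hypothetical, only a few of the source types are covered, and the regular expressions are assumptions derived from the {property} templates. Unlike the node, which does not check the header format, this sketch returns an empty result when the header does not match the selected type.

```python
import re

# Hypothetical mapping from source type to candidate header patterns.
# Group names mirror the {property} placeholders in the list above.
HEADER_PATTERNS = {
    "GenBank": [
        r"^>gi\|(?P<gi_number>[^|]*)\|gb\|(?P<accession>[^|]*)\|(?P<locus>.*)$",
    ],
    "SWISS-PROT": [
        r"^>sp\|(?P<accession>[^|]*)\|(?P<name>.*)$",
    ],
    "PDB": [
        r"^>pdb\|(?P<entry>[^|]*)\|(?P<chain>.*)$",
        r"^>(?P<entry>[^:|]+):(?P<chain>[^|]+)\|PDBID\|CHAIN\|SEQUENCE$",
    ],
}


def extract_header_fields(header, source_type):
    """Return a dict of properties parsed from one FASTA header line.

    An unknown source type or a header that does not match any pattern
    for that type yields an empty dict (an assumption of this sketch).
    """
    for pattern in HEADER_PATTERNS.get(source_type, []):
        match = re.match(pattern, header.strip())
        if match:
            return match.groupdict()
    return {}


if __name__ == "__main__":
    print(extract_header_fields(">sp|P12345|AATM_RABIT", "SWISS-PROT"))
    print(extract_header_fields(">1ABC:A|PDBID|CHAIN|SEQUENCE", "PDB"))
```

Each key of the returned dict would correspond to one added output column for the selected source type.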

This node was developed by Vernalis (Cambridge, UK). For feedback and more information, please contact knime@vernalis.com


Select a column containing the FASTA Sequence Cells
Select the string column containing the FASTA format files
Delete FASTA Sequence column
The FASTA Format column is deleted from the output table if this option is selected
Select FASTA Sequence source or type
Select the FASTA format of choice; see the list above for the options. Please contact us if you would like other formats added.
Extract complete header
If this option is selected, then the complete header is extracted as a separate column. This option can be used for further downstream parsing of unsupported header types.
Extract sequence
The sequence is extracted into a new string column
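
The "Extract complete header" and "Extract sequence" options can be pictured with a short Python sketch that splits a multi-chain FASTA string into one (header, sequence) pair per chain. The function name split_fasta is hypothetical, and the handling of blank lines and of text before the first header is an assumption of this sketch, not documented behaviour of the node.

```python
def split_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples.

    A new record starts at every line beginning with '>'. Sequence lines
    are concatenated without any parsing, so the function is agnostic to
    the sequence type (protein, nucleotide, etc.).
    """
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines (assumption)
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line
            chunks = []
        elif header is not None:
            chunks.append(line)
        # lines before the first header are ignored (assumption)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = (
        ">1ABC:A|PDBID|CHAIN|SEQUENCE\nMKT\nAYI\n"
        ">1ABC:B|PDBID|CHAIN|SEQUENCE\nGSH\n"
    )
    for header, sequence in split_fasta(example):
        print(header, sequence)
```

Each tuple corresponds to one output row: the first element is what a complete-header column would hold, and the second is the sequence string.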

Input Ports

Input table containing a column of FASTA sequence files downloaded from the RCSB PDB

Output Ports

Output table with the chains and sequences extracted into separate columns according to the options specified

To use this node in KNIME, install Vernalis KNIME Nodes from the following update site:

