Load FASTA Files

Each FASTA file is loaded and parsed so that each row of the output table contains the data for one sequence in the file. The node attempts to parse the header line of each sequence according to the selected format, which is one of the following standard options:

  • GenBank >gi|{gi-number}|gb|{accession}|{locus}
  • EMBL Data Library >gi|{gi-number}|emb|{accession}|{locus}
  • DDBJ, DNA Database of Japan >gi|{gi-number}|dbj|{accession}|{locus}
  • NBRF PIR >pir||{entry}
  • Protein Research Foundation >prf||{name}
  • SWISS-PROT >sp|{accession}|{name} or >tr|{accession}|{name}
  • PDB >{PDB ID}:{chain}|PDBID|CHAIN|SEQUENCE
  • Patents >pat|{country}|{number}
  • GenInfo Backbone Id >bbs|{number}
  • General database identifier >gnl|{database}|{identifier}
  • NCBI Reference Sequence >ref|{accession}|{locus}
  • Local Sequence identifier >lcl|{identifier}
  • Other (No properties extracted)
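The parsing described above can be illustrated with a minimal Python sketch. This is not the node's implementation: the pattern table, the property names (`gi_number`, `accession`, `locus`, `name`, `identifier`) and the functions `parse_header`, `split_records` and `read_fasta` are hypothetical, and only four of the listed formats are covered.

```python
import re

# Hypothetical patterns for a few of the header formats listed above.
HEADER_PATTERNS = {
    "GenBank": re.compile(
        r"^>gi\|(?P<gi_number>[^|]*)\|gb\|(?P<accession>[^|]*)\|(?P<locus>.*)$"
    ),
    "SWISS-PROT": re.compile(r"^>(?:sp|tr)\|(?P<accession>[^|]*)\|(?P<name>.*)$"),
    "NCBI Reference Sequence": re.compile(
        r"^>ref\|(?P<accession>[^|]*)\|(?P<locus>.*)$"
    ),
    "Local Sequence identifier": re.compile(r"^>lcl\|(?P<identifier>.*)$"),
}


def parse_header(header, fasta_type):
    """Return the properties found in a header line, or {} if none apply."""
    pattern = HEADER_PATTERNS.get(fasta_type)
    if pattern is None:  # e.g. "Other": no properties extracted
        return {}
    match = pattern.match(header.strip())
    return match.groupdict() if match else {}


def split_records(text):
    """Yield (header, sequence) pairs, one per sequence in the FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(text, fasta_type="Other"):
    """Build one row (a dict) per sequence, with any parsed header properties."""
    rows = []
    for header, sequence in split_records(text):
        row = {"header": header, "sequence": sequence}
        row.update(parse_header(header, fasta_type))
        rows.append(row)
    return rows
```

For example, `read_fasta(text, "SWISS-PROT")` would return one dictionary per sequence, each holding the raw header, the concatenated sequence, and the `accession` and `name` fields where the header matches.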

This node was developed by Vernalis Research. For feedback and more information, please contact knime@vernalis.com.

Options

Select files
Use the 'Browse...' and 'Add from history' buttons to add all the files to be included in the table. Alternatively, a flow variable can be specified, containing one or more filenames separated by ';'. The most recently added file(s) will be selected. If no files are highlighted in the 'Selected files' box, the 'Browse...' button opens a new file browser window in the default location; otherwise, the file browser opens in the location of the last highlighted file.
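As an illustration of the flow variable format, a ';'-separated value can be split into individual filenames as in the sketch below. The function name `split_filenames` is hypothetical, and trimming whitespace and dropping empty entries are assumptions rather than documented behaviour of the node.

```python
def split_filenames(value):
    """Split a ';'-separated flow variable value into a list of filenames."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [part.strip() for part in value.split(";") if part.strip()]
```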
Select file encoding
Select the file encoding. 'Guess' will attempt to assign it based on the connection property of the URL, the content-type, and the Byte-Order Mark (BOM). UTF-8 will be used if no other encoding is identified.
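The BOM part of the 'Guess' option can be sketched as follows. This is an illustration only: it ignores the connection property and content-type checks, the function name `guess_encoding` is hypothetical, and the returned names are Python codec names, not the labels shown in the node dialog.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM starts with the
# same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, default="utf-8"):
    """Return a codec name based on a leading BOM, falling back to UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return default
```

The codecs chosen here consume the BOM when decoding, so `data.decode(guess_encoding(data))` yields text without a leading BOM character.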
Include paths in output table
Include the full file paths and URLs as columns in the output table.
Include filename in Row IDs
The filename will be included in the Row ID (duplicates will be suffixed with '_n', where n is an index starting at 0). Otherwise, the Row IDs will be in the format 'Row_n', where n is an index starting at 0.
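One possible reading of this naming scheme is sketched below. The function `make_row_ids` is hypothetical, and the handling of duplicates is an assumption: here the first row for a filename keeps the bare filename, and later rows for the same filename receive '_0', '_1', and so on. The node itself may number duplicates differently.

```python
def make_row_ids(filenames, include_filename):
    """Build one Row ID per output row from the source filename of each row."""
    if not include_filename:
        return [f"Row_{n}" for n in range(len(filenames))]
    next_suffix = {}  # filename -> next duplicate index to use
    row_ids = []
    for name in filenames:
        if name not in next_suffix:
            # Assumption: the first occurrence is left unsuffixed.
            next_suffix[name] = 0
            row_ids.append(name)
        else:
            row_ids.append(f"{name}_{next_suffix[name]}")
            next_suffix[name] += 1
    return row_ids
```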
Include filenames in output table
Include the filename as a column in the output table.
Newline output
The newline character(s) to be used in the SDF Cell. 'System' will dynamically use the newline of the system the node is executed on (the current value for this is shown in the dialog, but on another system, the local value will be used). 'Preserve incoming' will look in the first 65535 characters of the file for the first linebreak ('\r\n' or '\n') and use that.
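The two options described here can be sketched in Python as follows. The function name `resolve_newline` is hypothetical, and falling back to the system newline when no linebreak is found within the scanned range is an assumption, not documented behaviour.

```python
import os


def resolve_newline(option, text="", scan_limit=65535):
    """Return the newline string to use for the given 'Newline output' option."""
    if option == "System":
        # The newline of the machine the code runs on.
        return os.linesep
    if option == "Preserve incoming":
        head = text[:scan_limit]
        index = head.find("\n")
        if index == -1:
            # Assumption: no linebreak found, so use the system newline.
            return os.linesep
        # A '\n' preceded by '\r' means the first linebreak is '\r\n'.
        if index > 0 and head[index - 1] == "\r":
            return "\r\n"
        return "\n"
    raise ValueError(f"Unsupported option: {option}")
```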
FASTA Type
The header format to use when parsing, chosen from the standard options listed above.

Input Ports

Optional flow variables containing file path(s)

Output Ports

Parsed content of the loaded files

Views

This node has no views
