
Bowtie2

IBIS Helmholtz-Node extension for KNIME Workbench version 1.8.1.201707071203 by IBIS KNIME Team

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Source: http://bowtie-bio.sourceforge.net/bowtie2/

Options

Automatically select value for parameters according to available memory
Disable the default behavior whereby bowtie2-build automatically selects values for the 'The maximum number of suffixes allowed in a block', 'Use <int> as the period for the difference-cover sample' and 'Use a packed representation for DNA strings' parameters according to available memory.
Use a packed representation for DNA strings
Use a packed (2-bits-per-nucleotide) representation for DNA strings. This saves memory but makes indexing 2-3 times slower.
The maximum number of suffixes allowed in a block
Allowing more suffixes per block makes indexing faster, but increases peak memory usage.
Use <int> as the period for the difference-cover sample
A larger period yields less memory overhead, but may make suffix sorting slower, especially if repeats are present. Must be a power of 2 no greater than 4096.
Disable use of the difference-cover sample
Suffix sorting becomes quadratic-time in the worst case (where the worst case is an extremely repetitive reference).
Mark every 2^<int> rows.
To map alignments back to positions on the reference sequences, it's necessary to annotate ('mark') some or all of the Burrows-Wheeler rows with their corresponding location on the genome. This parameter governs how many rows get marked: the indexer will mark every 2^<int> rows. Marking more rows makes reference-position lookups faster, but requires more memory to hold the annotations at runtime. The default is 5 (every 32nd row is marked; for the human genome, annotations occupy about 340 megabytes).
Use the first <int> characters of the query to calculate an initial Burrows-Wheeler range
The ftab is the lookup table used to calculate an initial Burrows-Wheeler range with respect to the first <int> characters of the query. A larger <int> yields a larger lookup table but faster query times. The ftab has size 4^(<int>+1) bytes.
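
The arithmetic behind the three index parameters above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page; the function names are hypothetical and are not part of bowtie2-build or the KNIME node, and the example values are chosen for illustration.

```python
def is_valid_dcv_period(period):
    """Return True if the period is a power of 2 no greater than 4096."""
    return 0 < period <= 4096 and (period & (period - 1)) == 0


def marked_row_interval(exponent):
    """The indexer marks every 2^<int>-th Burrows-Wheeler row."""
    return 2 ** exponent


def ftab_size_bytes(ftab_chars):
    """Size of the ftab lookup table: 4^(<int>+1) bytes."""
    return 4 ** (ftab_chars + 1)


print(is_valid_dcv_period(1024))  # True
print(is_valid_dcv_period(3000))  # False (not a power of 2)
print(marked_row_interval(5))     # 32
print(ftab_size_bytes(10))        # 4194304 bytes, i.e. 4 MiB
```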

Alignment Parameters

Input
Skip the first <int> reads/pairs in the input: Skip (i.e. do not align) the first <int> reads or pairs in the input.
Stop after first <int> reads/pairs: Align the first <int> reads or read pairs from the input, then stop.
Trim <int> bases from 5'/left end of reads: Trim <int> bases from 5' (left) end of each read before alignment.
Trim <int> bases from 3'/right end of reads: Trim <int> bases from 3' (right) end of each read before alignment.
Select quality score type:
  • Phred+33: Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the 'Phred+33' encoding, which is used by the very latest Illumina pipelines.
  • Phred+64: Input qualities are ASCII chars equal to the Phred quality plus 64. This is also called the "Phred+64" encoding.
  • Encoded as space-delimited integers: Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters, e.g., II?I.... Integers are treated as being on the Phred quality scale.
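
As a rough illustration of the three encodings listed above, the following Python sketch converts each representation into a list of Phred quality values. The helper names are hypothetical and the code is not taken from Bowtie 2; it only applies the "ASCII code minus offset" rule described here.

```python
def decode_qualities(quality_string, offset=33):
    """Convert ASCII-encoded qualities (Phred+33 or Phred+64) to integers."""
    return [ord(ch) - offset for ch in quality_string]


def parse_integer_qualities(text):
    """Parse qualities given as space-delimited integers."""
    return [int(token) for token in text.split()]


print(decode_qualities("II?I"))               # [40, 40, 30, 40]
print(decode_qualities("hh^h", offset=64))    # [40, 40, 30, 40]
print(parse_integer_qualities("40 40 30 40")) # [40, 40, 30, 40]
```

All three calls describe the same four quality values, which is why selecting the wrong encoding in the node shifts every quality by a constant.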
Presets
Use a set of preselected parameters: It is possible to use one of four preselected parameter sets: very-fast, fast, sensitive, very-sensitive. This option affects the following parameters: 'Max <int> mismatches in seed alignment', 'Length of seed substrings', 'Interval between seed substrings w/r/t read len', 'Give up extending after <int> failed extends in a row', 'For reads with repetitive seeds, try <int> sets of seeds'.
Alignment
Max <int> mismatches in seed alignment: Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity.
Length of seed substrings: Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive.
Interval between seed substrings w/r/t read len: Sets a function governing the interval between seed substrings to use during multiseed alignment. Since it's best to use longer intervals for longer reads, this parameter sets the interval as a function of the read length, rather than a single one-size-fits-all number. For instance, specifying -i S,1,2.5 sets the interval function f to f(x) = 1 + 2.5 * sqrt(x), where x is the read length.
Func for max # non-A/C/G/Ts permitted in aln: Sets a function governing the maximum number of ambiguous characters (usually Ns and/or .s) allowed in a read as a function of read length. For instance, specifying L,0,0.15 sets the N-ceiling function f to f(x) = 0 + 0.15 * x, where x is the read length.
Include <int> extra ref chars on sides of DP table: 'Pads' dynamic programming problems by <int> columns on either side to allow gaps.
Disallow gaps within <int> nucs of read extremes: Disallow gaps within <int> positions of the beginning or end of the read.
Treat all quality values as 30 on Phred scale: When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high.
Do not align forward (original) version of read: If this option is checked, bowtie2 will not attempt to align unpaired reads to the forward (Watson) reference strand.
Do not align reverse-complement version of read: If this option is checked, bowtie2 will not attempt to align unpaired reads against the reverse-complement (Crick) reference strand.
Select an alignment type:
  • Entire read must align (no clipping): In this mode, Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or "soft clipping") of characters from either end.
  • local alignment (ends might be soft clipped): In this mode, Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted ("soft clipped") from the ends in order to achieve the greatest possible alignment score.
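
The function strings used for the seed interval (S,1,2.5) and the N-ceiling (L,0,0.15) in the Alignment parameters can be evaluated with a small helper. The sketch below is an assumption-laden illustration: the function name is hypothetical, only the two function types that appear on this page (S for square root, L for linear) are handled, and how Bowtie 2 rounds the result is not covered.

```python
import math


def evaluate_function_string(spec, read_length):
    """Evaluate a '<type>,<constant>,<coefficient>' string for a read length."""
    kind, constant, coefficient = spec.split(",")
    a, b = float(constant), float(coefficient)
    if kind == "S":   # f(x) = a + b * sqrt(x)
        return a + b * math.sqrt(read_length)
    if kind == "L":   # f(x) = a + b * x
        return a + b * read_length
    raise ValueError("unsupported function type in this sketch: " + kind)


# Seed interval for a 100 bp read: 1 + 2.5 * sqrt(100) = 26.0
print(evaluate_function_string("S,1,2.5", 100))

# N-ceiling for a 50 bp read: 0 + 0.15 * 50, about 7.5
print(evaluate_function_string("L,0,0.15", 50))
```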

Further parameters

Scoring
Set the match bonus: In 'local' mode, <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in 'end-to-end' mode.
Max penalty for mismatch: Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N.
Penalty for non-A/C/G/Ts in read/ref: Sets penalty for positions where the read, reference, or both, contain an ambiguous character such as N.
Read gap open/ extend penalty: Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>.
Reference gap open/ extend penalty: Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>.
Min acceptable alignment score w/r/t read length: Sets a function governing the minimum alignment score needed for an alignment to be considered 'valid' (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length.
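
The gap-penalty and minimum-score formulas above lend themselves to a short worked example. The Python sketch below uses hypothetical function names and illustrative penalty values (gap open 5, gap extend 3); it only reproduces the arithmetic stated in the descriptions and is not Bowtie 2's scoring code.

```python
def gap_penalty(gap_open, gap_extend, gap_length):
    """Penalty for a gap of length N: <int1> + N * <int2>."""
    return gap_open + gap_length * gap_extend


def minimum_score_linear(constant, coefficient, read_length):
    """Minimum valid score for a linear function such as L,0,-0.6."""
    return constant + coefficient * read_length


# Illustrative values: gap open 5, gap extend 3, gap of length 2.
penalty = gap_penalty(5, 3, 2)
print(penalty)  # 11

# Minimum score for a 50 bp read with L,0,-0.6 is about -30.
threshold = minimum_score_linear(0, -0.6, 50)

# An alignment whose only penalty is that gap scores -11, which is
# at least the threshold, so it would count as valid.
score = -penalty
print(score >= threshold)  # True
```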
Reporting
Choose an alignment type:
  • Look for multiple alignments, report best, with MAPQ: bowtie2 searches for distinct, valid alignments for each read. When it finds a valid alignment, it continues looking for alignments that are nearly as good or better. The best alignment found is reported (randomly selected from among best if tied). Information about the best alignments is used to estimate mapping quality and to set SAM optional fields.
  • Report up to <int> alns per read; MAPQ not meaningful: In this mode, bowtie2 searches for at most <int> distinct, valid alignments for each read. The search terminates when it can't find more distinct valid alignments, or when it finds <int>, whichever happens first. All alignments found are reported in descending order by alignment score. The alignment score for a paired-end alignment equals the sum of the alignment scores of the individual mates. Each reported read or pair alignment beyond the first has the SAM 'secondary' bit (which equals 256) set in its FLAGS field. For reads that have more than <int> distinct, valid alignments, bowtie2 does not guarantee that the <int> alignments reported are the best possible in terms of alignment score. Note: Bowtie 2 is not designed with large values for this option in mind, and when aligning reads to long, repetitive genomes, a large <int> can be very, very slow.
  • Report all alignments; very slow, MAPQ not meaningful: Like 'Report up to <int> alns per read; MAPQ not meaningful' but with no upper limit on number of alignments to search for. Note: Bowtie 2 is not designed with this mode in mind, and when aligning reads to long, repetitive genomes this mode can be very, very slow.
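
When several alignments per read are reported, the SAM 'secondary' bit (256) mentioned above is what distinguishes the additional alignments from the first one. A minimal Python sketch for testing that bit follows; the names and the example FLAG values are made up for illustration.

```python
SECONDARY_BIT = 256  # SAM 'secondary' bit in the FLAGS field


def is_secondary(flag):
    """Return True if the secondary bit is set in a SAM FLAGS value."""
    return bool(flag & SECONDARY_BIT)


# Hypothetical FLAGS values for four reported alignments.
flags = [0, 256, 16, 272]
primary_flags = [flag for flag in flags if not is_secondary(flag)]
print(primary_flags)  # [0, 16]
```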
Effort
Give up extending after <int> failed extends in a row: Up to <int> consecutive seed extension attempts can 'fail' before Bowtie 2 moves on, using the alignments found so far. A seed extension 'fails' if it does not yield a new best or a new second-best alignment.
For reads with repetitive seeds, try <int> sets of seeds: <int> is the maximum number of times Bowtie 2 will 're-seed' reads with repetitive seeds. When 're-seeding', Bowtie 2 simply chooses a new set of seeds (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300.
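
The repetitive-seed criterion above is a simple ratio test. The following Python sketch restates it with hypothetical names; the handling of a read with no aligned seeds is an assumption made here, not something this page specifies.

```python
def has_repetitive_seeds(total_seed_hits, seeds_with_hits, threshold=300):
    """Hits per aligned seed must exceed the threshold (300 on this page)."""
    if seeds_with_hits == 0:
        # Assumption: a read without any aligned seed is not repetitive.
        return False
    return total_seed_hits / seeds_with_hits > threshold


print(has_repetitive_seeds(4000, 10))  # True  (400 hits per seed)
print(has_repetitive_seeds(3000, 10))  # False (exactly 300, not greater)
```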
Paired-end
Minimum fragment length: The minimum fragment length for valid paired-end alignments. E.g. if 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as 'Maximum fragment length' is also satisfied). A 19-bp gap would not be valid in that case.
Maximum fragment length: The maximum fragment length for valid paired-end alignments. E.g. if 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as 'Minimum fragment length' is also satisfied). A 61-bp gap would not be valid in that case.
Select upstream/downstream mate orientations in the alignment: The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand.
  • forward/reverse: If fr is specified and there is a candidate paired-end alignment where mate 1 appears upstream of the reverse complement of mate 2 and the fragment length constraints are met, that alignment is valid. Also, if mate 2 appears upstream of the reverse complement of mate 1 and all other constraints are met, that too is valid (appropriate for Illumina's Paired-end Sequencing Assay).
  • reverse/forward: rf likewise requires that an upstream mate 1 be reverse-complemented and a downstream mate 2 be forward-oriented.
  • forward/forward: ff requires both an upstream mate 1 and a downstream mate 2 to be forward-oriented.
Suppress unpaired alignments for paired reads: By default, when bowtie2 cannot find a concordant or discordant alignment for a pair, it then tries to find alignments for the individual mates. This option disables that behavior.
Suppress discordant alignments for paired reads: By default, bowtie2 looks for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints. This option disables that behavior.
Not concordant when mates extend past each other: If the mates 'dovetail', that is if one mate alignment extends past the beginning of the other such that the wrong mate begins upstream, consider that to be concordant.
Not concordant when one mate alignment contains other: If one mate alignment contains the other, consider that to be non-concordant.
Not concordant when mates overlap at all: If one mate alignment overlaps the other at all, consider that to be non-concordant.
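
The minimum and maximum fragment length examples given earlier in this section can be checked with a few lines of Python. The sketch assumes, as in those examples, that the fragment length is the two mate alignments plus the gap between them; the function names and the open-ended limits (0 and 500) used to isolate each constraint are illustrative.

```python
def fragment_length(mate1_length, gap, mate2_length):
    """Fragment length for two non-overlapping mates separated by a gap."""
    return mate1_length + gap + mate2_length


def fragment_length_ok(length, minimum, maximum):
    """Check a fragment length against both limits."""
    return minimum <= length <= maximum


# Minimum fragment length 60: a 20-bp gap is valid, a 19-bp gap is not.
print(fragment_length_ok(fragment_length(20, 20, 20), 60, 500))  # True
print(fragment_length_ok(fragment_length(20, 19, 20), 60, 500))  # False

# Maximum fragment length 100: a 60-bp gap is valid, a 61-bp gap is not.
print(fragment_length_ok(fragment_length(20, 60, 20), 0, 100))   # True
print(fragment_length_ok(fragment_length(20, 61, 20), 0, 100))   # False
```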
Performance
Number of alignment threads to launch: Launch <int> parallel search threads. Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing <int> increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing <int> from 1 to 8 increases the memory footprint by a few hundred megabytes. Choose the number of cores to be used in parallel.
Force SAM output order to match order of input reads: Guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when 'Number of alignment threads to launch' is set greater than 1. Specifying this option and setting 'Number of alignment threads to launch' greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than if this option were not specified.
Use memory-mapped I/O for index: Use memory-mapped I/O to load the index, rather than typical file I/O. Memory-mapping allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using 'Number of alignment threads to launch' is not possible or not preferable.
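
For readers who want to relate the three performance settings above to the Bowtie 2 command line, the sketch below assembles an argument list using the bowtie2 options -p (threads), --reorder and --mm. How the KNIME node itself builds its call is not documented on this page, so the function, its parameters and the file names are hypothetical; the code only builds the list and does not run bowtie2.

```python
def build_bowtie2_command(index_prefix, output_sam, reads1, reads2=None,
                          threads=1, reorder=False, memory_mapped=False):
    """Assemble a bowtie2 argument list (illustrative, not the node's code)."""
    command = ["bowtie2", "-x", index_prefix]
    if reads2 is None:
        command += ["-U", reads1]                # unpaired reads
    else:
        command += ["-1", reads1, "-2", reads2]  # paired-end reads
    command += ["-S", output_sam, "-p", str(threads)]
    if reorder:
        command.append("--reorder")              # keep input read order
    if memory_mapped:
        command.append("--mm")                   # memory-mapped index I/O
    return command


# Placeholder index and file names.
print(" ".join(build_bowtie2_command(
    "genome_index", "out.sam", "reads_1.fq", "reads_2.fq",
    threads=8, reorder=True)))
```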
Other
Filter out reads that are bad according to QSEQ filter

Preference page

HTE
Set threshold for repeated execution. Only used if HTE is enabled in the preference page.
Path to bowtie2
Set the path to the bowtie2 executable.
Path to reference page
Set the path to a reference sequence.

Input Ports

Cell 0: Path to ReadFile1
Cell 1 (Optional): Path to ReadFile2 (if paired end mapping is used)

Output Ports

Cell 0: Path to SAM file

Views

STDOUT / STDERR
The node offers a direct view of the tool's standard output and standard error.

Installation

To use this node in KNIME, install KNIME4NGS from the following update site:

KNIME 4.3
