AdapterRemovalAdv

This node is deprecated. It is kept for backwards compatibility, but its use in new workflows is no longer recommended. The documentation below might contain more information.

Adapter removal / data cleansing

This node finds adapter sequences in high-throughput sequencing data and removes them. It keeps the remaining fragments together with information about what caused the removal.

Sequences are compared with a simple character-based comparison of two sequences: the adapter (query) sequence and the target sequence. The query sequence slides along the target sequence in a sliding-window approach with a step size of 1, starting at the 5' end of both sequences.

If a match is detected, the matching part of the target sequence and everything that follows it on the target sequence are removed.
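The removal step can be sketched as a simple slice. This is a minimal illustration, not the node's actual implementation: the function name `trim_at` is hypothetical, and it assumes that the quality string is cut at the same position as the sequence.

```python
def trim_at(sequence, quality, match_start):
    """Keep only the part of the read that precedes the match.

    Assumption: the quality string is trimmed to the same length as the
    sequence; the node documentation does not state this explicitly.
    """
    return sequence[:match_start], quality[:match_start]


read = "ACGTACGTAGATCGGAAG"
qual = "IIIIIIIIIIIIIIIIII"

# Suppose a match was detected at 0-based position 9 (hypothetical value).
trimmed_seq, trimmed_qual = trim_at(read, qual, 9)
print(trimmed_seq)   # ACGTACGTA
print(trimmed_qual)  # IIIIIIIII
```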

A match occurs if the fraction of matching bases among the compared bases is larger than the similarity threshold (see Options).
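As a rough sketch of this criterion, the fraction can be computed position by position. The function names `similarity` and `is_match` are hypothetical, and quality scores are ignored here for brevity.

```python
def similarity(query, window):
    """Fraction of positions at which query and window carry the same base."""
    compared = min(len(query), len(window))
    if compared == 0:
        return 0.0
    matches = sum(1 for q, t in zip(query, window) if q == t)
    return matches / compared


def is_match(query, window, similarity_threshold):
    """A match requires the fraction to be strictly larger than the threshold."""
    return similarity(query, window) > similarity_threshold


# One mismatch in eight compared bases gives 7/8 = 0.875.
print(similarity("AGATCGGA", "AGATCGCA"))     # 0.875
print(is_match("AGATCGGA", "AGATCGCA", 0.8))  # True
```

Because the comparison is strict, a fraction exactly equal to the threshold does not count as a match in this sketch.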

Two bases match if they are identical or if the quality score of the target base is above the quality threshold (the ASCII values of the quality characters are compared).
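The per-base rule, read literally, could look like the following sketch. The function name `bases_match` is hypothetical, and it assumes that the quality threshold is given as a single character whose ASCII value is compared with that of the target base's quality character.

```python
def bases_match(query_base, target_base, target_quality, quality_threshold):
    """Return True if the two bases count as matching.

    Assumption: both target_quality and quality_threshold are single
    characters, compared by their ASCII values.
    """
    if query_base == target_base:
        return True
    return ord(target_quality) > ord(quality_threshold)


print(bases_match("A", "A", "!", "I"))  # True: identical bases
print(bases_match("A", "C", "J", "I"))  # True: 'J' (74) is above 'I' (73)
print(bases_match("A", "C", "5", "I"))  # False: '5' (53) is not above 'I' (73)
```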

If the part of the target sequence inside the current window is shorter than the query sequence, only the overlapping part is compared. The number of compared bases is then smaller than the length of the query sequence.

If the part of the target sequence inside the current window is shorter than "minimum overlap", the sequences do not match.
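Taken together, the sliding window, the shortened comparison near the end of the read, and the "minimum overlap" rule can be sketched as follows. This is an illustration under stated assumptions rather than the node's code: `find_adapter` is a hypothetical name, positions are 0-based, and quality scores are left out to keep the example short.

```python
def find_adapter(target, query, similarity_threshold, min_overlap):
    """Return the 0-based start of the first matching window, or None."""
    for start in range(len(target)):
        # Near the end of the read the window is shorter than the query,
        # so only the overlapping part is compared.
        window = target[start:start + len(query)]
        compared = len(window)
        if compared < min_overlap:
            break  # all later windows are even shorter
        matches = sum(1 for q, t in zip(query, window) if q == t)
        if matches / compared > similarity_threshold:
            return start
    return None


read = "ACGTACGTAGATCGG"
adapter = "AGATCGGAAG"

# Only the first 7 adapter bases overlap the end of the read, at position 8.
print(find_adapter(read, adapter, 0.9, 5))  # 8
# Requiring at least 8 overlapping bases rules that window out.
print(find_adapter(read, adapter, 0.9, 8))  # None
```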

If the value for "partial comparisons" is greater than 0, the query sequence is clipped from the 5' end until the "minimum overlap" length is reached. Each such clipped sequence is treated as an independent adapter sequence.
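One way to picture the clipped adapters is as the list of suffixes of the query. The sketch below assumes that clipping removes one base at a time and that the unclipped query is handled separately; the name `partial_adapters` is hypothetical.

```python
def partial_adapters(query, min_overlap):
    """Clip the query from its 5' end, one base at a time.

    Clipping stops once the remaining sequence is min_overlap bases long.
    Assumption: the full-length query is not part of the returned list.
    """
    shortest = max(min_overlap, 1)
    return [query[i:] for i in range(1, len(query) - shortest + 1)]


print(partial_adapters("AGATCGG", 5))  # ['GATCGG', 'ATCGG']
```

Each returned sequence would then be searched for in the same way as a regular adapter sequence.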

Options

Sequence Column
column from first table where the sequencing reads are stored
Quality string column
column from first table where the quality strings are stored
Adapter sequence column
column from second table where the adapter sequence is stored
similarity threshold
column from second table where the similarity threshold is stored
quality threshold
column from second table where the quality threshold is stored
minimum overlap
column from second table where the minimum overlap is stored
partial comparisons
column from second table where the partial comparisons flag (0/1) is stored

Input Ports

alignment data with sequence and quality strings
Adapter sequences and corresponding parameters

Output Ports

cleaned data

Views

This node has no views