
AdapterRemovalAdv

Deprecated. NGS related nodes for KNIME Workbench version 0.2.200.qualifier by Bernd Jagla, Institute Pasteur

Adapter removal / data cleansing

This node finds an "adapter sequence" within high-throughput sequencing reads and removes it. The remaining fragments are kept, together with information about what caused the removal.

Sequences are compared using a simple character-based comparison of two sequences: the adapter (query) sequence and the target sequence. The query sequence slides along the target sequence in a sliding-window fashion with a step size of 1, starting with the 5' ends of both sequences aligned.
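
The window positions described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation; the function name sliding_windows is hypothetical, and this sketch does not yet apply the similarity or minimum-overlap rules.

```python
def sliding_windows(query: str, target: str):
    """Yield (offset, query_part, target_part) for every window position.

    The query starts with its 5' end aligned to the 5' end of the target
    (offset 0) and is shifted by one base per step. Near the 3' end of the
    target only the overlapping bases are paired, so both parts can be
    shorter than the query.
    """
    for offset in range(len(target)):
        target_part = target[offset:offset + len(query)]
        query_part = query[:len(target_part)]
        yield offset, query_part, target_part


# Print every window of the adapter "ACG" along the read "TTACGA".
for offset, q, t in sliding_windows("ACG", "TTACGA"):
    print(offset, q, t)
```

The last two windows (offsets 4 and 5) pair only the first two and the first base of the query, respectively.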

If a match is detected, the matching part of the target sequence and everything following it on the target sequence (towards its 3' end) are removed.
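
As a sketch of the removal step, assuming the match position (the window offset) is already known: the read and its quality string are cut at that position and the removed tail is recorded. The TrimResult fields and the trim_at helper are hypothetical; this page does not document the node's actual output columns.

```python
from typing import NamedTuple


class TrimResult(NamedTuple):
    """Hypothetical container: the kept part of the read plus what was cut off."""
    sequence: str
    quality: str
    removed_sequence: str
    match_position: int


def trim_at(sequence: str, quality: str, position: int) -> TrimResult:
    """Remove the matched region and everything 3' of it."""
    return TrimResult(
        sequence=sequence[:position],
        quality=quality[:position],
        removed_sequence=sequence[position:],
        match_position=position,
    )


result = trim_at("TTACGA", "IIIIII", 2)
print(result.sequence, result.removed_sequence)  # TT ACGA
```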

A match occurs if the fraction of matching bases among the compared bases is larger than the similarity threshold (see Options).

Two bases match if they are identical or if the quality score of the target base is above the quality threshold (the ASCII values of the quality characters are compared).
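
The two rules above (per-base match and similarity fraction) can be sketched as below. This follows the base-matching rule exactly as stated in this description; it assumes the quality threshold is given as a single quality character, and the function names are hypothetical.

```python
def bases_match(query_base: str, target_base: str, target_qual: str,
                quality_threshold: str) -> bool:
    """Bases match if identical, or if the target base's quality character
    has a higher ASCII code than the threshold character."""
    return query_base == target_base or ord(target_qual) > ord(quality_threshold)


def is_match(query_part: str, target_part: str, qual_part: str,
             similarity_threshold: float, quality_threshold: str) -> bool:
    """Match if (matching bases / compared bases) > similarity_threshold."""
    compared = len(target_part)
    if compared == 0:
        return False
    matching = sum(
        bases_match(q, t, s, quality_threshold)
        for q, t, s in zip(query_part, target_part, qual_part)
    )
    return matching / compared > similarity_threshold


# 3 of 4 bases are identical; '!' (ASCII 33) is not above '5' (ASCII 53).
print(is_match("ACGT", "ACTT", "!!!!", 0.7, "5"))  # True (0.75 > 0.7)
```

With the quality string "!!I!", the mismatching third base would also count as a match, because 'I' (ASCII 73) is above '5'.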

If the part of the target sequence covered by the sliding window is shorter than the query sequence, only the overlapping part is compared; the number of compared bases is then smaller than the length of the query sequence.

If this overlapping part is shorter than "minimum overlap", the sequences do not match.
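
The number of compared bases at a given window offset, and the minimum-overlap cut-off, can be sketched like this. The helper name compared_length is hypothetical; returning None stands for "cannot match at this offset".

```python
from typing import Optional


def compared_length(target_len: int, offset: int, query_len: int,
                    min_overlap: int) -> Optional[int]:
    """Bases compared when the query starts at `offset` on the target,
    or None if the overlap is shorter than min_overlap."""
    # The overlap is capped by the query length and by what is left of the target.
    overlap = min(query_len, max(target_len - offset, 0))
    if overlap < min_overlap:
        return None
    return overlap


# A 10-base read and a 4-base adapter with minimum overlap 3:
print(compared_length(10, 3, 4, 3))  # 4 (full adapter fits)
print(compared_length(10, 7, 4, 3))  # 3 (only 3 bases overlap)
print(compared_length(10, 8, 4, 3))  # None (2 < minimum overlap)
```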

If the value of "partial comparisons" is greater than 0, the query sequence is additionally clipped from its 5' end until its length reaches "minimum overlap". Each such clipped sequence is treated as an independent adapter sequence.
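
A sketch of how the clipped adapter variants could be generated. It assumes clipping removes one base at a time from the 5' end (the description does not state the step size) and that the full adapter is always kept; partial_adapters is a hypothetical name.

```python
from __future__ import annotations


def partial_adapters(adapter: str, min_overlap: int, partial: int) -> list[str]:
    """Return the full adapter and, if partial > 0, every 5'-clipped
    version down to a length of min_overlap (assumed step: one base)."""
    variants = [adapter]
    if partial > 0:
        for start in range(1, len(adapter) - min_overlap + 1):
            variants.append(adapter[start:])
    return variants


print(partial_adapters("ACGTAC", 3, 1))  # ['ACGTAC', 'CGTAC', 'GTAC', 'TAC']
print(partial_adapters("ACGTAC", 3, 0))  # ['ACGTAC']
```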

Options

Sequence Column
column from first table where the sequencing reads are stored
Quality string column
column from first table where the quality strings are stored
Adapter sequence column
column from second table where the adapter sequence is stored
similarity threshold
column from second table where the similarity threshold is stored
quality threshold
column from second table where the quality threshold is stored
minimum overlap
column from second table where the minimum overlap is stored
partial comparisons
column from second table where the partial comparisons flag (0/1) is stored
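
The following end-to-end sketch combines the rules above for one read and one row of the second table. It is an illustration under stated assumptions, not the node's implementation: the AdapterParams fields mirror the options but are hypothetical names, the quality threshold is assumed to be a single character, clipping is assumed to proceed one base at a time, and when several adapter variants match, the leftmost match position is assumed to win.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class AdapterParams:
    """One row of the second input table (hypothetical field names)."""
    adapter: str
    similarity_threshold: float
    quality_threshold: str  # single quality character; ASCII codes compared
    min_overlap: int
    partial: int            # 0 = off, > 0 = also try 5'-clipped adapters


def _variants(p: AdapterParams) -> list[str]:
    variants = [p.adapter]
    if p.partial > 0:  # assumed: clip one base at a time from the 5' end
        variants += [p.adapter[i:] for i in range(1, len(p.adapter) - p.min_overlap + 1)]
    return variants


def _first_match(query: str, seq: str, qual: str, p: AdapterParams) -> Optional[int]:
    """Leftmost window offset at which `query` matches `seq`, or None."""
    for offset in range(len(seq)):
        overlap = min(len(query), len(seq) - offset)
        if overlap == 0 or overlap < p.min_overlap:
            break  # the overlap can only shrink at later offsets
        matching = sum(
            q == t or ord(s) > ord(p.quality_threshold)
            for q, t, s in zip(query, seq[offset:offset + overlap],
                               qual[offset:offset + overlap])
        )
        if matching / overlap > p.similarity_threshold:
            return offset
    return None


def remove_adapter(seq: str, qual: str, p: AdapterParams):
    """Return (kept_sequence, kept_quality, removed_part, matched_adapter)."""
    best = None
    for variant in _variants(p):
        pos = _first_match(variant, seq, qual, p)
        if pos is not None and (best is None or pos < best[0]):
            best = (pos, variant)
    if best is None:
        return seq, qual, "", None
    pos, variant = best
    return seq[:pos], qual[:pos], seq[pos:], variant


params = AdapterParams("AGATCGG", 0.9, "J", 3, 1)
read = "TTTTCCAGATCGGAAG"
print(remove_adapter(read, "I" * len(read), params))
# ('TTTTCC', 'IIIIII', 'AGATCGGAAG', 'AGATCGG')
```

Here the quality character 'I' is never above the threshold 'J', so only identical bases count as matches.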

Input Ports

alignment data with sequence and quality strings
Adapter sequences and corresponding parameters

Output Ports

cleaned data


Installation

To use this node in KNIME, install the KNIME NGS tools extension from the KNIME 4.3 update site.
