GATKBaseRecalibration

This is a wrapper node for the AnalyzeCovariates, BaseRecalibrator and PrintReads walkers of the Genome Analysis Toolkit (GATK). The node addresses systematic errors in the base quality scores emitted by sequencing machines. Because many variant calling tools rely on these base qualities, removing the bias leads to more accurate variant calls. The recalibration process consists of three steps.
Step 1: A machine learning model of covariation is built from the actual data and from known sites of genetic variation. (walkers: BaseRecalibrator)
Step 2: This optional step builds a second model and compares it to the first one. The comparison makes it possible to generate before/after plots of the quality values. (walkers: BaseRecalibrator + AnalyzeCovariates)
Step 3: Finally, the model is applied to the alignment data, and the base qualities are adjusted for the biases found. (walkers: PrintReads)
For further information, see the GATK documentation of the BaseRecalibrator, the AnalyzeCovariates and the PrintReads walkers.
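The three steps above can be sketched as a sequence of command lines. This is only an illustration, assuming GATK 3.x style walker invocations; the commands the node actually runs are not documented on this page. The helper names and the intermediate file names (recal.table, post_recal.table, recalibration_plots.pdf) are hypothetical.

```python
from typing import List


def gatk(jar: str, walker: str, reference: str) -> List[str]:
    """Common prefix of a GATK 3.x style walker invocation (assumed syntax)."""
    return ["java", "-jar", jar, "-T", walker, "-R", reference]


def recalibration_commands(
    jar: str,
    reference: str,
    in_bam: str,
    known_sites: List[str],
    out_bam: str,
    make_plots: bool = False,
) -> List[List[str]]:
    """Build the command lines for the recalibration steps, in order."""
    if not known_sites:
        raise ValueError("at least one set of known polymorphisms is required")

    known: List[str] = []
    for vcf in known_sites:
        known += ["-knownSites", vcf]

    # Step 1: train the covariation model on the input alignments.
    commands = [
        gatk(jar, "BaseRecalibrator", reference)
        + ["-I", in_bam] + known + ["-o", "recal.table"]
    ]

    if make_plots:
        # Step 2 (optional): build a second model on top of the first one,
        # then compare both to produce before/after plots.
        commands.append(
            gatk(jar, "BaseRecalibrator", reference)
            + ["-I", in_bam] + known
            + ["-BQSR", "recal.table", "-o", "post_recal.table"]
        )
        commands.append(
            gatk(jar, "AnalyzeCovariates", reference)
            + ["-before", "recal.table", "-after", "post_recal.table",
               "-plots", "recalibration_plots.pdf"]
        )

    # Step 3: apply the model and write the recalibrated BAM file.
    commands.append(
        gatk(jar, "PrintReads", reference)
        + ["-I", in_bam, "-BQSR", "recal.table", "-o", out_bam]
    )
    return commands
```

Each returned list can be joined with spaces for inspection or passed to subprocess.run. Step 2 is skipped unless make_plots is set, mirroring the optional Analyze Covariates setting.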

Options

Sets of known polymorphisms: You have to provide the node with at least one of the following three sets: indels from the 1000 Genomes project, indels from Mills and the 1000 Genomes project, or variants from dbSNP. BaseRecalibrator needs these sets to train its model.
Interval for recalibration: You can check this option to perform recalibration in certain genomic regions. You have to specify the intervals in a text file in BED format and select the file in the file browser.
Analyze Covariates: Before/after plots of the base quality score can be generated.
Optional flags: Set additional command line flags for the AnalyzeCovariates walker.
Print Reads: Specify whether to remove all additional information from the output BAM file except for the read group tag. This option reduces the output file size.
Optional flags: Set additional command line flags for the PrintReads walker.
General options: Number of CPU threads: Increasing the number of threads speeds up the node, but it also increases the memory required for the calculations. The BaseRecalibrator and PrintReads walkers run in multi-threaded mode.
Shared Java Memory: Set the maximum Java heap size shared by all CPU threads.
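The interval file for the "Interval for recalibration" option is a plain text file in BED format. BED uses tab-separated columns (chromosome, start, end) with 0-based, half-open coordinates, which is easy to get wrong when regions are given as 1-based inclusive positions. The following minimal sketch converts such regions to BED lines; the function names and the example regions are hypothetical, and the chromosome names are assumed to match those of the reference genome.

```python
from typing import Iterable, List, Tuple


def to_bed_line(chrom: str, start_1based: int, end_1based: int) -> str:
    """Convert a 1-based inclusive region to a 0-based half-open BED line."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid region {chrom}:{start_1based}-{end_1based}")
    # BED start is 0-based, BED end is exclusive, so only the start shifts.
    return f"{chrom}\t{start_1based - 1}\t{end_1based}"


def write_bed(path: str, regions: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Write the regions to a BED file and return the lines written."""
    lines = [to_bed_line(chrom, start, end) for chrom, start, end in regions]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

For example, the region chr1:100-200 becomes the BED line with start 99 and end 200.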

BaseRecalibrator

Cycle threshold: Set the maximum cycle value permitted for the Cycle covariate. (Default value = 500)
The cycle covariate will generate an error if it encounters a cycle greater than this value. This argument is ignored if the Cycle covariate is not used.
Gap open penalty: Gap open penalty for calculating BAQ (per-base alignment quality, the probability that a base is not correctly aligned). The default value is 40; 30 is perhaps better for whole genome call sets.
Default quality for deletions: Set the default quality to use as a prior (reported quality) in the base deletion covariate model. (Default value = 45)
This value will replace all base qualities in the read with this default value. A negative value turns it off.
Default quality for insertions: Set the default base quality to use as a prior (reported quality) in the base insertion covariate model. (Default value = 45)
This value is used for all reads that lack per-base insertion quality scores. The option is on by default; setting the value to -1 disables it.
Default quality for mismatches: Set the default quality to use as a prior (reported quality) in the base mismatch covariate model. (Default value = -1)
This value will replace all base qualities in the read with this default value. A negative value turns it off.
k-mer context size for indels: Define the size of the k-mer context to be used for base insertions and deletions. (Default value = 3)
The context covariate will use a context of this size to calculate its covariate value for base insertions and deletions. The value must be between 1 and 13 (inclusive). Note that higher values will increase the runtime and the required Java heap size.
k-mer context size for mismatches: Set the size of the k-mer context to be used for base mismatches. (Default value = 2)
The context covariate will use a context of this size to calculate its covariate value for base mismatches. The value must be between 1 and 13 (inclusive). Note that higher values will increase the runtime and the required Java heap size.
Quality threshold for read tails: Define the minimum quality for the bases in the tail of the reads to be considered. (Default value = 2)
Reads with low quality bases on either tail (beginning or end) will not be considered in the context. This parameter defines the quality at or below which a tail is considered low quality.
Optional flags: Set additional command line flags for the BaseRecalibrator walker.
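To illustrate the "Quality threshold for read tails" setting, the sketch below finds the part of a read that remains once low-quality tails are skipped. It is a simplified illustration of the description above, not GATK's implementation; the function name usable_span is hypothetical. The threshold is inclusive, so bases with a quality equal to the threshold count as low quality.

```python
from typing import List, Optional, Tuple


def usable_span(
    quals: List[int], low_quality_tail: int = 2
) -> Optional[Tuple[int, int]]:
    """Return the half-open index range left after skipping low-quality tails.

    Bases at the beginning and the end of the read whose quality is at or
    below the threshold are skipped. Low-quality bases in the middle of the
    read are not affected. Returns None if no base remains.
    """
    start, end = 0, len(quals)
    # Skip the low-quality tail at the beginning of the read.
    while start < end and quals[start] <= low_quality_tail:
        start += 1
    # Skip the low-quality tail at the end of the read.
    while end > start and quals[end - 1] <= low_quality_tail:
        end -= 1
    return None if start == end else (start, end)
```

With the default threshold of 2, a read with qualities [2, 2, 30, 31, 2, 35, 2] keeps the bases at indices 2 to 5; the low-quality base at index 4 stays because it is not part of a tail.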

Preference page

HTE: Set threshold for repeated execution. Only used if HTE is enabled in the preference page.
Path to GATK jar file: Set the path to the GenomeAnalysisTK.jar. This will be done automatically if the path is already defined in the preference page.
Path to reference genome: Set the path to the reference genome.
Path to 1000G Indels: Set the path to the 1000G project indels data set.
Path to Mills: Set the path to the Mills and 1000G reference data set.
Path to dbSNP: Set the path to the dbSNP reference data set.

Input Ports

Cell 0: Path to input BAM file

Output Ports

Cell 0: Path to recalibrated BAM file

Popular Predecessors

GATKRealignment: 100 %

Views

STDOUT / STDERR: The node offers a direct view of its standard out and the standard error of the tool.

Workflows

KNIME4NGS_Test_VarCallingKnime4NGS

Installation

To use this node in KNIME, install the extension KNIME4NGS from the update site below, following our NodePit Product and Node Installation Guide:

v5.6

Plugin provider: IBIS KNIME Team

Plugin version: 1.8.1.201707071203

On NodePit since: 2025-08-15

Last update: 2025-08-17

KNIME versions: Since v3.6
