VQSR

The purpose of the variant recalibrator is to assign a well-calibrated probability to each variant call in a call set. You can then create highly accurate call sets by filtering based on this single estimate for the accuracy of each call.

The approach taken by variant quality score recalibration is to develop a continuous, covarying estimate of the relationship between SNP/Indel call annotations (QD, MQ, HaplotypeScore, and ReadPosRankSum, for example) and the probability that a SNP/Indel is a true genetic variant versus a sequencing or data processing artifact. This model is determined adaptively based on known, truth and training reference data sets. This adaptive error model can then be applied to both known and novel variation discovered in the call set of interest to evaluate the probability that each call is real.

The score that gets added to the INFO field of each variant is called the VQSLOD. It is the log odds ratio of being a true variant versus being false under the trained Gaussian mixture model. For further information, see the GATK documentation of the VariantRecalibrator and the ApplyRecalibration walker. Useful information about training sets/arguments can be found in this GATK article.
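As a rough illustration of the log odds idea, the following Python sketch scores a single annotation value under two one-dimensional Gaussians, one standing in for true variants and one for artifacts. The function names, the example means and standard deviations, and the use of the natural logarithm are assumptions made for this sketch; the actual VariantRecalibrator fits a multi-dimensional Gaussian mixture model over several annotations.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def log_odds(x, good, bad):
    """Log odds of x belonging to the 'good' model versus the 'bad' model.

    good and bad are hypothetical (mean, std) pairs; natural log is an
    assumption of this sketch, not a statement about the tool's log base.
    """
    return math.log(gaussian_pdf(x, *good)) - math.log(gaussian_pdf(x, *bad))


# Hypothetical models for one annotation: true variants centred at 0,
# artifacts centred at 2, both with unit standard deviation.
good_model = (0.0, 1.0)
bad_model = (2.0, 1.0)

for value in (0.0, 1.0, 2.0):
    print(value, round(log_odds(value, good_model, bad_model), 3))
```

A positive score means the value looks more like the true-variant model, a negative score more like the artifact model, and a score near zero means the two models explain it equally well.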

Options

General Options

Recalibration Mode: Specify which recalibration mode (SNP/indel) should be employed.
Java Memory in GB: Set the maximum Java heap size (in GB) per thread.

Variant Recalibration

Tranche levels: The levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data (in percent, that is 1.0 for 1 percent). By default, the values of the Best Practice Guidelines are used according to the chosen mode (SNP or INDEL).
Annotation: Define which annotations should be used for calculations.

DP (Depth of Coverage) - should not be used when working with exome datasets.
InbreedingCoeff - is a population-level statistic that requires at least 10 samples in order to be computed. For projects with fewer samples, or that include many closely related samples (such as a family), please omit this annotation from the annotation field.

Gaussians: This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. (Default value = 8)
Threads: Set the number of threads to be used. Increasing the number of threads speeds up the node, but also increases the memory required for the calculations.
Optional flags: Set additional command line flags for the VariantRecalibrator.

Resources

Select which resource datasets should be used for the VariantRecalibrator. The variant quality score recalibrator builds an adaptive error model using known variant sites and then applies this model to estimate the probability that each variant is a true genetic variant or a machine artifact. All filtering criteria are learned from the data itself.

Resources for SNPs:

HapMap: True sites training resource
Omni: True sites training resource
1000G: Non-true sites training resource
dbSNP: Known sites resource, not used in training

Example: resource:hapmap,known=false,training=true,truth=true,prior=15.0

Resources for Indels:

Mills: Known and true sites training resources.

Example: resource:mills,known=false,training=true,truth=false,prior=2.0
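The resource strings in the examples above follow a fixed pattern, which can be sketched in Python as below. The helper name resource_arg is hypothetical and only reproduces the string format shown in the examples; it makes no claim about how the node or GATK assembles its command line internally.

```python
def resource_arg(name, known, training, truth, prior):
    """Format a resource description in the style of the examples above.

    Hypothetical helper for illustration: booleans are written in lower
    case and the prior is appended as given.
    """
    flags = ",".join(
        f"{key}={str(value).lower()}"
        for key, value in (("known", known), ("training", training), ("truth", truth))
    )
    return f"resource:{name},{flags},prior={prior}"


# Reproduces the two example strings from this section.
print(resource_arg("hapmap", known=False, training=True, truth=True, prior=15.0))
print(resource_arg("mills", known=False, training=True, truth=False, prior=2.0))
```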

ApplyRecalibration

TS Filter Level: Set the truth sensitivity level at which ApplyRecalibration starts to filter.
Optional flags: Set additional command line flags for the ApplyRecalibration.
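One way to picture the truth sensitivity filter level is as a VQSLOD cutoff chosen so that the given percentage of truth-set variants is retained. The sketch below assumes exactly that interpretation; the function names, the example scores, and the LowVQSLOD label are hypothetical and are not taken from the tool.

```python
import math


def vqslod_cutoff(truth_scores, ts_level):
    """Return the lowest VQSLOD that still keeps ts_level percent of truth sites.

    truth_scores: VQSLOD values of variants found in the truth set.
    ts_level: truth sensitivity in percent, for example 99.0.
    """
    ranked = sorted(truth_scores, reverse=True)
    keep = math.ceil(len(ranked) * ts_level / 100.0)
    keep = max(1, min(keep, len(ranked)))  # always keep at least one site
    return ranked[keep - 1]


def label_variants(scores, cutoff):
    """Mark each score as PASS or with a hypothetical filter label."""
    return ["PASS" if score >= cutoff else "LowVQSLOD" for score in scores]


# Hypothetical VQSLOD values for ten truth-set variants.
truth = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.0, -1.0, -2.0, -5.0]
cutoff = vqslod_cutoff(truth, 90.0)
print(cutoff)
print(label_variants([3.0, -2.0, -4.0], cutoff))
```

Raising the level keeps more true variants at the cost of letting more low-scoring calls through; lowering it makes the filter stricter.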

Preference page

HTE

Set threshold for repeated execution. Only used if HTE is enabled in the preference page.

Path to GATK jar file

Set the path to the GenomeAnalysisTK.jar. This will be done automatically if the path is already defined in the preference page.

Path to reference sequence

Set the path to the reference sequence. This will be done automatically if the path is already defined in the preference page.

Path to ...

Set the paths to reference data sets used for variant recalibration. This will be done automatically if the path is already defined in the preference page.

HapMap
Omni
1000G SNPs
dbSNP
Mills

Input Ports

Cell 0: Path to input VCF file

Output Ports

Cell 0: Path to VQSR variants file

Views

STDOUT / STDERR: The node offers a direct view of its standard out and the standard error of the tool.

Installation

To use this node in KNIME, install the extension KNIME4NGS from the below update site following our NodePit Product and Node Installation Guide:

v5.6

Plugin provider: IBIS KNIME Team

Plugin version: 1.8.1.201707071203

On NodePit since: 2025-08-15

Last update: 2025-08-18

KNIME versions: Since v3.6
