VQSR

The purpose of the variant recalibrator is to assign a well-calibrated probability to each variant call in a call set. You can then create highly accurate call sets by filtering on this single estimate of the accuracy of each call.

Variant quality score recalibration develops a continuous, covarying estimate of the relationship between SNP/indel call annotations (for example QD, MQ, HaplotypeScore, and ReadPosRankSum) and the probability that a SNP/indel is a true genetic variant rather than a sequencing or data-processing artifact. This model is determined adaptively from known, truth, and training reference data sets. The adaptive error model can then be applied to both known and novel variation discovered in the call set of interest to evaluate the probability that each call is real.

The score added to the INFO field of each variant is called VQSLOD. It is the log odds ratio of being a true variant versus being false under the trained Gaussian mixture model. For further information, see the GATK documentation of the VariantRecalibrator and ApplyRecalibration walkers. Useful information about training sets and arguments can be found in this GATK article.
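To make the idea of a log odds score concrete, here is a minimal sketch in Python. It only illustrates the general definition of a log odds ratio. The function name, the use of the natural logarithm, and the example likelihoods are assumptions for illustration, and the sketch does not reproduce how GATK computes VQSLOD internally.

```python
import math


def log_odds(p_true: float, p_false: float) -> float:
    """Log odds of 'true variant' versus 'artifact' for two model likelihoods.

    Hypothetical helper: the logarithm base and any prior terms used by
    GATK are not taken from the documentation above.
    """
    if p_true <= 0 or p_false <= 0:
        raise ValueError("likelihoods must be positive")
    return math.log(p_true / p_false)


# Made-up likelihoods, for illustration only.
score = log_odds(0.9, 0.1)  # positive: the 'true variant' model fits better
```

A positive score favors the true-variant model, a negative score favors the artifact model, and zero means both fit equally well.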

Options

General Options
Recalibration Mode: Specify which recalibration mode (SNP/indel) should be employed.
Java Memory in GB: Set the maximum Java heap size (in GB) per thread.
Variant Recalibration
Tranche levels: The levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data (in percent, that is 1.0 for 1 percent). By default, the values of the Best Practice Guidelines are used according to the chosen mode (SNP or INDEL).
Annotation: Define which annotations should be used for calculations.
  • DP (Depth of Coverage) - should not be used when working with exome datasets.
  • InbreedingCoeff - a population-level statistic that requires at least 10 samples to be computed. For projects with fewer samples, or that include many closely related samples (such as a family), omit this annotation from the annotation field.
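The two annotation rules above can be sketched as a small filter. This is an illustrative helper only: the function name and its parameters are hypothetical and not part of the node or of GATK.

```python
def select_annotations(candidates, is_exome, n_samples, has_related_samples):
    """Drop annotations that the rules above advise against.

    Hypothetical helper: DP is dropped for exome data, InbreedingCoeff is
    dropped for fewer than 10 samples or for closely related samples.
    """
    excluded = set()
    if is_exome:
        excluded.add("DP")
    if n_samples < 10 or has_related_samples:
        excluded.add("InbreedingCoeff")
    # Preserve the caller's ordering of the remaining annotations.
    return [name for name in candidates if name not in excluded]


# Example: a small exome project with three unrelated samples.
chosen = select_annotations(
    ["QD", "MQ", "DP", "InbreedingCoeff"],
    is_exome=True,
    n_samples=3,
    has_related_samples=False,
)
```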
Gaussians: This parameter determines the maximum number of Gaussians that should be used when building a positive model using the variational Bayes algorithm. (Default value = 8)
Threads: Set the number of threads to be used. Increasing the number of threads speeds up the node, but also increases the memory required for the calculations.
Optional flags: Set additional command line flags for the VariantRecalibrator.
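As a rough picture of how these options could map onto a command line, the sketch below assembles an argument list using GATK 3.x-style flags. The command the node actually runs is not shown in this documentation, so the function, its parameters, the file names, and the way the memory setting is passed to -Xmx are all assumptions.

```python
def variant_recalibrator_args(gatk_jar, reference, input_vcf, mode,
                              annotations, tranches, resources,
                              recal_file, tranches_file,
                              max_gaussians=8, threads=1, java_mem_gb=4,
                              extra_flags=()):
    """Build an argument list for a GATK 3.x-style VariantRecalibrator call.

    Hypothetical helper. `resources` is a list of (tag, vcf_path) pairs,
    where tag looks like 'resource:hapmap,known=false,...'.
    """
    args = [
        "java", f"-Xmx{java_mem_gb}g", "-jar", gatk_jar,
        "-T", "VariantRecalibrator",
        "-R", reference,
        "-input", input_vcf,
        "-mode", mode,                       # "SNP" or "INDEL"
        "--maxGaussians", str(max_gaussians),
        "-nt", str(threads),
        "-recalFile", recal_file,
        "-tranchesFile", tranches_file,
    ]
    for tag, path in resources:
        args += ["-" + tag, path]            # e.g. -resource:hapmap,... file.vcf
    for name in annotations:
        args += ["-an", name]
    for level in tranches:
        args += ["-tranche", str(level)]
    args += list(extra_flags)                # the node's "Optional flags"
    return args


# Placeholder paths and values, for illustration only.
cmd = variant_recalibrator_args(
    "GenomeAnalysisTK.jar", "ref.fasta", "raw.vcf", "SNP",
    annotations=["QD", "MQ"], tranches=[99.0, 90.0],
    resources=[("resource:hapmap,known=false,training=true,truth=true,prior=15.0",
                "hapmap.vcf")],
    recal_file="out.recal", tranches_file="out.tranches",
)
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.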
Resources
Select which resource datasets should be used for the VariantRecalibrator. The variant quality score recalibrator builds an adaptive error model using known variant sites and then applies this model to estimate the probability that each variant is a true genetic variant or a machine artifact. All filtering criteria are learned from the data itself.

Resources for SNPs:
  • HapMap: True sites training resource
  • Omni: True sites training resource
  • 1000G: Non-true sites training resource
  • dbSNP: Known sites resource, not used in training
Example: resource:hapmap,known=false,training=true,truth=true,prior=15.0

Resources for Indels:
  • Mills: Known and true sites training resources.
Example: resource:mills,known=false,training=true,truth=false,prior=2.0
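The resource strings in the two examples follow a fixed key=value layout, which can be sketched as a small formatter. The function name and its parameters are hypothetical. Only the output format is taken from the examples above.

```python
def resource_tag(name, known, training, truth, prior):
    """Format a resource tag in the layout shown in the examples above."""
    def flag(value):
        # The examples use lowercase 'true'/'false', not Python's True/False.
        return "true" if value else "false"

    return (
        f"resource:{name},known={flag(known)},training={flag(training)},"
        f"truth={flag(truth)},prior={prior}"
    )


# Reproduce the two example strings given above.
hapmap = resource_tag("hapmap", known=False, training=True, truth=True, prior=15.0)
mills = resource_tag("mills", known=False, training=True, truth=False, prior=2.0)
```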
ApplyRecalibration
TS Filter Level: Set the truth sensitivity level at which ApplyRecalibration starts to filter.
Optional flags: Set additional command line flags for ApplyRecalibration.
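The filtering step can be pictured the same way. The sketch below builds a GATK 3.x-style ApplyRecalibration argument list. As before, the node's real invocation is not documented here, so the function, its parameters, and the file names are assumptions.

```python
def apply_recalibration_args(gatk_jar, reference, input_vcf, mode,
                             ts_filter_level, recal_file, tranches_file,
                             output_vcf, extra_flags=()):
    """Build an argument list for a GATK 3.x-style ApplyRecalibration call.

    Hypothetical helper: recal_file and tranches_file are the outputs of a
    preceding VariantRecalibrator run.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "ApplyRecalibration",
        "-R", reference,
        "-input", input_vcf,
        "-mode", mode,                              # must match the recalibration mode
        "--ts_filter_level", str(ts_filter_level),  # the node's "TS Filter Level"
        "-recalFile", recal_file,
        "-tranchesFile", tranches_file,
        "-o", output_vcf,
        *extra_flags,                               # the node's "Optional flags"
    ]


# Placeholder paths and values, for illustration only.
apply_cmd = apply_recalibration_args(
    "GenomeAnalysisTK.jar", "ref.fasta", "raw.vcf", "SNP",
    99.0, "out.recal", "out.tranches", "recalibrated.vcf",
)
```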

Preference page

HTE
Set threshold for repeated execution. Only used if HTE is enabled in the preference page.
Path to GATK jar file
Set the path to the GenomeAnalysisTK.jar. This will be done automatically if the path is already defined in the preference page.
Path to reference sequence
Set the path to the reference sequence. This will be done automatically if the path is already defined in the preference page.
Path to ...
Set the paths to reference data sets used for variant recalibration. This will be done automatically if the path is already defined in the preference page.
  • HapMap
  • Omni
  • 1000G SNPs
  • dbSNP
  • Mills

Input Ports

Cell 0: Path to input VCF file

Output Ports

Cell 0: Path to VQSR variants file

Views

STDOUT / STDERR
The node offers a direct view of its standard out and the standard error of the tool.
