The variant recalibrator assigns a well-calibrated probability to each variant call in a call set. You can then create highly accurate call sets by filtering on this single estimate of each call's accuracy.

Variant quality score recalibration builds a continuous, covarying estimate of the relationship between SNP/Indel call annotations (for example QD, MQ, HaplotypeScore, and ReadPosRankSum) and the probability that a SNP/Indel is a true genetic variant rather than a sequencing or data processing artifact. The model is determined adaptively from known, truth, and training reference data sets. This adaptive error model can then be applied to both known and novel variation discovered in the call set of interest to evaluate the probability that each call is real.

The score added to the INFO field of each variant is called the VQSLOD. It is the log odds ratio of being a true variant versus being false under the trained Gaussian mixture model. For further information, see the GATK documentation of the VariantRecalibrator and the ApplyRecalibration walker. Useful information about training sets/arguments can be found in this GATK article.
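As a rough illustration of the log odds idea behind the VQSLOD, the following Python sketch scores calls with two toy Gaussian mixtures over a single annotation and keeps the calls whose score passes a threshold. This is not the GATK implementation: the mixture parameters, the names `TRUE_MODEL`, `FALSE_MODEL`, `log_odds`, and `filter_calls`, the use of a single annotation, the natural logarithm, and the threshold of 0.0 are all assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture; components are (weight, mean, sd) tuples."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def log_odds(x, true_model, false_model):
    """Log of the ratio: density under the 'true' model over the 'false' model."""
    return math.log(mixture_pdf(x, true_model)) - math.log(mixture_pdf(x, false_model))


# Hypothetical toy models over one QD-like annotation (made-up parameters).
TRUE_MODEL = [(0.7, 20.0, 5.0), (0.3, 30.0, 4.0)]
FALSE_MODEL = [(1.0, 5.0, 4.0)]


def filter_calls(calls, threshold=0.0):
    """Keep (name, annotation) pairs whose log odds reach the threshold."""
    return [
        (name, x)
        for name, x in calls
        if log_odds(x, TRUE_MODEL, FALSE_MODEL) >= threshold
    ]


calls = [("var1", 22.0), ("var2", 3.0)]
kept = filter_calls(calls)
for name, x in kept:
    print(name, round(log_odds(x, TRUE_MODEL, FALSE_MODEL), 2))
```

In the real tool the model covers several annotations at once and is trained from the reference data sets, so the sketch only conveys why a higher score means a call looks more like the true-variant model than the artifact model.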
To use this node in KNIME, install the KNIME4NGS extension from its update site, following the NodePit Product and Node Installation Guide.