MergeTwoVCFs

The MergeTwoVCFs node is based on the GATK CombineVariants tool. It reads variant records from two separate ROD (Reference-Ordered Data) sources and combines them into a single VCF. The tool serves two main use cases:
1.) It combines variant records present at the same site in the different input sources into a single variant record in the output.
2.) It assumes that each ROD source represents the same set of samples (although this is not enforced). It uses the priority list (if provided) to emit a single record instance at every position represented in the input RODs. This node can, for example, merge the output VCF files from two different variant calling tools (e.g. Pindel and GATKHaplotypeCaller).
For further information, see the GATK documentation for CombineVariants.

Options

CombineVariants

Genotype Merge Type
Determine how genotype records for samples shared across the ROD files should be merged.
  • UNIQUIFY: Make all sample genotypes unique by file. Each sample shared across RODs gets named sample.ROD.
  • PRIORITIZE: Take the genotypes in priority order.
  • UNSORTED: Take the genotypes in any order.
  • REQUIRE_UNIQUE: Require that all samples/genotypes be unique between all inputs.
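The sample naming described for UNIQUIFY and the condition checked by REQUIRE_UNIQUE can be illustrated with a small Python sketch. It is not the GATK implementation: the function names, the ROD names (pindel, hc) and the sample IDs are hypothetical, and leaving non-shared samples unrenamed is an assumption, since the description above only covers shared samples.

```python
from collections import Counter


def shared_samples(samples_by_rod):
    """Return the sorted sample names that occur in more than one ROD."""
    counts = Counter(
        sample for samples in samples_by_rod.values() for sample in samples
    )
    return sorted(name for name, n in counts.items() if n > 1)


def uniquify_sample_names(samples_by_rod):
    """Rename every shared sample to sample.ROD, as described for UNIQUIFY.

    Assumption: samples that occur in only one ROD keep their name.
    """
    shared = set(shared_samples(samples_by_rod))
    merged = []
    for rod, samples in samples_by_rod.items():
        for sample in samples:
            merged.append(f"{sample}.{rod}" if sample in shared else sample)
    return merged


# Hypothetical inputs: two call sets that share one sample.
samples = {"pindel": ["NA12878"], "hc": ["NA12878", "NA12891"]}
print(uniquify_sample_names(samples))
# REQUIRE_UNIQUE would reject this input because the list below is not empty.
print(shared_samples(samples))
```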
Prioritize input
Specify the merging priority regarding the choice of which record gets emitted when taking the union of variants that contain genotypes. The list must be passed as a comma-separated string listing the names of the variant input files. Use name tags (defined in the fields above) for best results.
Input VCF file 1 and 2
Set the paths to the VCF files that should be merged.
Folder for output files
Set the path to the directory where the output files should be stored.
Filtered Record Merge Type
Determine how records seen at the same site in the VCF, but with different FILTER fields, should be handled.
  • KEEP_IF_ANY_UNFILTERED: Union - leaves the record if any record is unfiltered.
  • KEEP_IF_ALL_UNFILTERED: Requires all records present at the site to be unfiltered. VCF files that don't contain the record don't influence this.
  • KEEP_UNCONDITIONAL: If any record is present at this site (regardless of possibly being filtered), then all such records are kept and the filters are reset.
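The three filtered-record merge types can be read as a rule that decides whether the merged record ends up unfiltered. The following Python sketch encodes only the rules stated above; it is an illustration, not GATK code, and the function name and the boolean input representation are assumptions.

```python
def merged_is_unfiltered(filtered_flags, merge_type):
    """Decide whether the merged record at a site is unfiltered.

    filtered_flags: one boolean per input record present at the site,
    True if that record carries a filter. Inputs that lack a record at
    the site are simply not listed, so they have no influence.
    """
    if not filtered_flags:
        raise ValueError("at least one record must be present at the site")
    if merge_type == "KEEP_IF_ANY_UNFILTERED":
        # Unfiltered as soon as one contributing record is unfiltered.
        return not all(filtered_flags)
    if merge_type == "KEEP_IF_ALL_UNFILTERED":
        # Unfiltered only if every contributing record is unfiltered.
        return not any(filtered_flags)
    if merge_type == "KEEP_UNCONDITIONAL":
        # Filters are reset, so the merged record is always unfiltered.
        return True
    raise ValueError(f"unknown merge type: {merge_type}")


# One filtered and one unfiltered record at the same site.
for merge_type in ("KEEP_IF_ANY_UNFILTERED",
                   "KEEP_IF_ALL_UNFILTERED",
                   "KEEP_UNCONDITIONAL"):
    print(merge_type, merged_is_unfiltered([True, False], merge_type))
```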

GATK

GATK Memory
Set the maximum Java heap size (in GB).
Path to BED file
Check this option to restrict the analysis to certain genomic regions. Specify the intervals in a text file in BED format and select that file in the file browser.
Further options
Set additional command line flags for MergeTwoVCFs.

Preference page

HTE
Set a threshold for repeated execution. Only used if HTE is enabled in the preference page.
Path to reference sequence
Set the path to the reference sequence. This will be done automatically if the path is already defined in the preference page.
Path to GATK jar file
Set the path to GenomeAnalysisTK.jar. This will be done automatically if the path is already defined in the preference page.
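As a rough orientation, the options above correspond to a GATK 3 CombineVariants command line. The Python sketch below assembles such a command as an argument list. How the node builds its call internally is not documented here, so the function name, the parameter names, the default name tags and the fallback of using the tag order as priority are all assumptions; the flags themselves are those of the GATK 3 CombineVariants tool.

```python
import shlex


def build_combine_variants_command(gatk_jar, reference, vcf1, vcf2, output_vcf,
                                   tags=("first", "second"),
                                   genotype_merge_type="PRIORITIZE",
                                   priority=None,
                                   filtered_merge_type="KEEP_IF_ANY_UNFILTERED",
                                   memory_gb=4, bed_file=None, extra_flags=()):
    """Assemble a GATK 3 CombineVariants call as a list of arguments."""
    cmd = ["java", f"-Xmx{memory_gb}g", "-jar", gatk_jar,
           "-T", "CombineVariants",
           "-R", reference,
           f"--variant:{tags[0]}", vcf1,   # name tag for input VCF file 1
           f"--variant:{tags[1]}", vcf2,   # name tag for input VCF file 2
           "-o", output_vcf,
           "-genotypeMergeOptions", genotype_merge_type,
           "-filteredRecordsMergeType", filtered_merge_type]
    if genotype_merge_type == "PRIORITIZE" and priority is None:
        # Assumption: fall back to the order in which the inputs were given.
        priority = ",".join(tags)
    if priority is not None:
        unknown = set(priority.split(",")) - set(tags)
        if unknown:
            raise ValueError(f"priority names without a matching tag: {sorted(unknown)}")
        cmd += ["-priority", priority]
    if bed_file is not None:
        cmd += ["-L", bed_file]            # restrict to the BED intervals
    cmd += list(extra_flags)               # corresponds to "Further options"
    return cmd


# Hypothetical file names and tags, printed as a shell-quoted command line.
cmd = build_combine_variants_command(
    "GenomeAnalysisTK.jar", "reference.fasta", "pindel.vcf", "hc.vcf",
    "merged.vcf", tags=("pindel", "hc"), priority="hc,pindel",
    bed_file="regions.bed")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed directly to subprocess.run without shell quoting; the printed string is only meant for inspection.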

Input Ports

Cell 0: Path to VCF files.
No description for this port available.

Output Ports

Cell 0: Path to merged VCF file.

Views

STDOUT / STDERR
The node offers a direct view of its standard out and the standard error of the tool.
