
MergeTwoVCFs

IBIS Helmholtz-Node extension for KNIME Workbench version 1.8.1.201707071203 by IBIS KNIME Team

The MergeTwoVCFs node is based on the GATK CombineVariants tool. It reads variant records from two separate ROD (Reference-Ordered Data) sources and combines them into a single VCF. The tool covers two main use cases:
1.) It combines variant records present at the same site in the different input sources into a single variant record in the output.
2.) It assumes that each ROD source represents the same set of samples (although this is not enforced) and uses the priority list (if provided) to emit a single record instance at every position represented in the input RODs.
This node can, for example, merge the output VCF files from two different variant calling tools (e.g. Pindel and GATKHaplotypeCaller).
For further information, see the GATK documentation of CombineVariants.

Options

CombineVariants

Genotype Merge Type
Determine how genotype records for samples shared across the ROD files should be merged.
  • UNIQUIFY: Make all sample genotypes unique by file. Each sample shared across RODs gets named sample.ROD.
  • PRIORITIZE: Take the genotypes in priority order.
  • UNSORTED: Take the genotypes in any order.
  • REQUIRE_UNIQUE: Require that all samples/genotypes be unique between all inputs.
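The four modes can be illustrated with a toy model for a single sample that appears in more than one input. This is only a sketch of the behaviour described above, not GATK's implementation; the function name `resolve_shared_sample` and the tag names `pindel` and `hc` are made up for illustration.

```python
def resolve_shared_sample(sample, genotypes_by_tag, mode, priority=None):
    """Toy model: decide which genotype(s) a shared sample contributes.

    genotypes_by_tag maps the name tag of each input that contains the
    sample to that sample's genotype string (hypothetical data layout).
    Returns a dict mapping output sample names to genotypes.
    """
    if mode == "UNIQUIFY":
        # Keep one column per input, named sample.ROD.
        return {f"{sample}.{tag}": gt for tag, gt in genotypes_by_tag.items()}
    if mode == "PRIORITIZE":
        # Take the genotype from the first input in the priority list.
        for tag in priority or []:
            if tag in genotypes_by_tag:
                return {sample: genotypes_by_tag[tag]}
        raise ValueError("priority list names none of the inputs")
    if mode == "UNSORTED":
        # Any order is acceptable; here we simply take the first entry.
        return {sample: next(iter(genotypes_by_tag.values()))}
    if mode == "REQUIRE_UNIQUE":
        # A sample seen in more than one input is an error.
        if len(genotypes_by_tag) > 1:
            raise ValueError(f"sample {sample} is present in several inputs")
        return {sample: next(iter(genotypes_by_tag.values()))}
    raise ValueError(f"unknown merge type: {mode}")


genotypes = {"pindel": "0/1", "hc": "1/1"}
print(resolve_shared_sample("NA12878", genotypes, "UNIQUIFY"))
print(resolve_shared_sample("NA12878", genotypes, "PRIORITIZE", ["hc", "pindel"]))
```

With PRIORITIZE, the order of the priority list decides which caller's genotype survives, which is why that mode needs the priority list to be set.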
Prioritize input
Specify the merging priority, i.e. which record gets emitted when taking the union of variants that contain genotypes. The list must be passed as a comma-separated string listing the names of the variant input files. Use name tags (defined in the fields above) for best results.
Input VCF file 1 and 2
Set the paths to the VCF files that should be merged.
Folder for output files
Set the path to the directory where the output files should be stored.
Filtered Record Merge Type
Determine how records seen at the same site in the VCF, but with different FILTER fields, should be handled.
  • KEEP_IF_ANY_UNFILTERED: Union - leaves the record if any record is unfiltered.
  • KEEP_IF_ALL_UNFILTERED: Requires all records present at site to be unfiltered. VCF files that don't contain the record don't influence this.
  • KEEP_UNCONDITIONAL: If any record is present at this site (regardless of whether it is filtered), then all such records are kept and the filters are reset.
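The three rules can be modelled for one site as follows. This is a toy sketch of the descriptions above, not GATK code. It assumes that KEEP_IF_ALL_UNFILTERED drops the site as soon as one record is filtered, and that KEEP_IF_ANY_UNFILTERED keeps the union of the filters when every record is filtered; the function name `merge_filters` is hypothetical.

```python
def merge_filters(filters_per_record, mode):
    """Toy model of the Filtered Record Merge Type rules for one site.

    filters_per_record holds one set of FILTER names per input record at
    the site; an empty set stands for an unfiltered record.
    Returns the merged set of filters, or None if the site is dropped.
    """
    any_unfiltered = any(not f for f in filters_per_record)
    all_unfiltered = all(not f for f in filters_per_record)
    union = set().union(*filters_per_record)

    if mode == "KEEP_UNCONDITIONAL":
        # Always keep the record and reset its filters.
        return set()
    if mode == "KEEP_IF_ANY_UNFILTERED":
        # One unfiltered record is enough to emit an unfiltered result.
        return set() if any_unfiltered else union
    if mode == "KEEP_IF_ALL_UNFILTERED":
        # Assumption: a single filtered record removes the site.
        return set() if all_unfiltered else None
    raise ValueError(f"unknown merge type: {mode}")


site = [set(), {"LowQual"}]  # unfiltered in input 1, filtered in input 2
for mode in ("KEEP_IF_ANY_UNFILTERED", "KEEP_IF_ALL_UNFILTERED", "KEEP_UNCONDITIONAL"):
    print(mode, merge_filters(site, mode))
```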

GATK

GATK Memory
Set the maximum Java heap size (in GB).
Path to BED file
Check this option to restrict the analysis to certain genomic regions. Specify the intervals in a text file in BED format and select the file in the file browser.
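BED files list one region per line as tab-separated chromosome, start and end, where the start is 0-based and the end is exclusive. The following sketch converts 1-based, inclusive coordinates (as used in VCF positions) into BED lines; the helper name `bed_line` and the example regions are made up for illustration.

```python
def bed_line(chrom, first_base, last_base):
    """Format a 1-based, inclusive region as a BED line.

    BED uses a 0-based start and an exclusive end, so only the start
    coordinate has to be shifted by one.
    """
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    return f"{chrom}\t{first_base - 1}\t{last_base}"


# Hypothetical regions, given as (chromosome, first base, last base).
regions = [("chr1", 1000, 2000), ("chr2", 500, 500)]
bed_text = "\n".join(bed_line(*region) for region in regions) + "\n"
print(bed_text, end="")
```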
Further options
Set additional command line flags for MergeTwoVCFs.

Preference page

HTE
Set a threshold for repeated execution. Only used if HTE is enabled in the preference page.
Path to reference sequence
Set the path to the reference sequence. This will be done automatically if the path is already defined in the preference page.
Path to GATK jar file
Set the path to GenomeAnalysisTK.jar. This will be done automatically if the path is already defined in the preference page.
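Taken together, the options above correspond roughly to a GATK 3 CombineVariants command line. The sketch below assembles such a command as an argument list. How the node itself builds its call is not documented here, so the function name, its parameters, the default tags `input1` and `input2`, and the file names in the usage example are all assumptions; the flags are those of the GATK 3 CombineVariants tool.

```python
def build_combine_variants_command(gatk_jar, reference, vcf1, vcf2, output_vcf,
                                   tag1="input1", tag2="input2", memory_gb=4,
                                   genotype_merge_type="PRIORITIZE",
                                   filtered_merge_type="KEEP_IF_ANY_UNFILTERED",
                                   priority=None, bed_file=None,
                                   further_options=()):
    """Assemble a GATK 3 CombineVariants call as a list of arguments."""
    cmd = [
        "java", f"-Xmx{memory_gb}g",           # GATK Memory (Java heap in GB)
        "-jar", gatk_jar,                      # Path to GATK jar file
        "-T", "CombineVariants",
        "-R", reference,                       # Path to reference sequence
        f"--variant:{tag1}", vcf1,             # Input VCF file 1 with name tag
        f"--variant:{tag2}", vcf2,             # Input VCF file 2 with name tag
        "-o", output_vcf,
        "-genotypeMergeOptions", genotype_merge_type,
        "-filteredRecordsMergeType", filtered_merge_type,
    ]
    if genotype_merge_type == "PRIORITIZE":
        # Comma-separated name tags, highest priority first.
        cmd += ["-priority", ",".join(priority or [tag1, tag2])]
    cmd += list(further_options)               # Further options
    if bed_file:
        cmd += ["-L", bed_file]                # Path to BED file
    return cmd


command = build_combine_variants_command(
    "GenomeAnalysisTK.jar", "reference.fasta",
    "pindel.vcf", "haplotypecaller.vcf", "merged.vcf",
    tag1="pindel", tag2="hc", priority=["hc", "pindel"],
)
print(" ".join(command))
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later passed to `subprocess.run`.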

Input Ports

Cell 0: Path to VCF files.
No description for this port available.

Output Ports

Cell 0: Path to merged VCF file.

Views

STDOUT / STDERR
The node offers a direct view of the standard output and the standard error of the tool.

Installation

To use this node in KNIME, install KNIME4NGS from the following update site:

KNIME 4.3
