
Pindel

IBIS Helmholtz-Node extension for KNIME Workbench version 1.8.1.201707071203 by IBIS KNIME Team

This is a wrapper node for Pindel (designed for version 0.2.4) and the Pindel2VCF converter. Pindel is a tool for identifying structural variants in paired-end Illumina reads. It finds large deletions, medium-sized insertions, inversions and tandem duplications. Currently, this node only handles the deletions and short insertions called by Pindel, and it is compatible only with mappings produced by BWA or MOSAIK. Because Pindel has its own output format, this node also includes the Pindel2VCF script, which converts the files produced by Pindel to VCF format (the format is explained at http://www.1000genomes.org/wiki/analysis/variant-call-format/VCF-variant-call-format-version-42 ).
Further information about Pindel and installation instructions are available at http://gmt.genome.wustl.edu/packages/pindel/

Options

Path to Pindel executable
Set the path to the Pindel executable.
Interval for variant calling
Tick this option if you only want to call variants in a certain genomic region, e.g. on a single chromosome. If you choose this option, you can enter the chromosome name and the interval coordinates. Note that the chromosome name has to match the name used in the reference sequence and in the header of the input BAM file.
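As an illustration of the interval option, the following sketch assembles a region string of the form chromosome:start-end, which is the form Pindel's documentation describes for restricting calls to a region. The function name and the check against a set of known chromosome names are hypothetical and are not taken from this node's source.

```python
def build_region(chromosome, start, end, known_chromosomes):
    """Return 'chromosome:start-end' after basic sanity checks (illustrative only)."""
    # The name must match the reference sequence and the BAM header exactly,
    # so "20" and "chr20" are treated as different chromosomes.
    if chromosome not in known_chromosomes:
        raise ValueError(f"chromosome {chromosome!r} not found in reference/BAM header")
    if start < 1 or end < start:
        raise ValueError("interval must satisfy 1 <= start <= end")
    return f"{chromosome}:{start}-{end}"
```

For example, build_region("chr20", 5000000, 15000000, {"chr20"}) yields a region on chr20, while passing "20" with the same set of names raises an error because the name does not match.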
Path to Pindel config file
Pindel requires a tab-separated file containing the path to the BAM file (which has to be the same BAM file as the one provided at the inport), the average insert size and the sample name. More information about the Pindel config file format can be found at http://gmt.genome.wustl.edu/packages/pindel/
If the previous node is PicardTools CollectInsertSizeMetrics, you can tick the option Generate config file to generate this file automatically. This option uses the information about the average insert size provided by the previous node.
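A minimal sketch of writing such a config file by hand is shown below. It assumes one line per BAM file with the three fields described above, in the order BAM path, insert size, sample name. The function names are hypothetical, and rounding the insert size to a whole number is an assumption rather than documented behaviour of this node.

```python
def pindel_config_line(bam_path, insert_size, sample_name):
    """Return one tab-separated config line: BAM path, insert size, sample name."""
    # Assumption: the insert size is written as a whole number.
    return f"{bam_path}\t{int(round(insert_size))}\t{sample_name}"


def write_pindel_config(entries, handle):
    """Write one config line per (bam_path, insert_size, sample_name) tuple."""
    for bam_path, insert_size, sample_name in entries:
        handle.write(pindel_config_line(bam_path, insert_size, sample_name) + "\n")
```

The handle can be any writable text file object, for example one obtained with open("pindel.config", "w"); the file name here is only a placeholder.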
Output
If you choose the option Convert Pindel output to VCF format (required for further analysis using for example VAT) you have to select the path to the Pindel2VCF script. Note that this node only converts the Pindel files for small insertions and deletions.
Runtime and memory
Increasing the number of threads reduces runtime. Increasing the amount of reference sequence loaded into RAM also reduces runtime but increases memory usage.

Pindel Parameters

Minimum number of matching bases
Pindel considers reads as evidence for a variant if they map correctly with more than the specified number of bases.
Mismatch threshold
Pindel does not align part of a read if there is another mapping position with fewer than the chosen number of base mismatches. Increasing this threshold increases accuracy but reduces sensitivity.
Number of perfect matches at breakpoints
To consider a breakpoint, Pindel requires the selected number of perfectly matching bases around the breakpoint of a split read.
Sequencing error rate
Set the expected rate of sequencing errors.
Maximum allowed mismatch reads
Pindel considers reads as evidence for a variant if the proportion of mismatching bases is below this threshold.

Pindel2VCF Parameters

Reference sequence
Enter the name and date of the reference sequence. If you do not know them, you can check the options Use file name as reference name and Use current date.
Minimum number of reads to report genotype
This option defines the minimum coverage for reporting the variant. You should adapt this value to the overall sequencing coverage. Increasing this value reduces the number of false positive variant calls but can also remove true variants.
Proportion of reads defined as heterozygous
This threshold value refers to the number of reads supporting the variant compared to the overall number of reads at this site. All variants above this and below the homozygosity threshold are considered as heterozygous. Genotype 0/0 is assigned to all variants below this threshold.
Proportion of reads defined as homozygous
This threshold value refers to the number of reads supporting the variant compared to the overall number of reads at this site. All variants above this threshold are regarded as homozygous.
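The interplay of the minimum read count and the two proportion thresholds can be sketched as follows. This is an illustration of the rules as described here, not Pindel2VCF's actual code. The function name is hypothetical, the default values are placeholders, and the handling of values exactly equal to a threshold is an assumption.

```python
def call_genotype(supporting_reads, total_reads,
                  min_reads=10, het_threshold=0.2, hom_threshold=0.8):
    """Return '1/1', '0/1', '0/0', or None if coverage is too low (illustrative)."""
    # Too few reads at this site: no genotype is reported.
    if total_reads <= 0 or total_reads < min_reads:
        return None
    fraction = supporting_reads / total_reads
    if fraction > hom_threshold:
        return "1/1"  # above the homozygosity threshold
    if fraction > het_threshold:
        return "0/1"  # between the two thresholds: heterozygous
    return "0/0"      # at or below the heterozygosity threshold (assumed boundary)
```

With these placeholder defaults, 9 supporting reads out of 10 give a homozygous call, 5 out of 10 a heterozygous call, and 1 out of 10 the genotype 0/0.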
Output GATK-compatible genotypes
It is recommended to check this option if you want to use the VCF files in further analysis. The flag changes the format of the genotype tag.
Only output variants that are supported by reads on both strands
Tick this option to avoid strand-biased variant calls and reduce putative false positive variants. If this option is selected, Pindel2VCF just outputs variants that are supported by at least one read on the forward strand and one read on the reverse strand.
Minimum number of supporting reads
Choose the minimum number of reads supporting a variant for writing the variant to the output file. This is another possibility to remove putative false positives.
Minimum/maximum size of the variant
Define the minimal and maximal length of the variants that should be written to the output file. If you use the default settings no variant is excluded from the output because of its size.
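The three output filters described above (strand support, number of supporting reads, and variant size) can be combined as in the following sketch. The VariantCall class, its field names and the default values are hypothetical and only serve to illustrate the logic; they do not reflect Pindel2VCF's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    size: int            # variant length in bases
    forward_reads: int   # supporting reads on the forward strand
    reverse_reads: int   # supporting reads on the reverse strand


def passes_filters(call, min_supporting_reads=1, require_both_strands=False,
                   min_size=1, max_size=None):
    """Return True if the call survives all enabled filters (illustrative)."""
    supporting = call.forward_reads + call.reverse_reads
    if supporting < min_supporting_reads:
        return False
    # Both-strand filter: at least one supporting read on each strand.
    if require_both_strands and (call.forward_reads < 1 or call.reverse_reads < 1):
        return False
    if call.size < min_size:
        return False
    # max_size=None means no upper limit on the variant size.
    if max_size is not None and call.size > max_size:
        return False
    return True
```

A call supported by five forward reads and no reverse reads passes with the placeholder defaults but is rejected once require_both_strands is set to True.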

Input Ports

Cell 0: Path2BAMFile (indexed BAM file for variant calling)
Cell 1 (optional): Path2ISMetrics (file produced by PicardTools CollectInsertSizeMetrics with information about the insert size distribution)
The cell position does not matter. Additional columns are ignored.

Output Ports

Outport depends on the output options.
Outport with VCF output:
Cell 0: Path2VCFdeletionsFile (VCF file containing all deletions)
Cell 1: Path2VCFinsertionsFile (VCF file containing all small insertions)
Outport without VCF output:
Cell 0: Path2PindelDFile (Pindel file containing all deletions)
Cell 1: Path2PindelSIFile (Pindel file containing all small insertions)

Views

STDOUT / STDERR
The node offers a direct view of the standard output and the standard error of the tool.

Installation

To use this node in KNIME, install KNIME4NGS from the following update site:

KNIME 4.3
