PicardTools

This is a wrapper node for PicardTools. PicardTools is a popular Java tool suite for modifying files in SAM/BAM format (format specification at http://samtools.github.io/hts-specs/SAMv1.pdf ). Currently, the PicardTools node implements four tools:

  • AddOrReplaceReadGroups overwrites read group information in the BAM/SAM file header and adds all reads to the new read group.
  • CollectInsertSizeMetrics calculates the statistical distribution of the insert size for paired-end reads (the distance between the leftmost mapping position of a read and the rightmost mapping position of its mate).
  • MarkDuplicates tries to identify reads that are PCR duplicates. It either removes them from the file or marks them using the FLAG attribute.
  • SortSam sorts a SAM or BAM file by mapping position or by read name.
Further information about PicardTools can be found at http://broadinstitute.github.io/picard/

Options

PicardTool selection
Choose one of the following four utilities: AddOrReplaceReadGroups, CollectInsertSizeMetrics, MarkDuplicates or SortSam.
Output
Decide whether the node should output a BAM or SAM file. If you choose BAM format, you can additionally create an index file, which is required for random access and for using the GATK nodes and Pindel.
Validation stringency
Some tools produce incomplete SAM/BAM file output. By setting validation stringency to SILENT you force PicardTools to ignore the incomplete fields as far as possible. Set validation stringency to STRICT to disable this behaviour.
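
As a rough illustration of how the general options above could translate into a Picard call, the following Python sketch assembles a command line in Picard's KEY=VALUE argument style. The function name, the jar location and the output naming scheme are assumptions made for this example; they are not taken from the node itself.

```python
from pathlib import Path


def picard_command(tool, input_path, output_format="bam", create_index=False,
                   validation_stringency="SILENT", picard_jar="picard.jar"):
    """Assemble a Picard command line as a list of strings.

    Hypothetical helper: the node's real command construction may differ.
    """
    if output_format not in ("bam", "sam"):
        raise ValueError("output_format must be 'bam' or 'sam'")
    if create_index and output_format != "bam":
        # An index file is only offered for BAM output.
        raise ValueError("an index can only be created for BAM output")

    src = Path(input_path)
    # Assumed naming scheme: <input stem>.<tool>.<format>
    out = src.with_name(f"{src.stem}.{tool}.{output_format}")

    cmd = [
        "java", "-jar", picard_jar, tool,
        f"I={src}",
        f"O={out}",
        f"VALIDATION_STRINGENCY={validation_stringency}",
    ]
    if create_index:
        cmd.append("CREATE_INDEX=true")
    return cmd


print(" ".join(picard_command("SortSam", "sample.bam", create_index=True)))
```

The resulting list can be passed to subprocess.run if Java and a Picard jar are available at the assumed location.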

AddOrReplaceReadGroups

Use file name to create RG tag
Tick this option if you have no read group information for your file but need a read group to be present (for example, for the GATK nodes).
Read group ID
Enter the identifier of the read group.
Library name
Enter the name of the library belonging to the read group.
Sample name
Enter the name of the sample belonging to the read group.
Sequencing platform unit
Enter the name of the platform unit belonging to the read group.
Sequencing platform
Choose which platform was used for sequencing the reads of the read group.
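
The read group fields above correspond to the ID, LB, SM, PU and PL tags of an @RG header line in the SAM specification, which Picard accepts as RGID, RGLB, RGSM, RGPU and RGPL. The sketch below shows one way a read group could be derived from a file name. How the node actually builds the tag from the file name is not documented here, so reusing the file stem for every field is purely an assumption, as are the function names.

```python
from pathlib import Path


def read_group_from_filename(path, platform="ILLUMINA"):
    """Derive read group fields from a file name.

    Assumption: the file stem is reused for ID, LB, PU and SM.
    """
    base = Path(path).stem
    return {"ID": base, "LB": base, "PL": platform, "PU": base, "SM": base}


def picard_read_group_args(read_group):
    """Turn read group fields into AddOrReplaceReadGroups arguments."""
    return [f"RG{tag}={value}" for tag, value in read_group.items()]


def header_line(read_group):
    """Format the fields as the @RG line that would appear in the header."""
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in read_group.items())


rg = read_group_from_filename("patient7.bam")
print(picard_read_group_args(rg))
print(header_line(rg))
```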

CollectInsertSizeMetrics

Accumulation level
The default option ALL_READS infers the insert size distribution from all reads together. Choosing SAMPLE, LIBRARY or READ_GROUP calculates a separate distribution for every sample, library or read group.
Sort order of input file
Check this option if your file is sorted and this information is not included in the BAM file header. This is to facilitate the calculations.
Discard reads of categories RF, TANDEM, FR
These categories refer to the orientation and mapping position of paired-end reads. A category that accounts for less than this threshold proportion of read pairs is not considered when calculating insert sizes, because such pairs are most likely due to mapping or sequencing errors.
Deviation
Anomalous read mappings often distort the insert size distribution. The deviation threshold excludes read pairs with anomalous insert sizes from the plot and from the calculation of the average insert size.
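
The two filters can be sketched in plain Python. Picard's CollectInsertSizeMetrics documents its DEVIATIONS argument as trimming the data to the median plus DEVIATIONS times the median absolute deviation, and its MINIMUM_PCT argument as discarding categories below a given fraction of reads. That the node's two options map onto these arguments is an assumption, and the functions below are simplified illustrations rather than Picard's implementation.

```python
from statistics import median


def kept_categories(pair_counts, minimum_pct=0.05):
    """Return the orientation categories (FR, RF, TANDEM) that reach the threshold."""
    total = sum(pair_counts.values())
    if total == 0:
        return set()
    return {cat for cat, n in pair_counts.items() if n / total >= minimum_pct}


def trim_insert_sizes(insert_sizes, deviations=10.0):
    """Drop insert sizes above median + deviations * median absolute deviation."""
    sizes = list(insert_sizes)
    med = median(sizes)
    mad = median([abs(x - med) for x in sizes])
    cutoff = med + deviations * mad
    return [x for x in sizes if x <= cutoff]


counts = {"FR": 960, "RF": 30, "TANDEM": 10}
print(kept_categories(counts))

sizes = [200, 210, 190, 205, 195, 5000]
print(trim_insert_sizes(sizes))
```

In this toy data only FR reaches the 5% threshold, and the single 5000 bp pair is excluded before an average would be computed.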

MarkDuplicates

Removal of PCR duplicates
By ticking this option you remove all PCR duplicates from the output file instead of marking them in the FLAG attribute.
Sort order of input file
MarkDuplicates requires a file sorted by mapping position. If your file is sorted but this information is not included in the header, check this option. Otherwise, MarkDuplicates will reject the file and throw an error.
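
In the SAM specification, duplicates are marked with bit 0x400 of the FLAG field (the second column of an alignment line). The following sketch shows how marked duplicates could be recognised or dropped downstream in plain Python; the function names are made up for this example, and a real workflow would normally use a dedicated SAM/BAM library instead of splitting lines by hand.

```python
DUPLICATE_FLAG = 0x400  # "PCR or optical duplicate" bit in the SAM FLAG field


def is_marked_duplicate(sam_line):
    """Return True if the alignment line has the duplicate bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    return bool(int(fields[1]) & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Keep header lines and all alignments that are not marked as duplicates."""
    return [
        line for line in sam_lines
        if line.startswith("@") or not is_marked_duplicate(line)
    ]


lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "read2\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",  # 1123 = 99 + 1024
]
print(len(drop_duplicates(lines)))
```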

SortSam

Sort order
Choose whether to sort the reads by mapping position or by read name. If you choose no sort order, you can use SortSam for simple SAM to BAM or BAM to SAM format conversion, or to create an index file.
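
The difference between the two sort orders can be illustrated on a few toy records. The sketch assumes that sorting by mapping position follows the order of the reference sequences in the header and then the position, and that sorting by read name is a plain string comparison. This is a simplification for illustration, not Picard's exact comparator, and the record layout is hypothetical.

```python
def sort_records(records, sort_order, reference_names):
    """Sort (read_name, reference_name, position) tuples.

    sort_order is one of "coordinate", "queryname" or "unsorted".
    """
    if sort_order == "queryname":
        return sorted(records, key=lambda r: r[0])
    if sort_order == "coordinate":
        # Rank references by their order in the header, then sort by position.
        rank = {name: i for i, name in enumerate(reference_names)}
        return sorted(records, key=lambda r: (rank[r[1]], r[2]))
    if sort_order == "unsorted":
        return list(records)
    raise ValueError(f"unknown sort order: {sort_order}")


records = [("r2", "chr2", 100), ("r1", "chr1", 500), ("r3", "chr1", 50)]
references = ["chr1", "chr2"]
print(sort_records(records, "coordinate", references))
print(sort_records(records, "queryname", references))
```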

Input Ports

Cell 0: Path2BAMFile or Path2SAMFile (BAM or SAM file to modify; if you use MarkDuplicates, the file has to be sorted by mapping position)
The cell position does not matter. Additional columns are ignored.

Output Ports

Cell 0: Path2BAMFile or Path2SAMFile (BAM or SAM depending on the output format selected)
Cell 1 (optional): Path2ISMetrics (file with information about insert size distribution produced by CollectInsertSizeMetrics)

Views

STDOUT / STDERR
The node offers a direct view of the standard output and standard error of the tool.
