ATLAS / Sequence Data Processing Tools: mergeReads

Overview

This task applies only to BAM files with paired-end sequencing data. It should be run before any subsequent variant discovery or population genetic tool to remove the redundant sequencing data where paired reads overlap. At overlapping positions, the base of one of the two reads is set to quality 0. Reads whose mates are farther away than a certain distance (or nonexistent, or on a different chromosome) are considered "orphans" and are not merged.
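The overlap handling described above can be illustrated with a small Python sketch. This is not ATLAS code: the function name `mask_overlap` and its arguments are hypothetical, the reads are assumed to align without indels or clipping, and the rule "keep the base with the higher quality, ties go to the first read" is an assumption based on the default behaviour implied by the keepRandomBase option.

```python
def mask_overlap(start1, quals1, start2, quals2):
    """Return copies of both quality lists in which, at every overlapping
    reference position, the lower-quality base is set to quality 0.

    Assumes ungapped alignments: base k of a read sits at start + k.
    """
    q1, q2 = list(quals1), list(quals2)
    # Overlap in reference coordinates, as a half-open interval [lo, hi).
    lo = max(start1, start2)
    hi = min(start1 + len(q1), start2 + len(q2))
    for pos in range(lo, hi):
        i, j = pos - start1, pos - start2
        # Keep the higher-quality base; on a tie, keep the base of read 1.
        if q1[i] >= q2[j]:
            q2[j] = 0
        else:
            q1[i] = 0
    return q1, q2


# Read 1 covers positions 100-103, read 2 covers 102-104,
# so positions 102 and 103 are covered twice.
q1, q2 = mask_overlap(100, [30, 30, 40, 10], 102, [25, 35, 40])
print(q1)  # [30, 30, 40, 0]
print(q2)  # [0, 35, 40]
```

After masking, each overlapping position contributes only one base with non-zero quality, so downstream tools do not count the same fragment twice.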

Input

  • A BAM file
  • blacklist.txt (optional): A text file with the names of blacklisted reads that should not be merged, one per line. Consider this option if you get the error message "One read of '<read_name>' is reverse mate, but forward one has not been read!"
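If the error message quoted above shows up in saved ATLAS output, the affected read names can be collected into a blacklist file with a few lines of Python. This is only a sketch: the helper names `blacklist_from_log` and `write_blacklist` are hypothetical, and it assumes the log text contains the message exactly as quoted, with the read name between single quotes.

```python
import re

# Matches the error message quoted in the Input section and captures the read name.
ERROR_RE = re.compile(
    r"One read of '([^']+)' is reverse mate, but forward one has not been read!"
)


def blacklist_from_log(log_text):
    """Return the read names found in the log, in order, without duplicates."""
    names = []
    for match in ERROR_RE.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


def write_blacklist(names, path="blacklist.txt"):
    """Write one read name per line, as the blacklist format requires."""
    with open(path, "w") as handle:
        for name in names:
            handle.write(name + "\n")


log = "ERROR: One read of 'read_17' is reverse mate, but forward one has not been read!\n"
print(blacklist_from_log(log))  # ['read_17']
```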

Output

  • A BAM file with suffix _mergedReads.bam
  • A file with suffix _ignoredReads.txt.gz listing all reads that were filtered out during merging

Usage Example

./atlas task=mergeReads bam=example.bam

Specific Arguments

  • updateQuality: Adapt quality scores to reflect concordance or discordance of overlapping bases.
  • keepRandomBase: When two bases overlap, keep a random one instead of the one with the highest quality score.
  • keepRandomRead: When two reads overlap, keep a random read for the whole overlapping stretch.
  • keepOrphans: By default, orphans are filtered out. With this option they are kept in the BAM file, but their SAM flag is set to "improper pair". By default, all ATLAS tasks ignore improper pairs.
  • acceptedDistance: Accepted distance between two mates. If mates are farther apart, they are considered orphans and are not merged. This distance should be set higher than the maximum fragment length. Default = 2000
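The orphan rule that acceptedDistance controls can be sketched in Python as follows. The `Read` type and the `is_orphan` function are hypothetical illustrations, not part of ATLAS. How ATLAS measures the distance is not stated here, so the sketch assumes it is the absolute difference between the two start positions and that a pair exactly at the threshold is still accepted.

```python
from typing import NamedTuple, Optional


class Read(NamedTuple):
    chrom: str   # chromosome (reference sequence) name
    start: int   # leftmost aligned position


def is_orphan(read: Read, mate: Optional[Read], accepted_distance: int = 2000) -> bool:
    """Return True if the read would be treated as an orphan and not merged."""
    if mate is None:                # mate does not exist
        return True
    if read.chrom != mate.chrom:    # mate is on a different chromosome
        return True
    # Assumption: distance is the difference between start positions.
    return abs(read.start - mate.start) > accepted_distance


print(is_orphan(Read("chr1", 1000), Read("chr1", 1150)))   # False
print(is_orphan(Read("chr1", 1000), Read("chr1", 5000)))   # True
print(is_orphan(Read("chr1", 1000), Read("chr2", 1100)))   # True
print(is_orphan(Read("chr1", 1000), None))                 # True
```

Under this reading, setting acceptedDistance below the longest fragment length would misclassify genuine pairs as orphans, which is why the value should exceed the maximum fragment length.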

Engine Parameters

Engine parameters that are common to all tasks can be found here. This task does not accept the parameters specific to windows.
