
ATLAS / splitMerge

Overview

This task should be run before any subsequent variant discovery or population genetics tool. It splits single-end read groups by read length and merges paired-end reads, combining the tasks splitRGByLength and mergeReads in a single step. You need to specify which read groups should be considered for splitting or merging; all other read groups are written to the output BAM unchanged.
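
The per-read decision described above can be sketched as follows. This is a hypothetical illustration, not ATLAS code: the function name `route_read` and its arguments are made up, and the handling of mixed read groups (using SAM flag bit 0x1 to tell paired reads from single-end reads) is an assumption. The optional `blacklist` stands for the blacklist file described under Input.

```python
# Hypothetical sketch of how splitMerge treats each read, based only on the
# overview on this page. Function and argument names are illustrative.

PAIRED_FLAG = 0x1  # SAM flag bit 0x1: read comes from a paired (multi-segment) template


def route_read(read_name, read_group, flag, settings, blacklist=frozenset()):
    """Return 'split', 'merge' or 'pass' (written unchanged) for one read.

    settings maps read group name -> 'single', 'paired' or 'mixed'.
    blacklist holds read names that must be written unchanged.
    """
    if read_name in blacklist:
        return "pass"  # blacklisted reads are left untouched
    rg_type = settings.get(read_group)
    if rg_type is None:
        return "pass"  # read groups missing from the settings file are not processed
    if rg_type == "single":
        return "split"  # single-end reads are split by length
    if rg_type == "paired":
        return "merge"  # paired-end mates are merged
    # 'mixed' (assumption): the SAM paired bit decides per read.
    return "merge" if flag & PAIRED_FLAG else "split"
```

Read groups that are not listed fall through to `"pass"`, mirroring the rule that unlisted read groups are written to the BAM as they are.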

Input

  • A BAM file

  • blacklist.txt (optional): A text file listing the names of reads that should be ignored (neither split nor merged) and written to file as they are, one read name per line.

  • A read group settings file (.txt):

    Create a .txt file with one line per read group to be considered. Each line lists, separated by any whitespace: the read group name; whether the read group is single-end (single), paired-end (paired) or mixed (mixed); and, for single-end and mixed read groups, the maximum cycle number. The read group names can be found in the header of your BAM file; note that 'ID:' is not part of the read group name. If you do not know the maximum cycle number at which the genome was sequenced, you can find the maximum read length in the BAM file with our task BAMDiagnostics. If you do not know whether a read group was sequenced with a paired-end or single-end protocol, or whether it is mixed (i.e. contains reads from both paired-end and single-end sequencing), you can check the SAM flags of its reads (flag bit 0x1 is set for reads sequenced as part of a pair). If you specify the wrong type, ATLAS will throw an error reporting nonsensical settings for the offending read.

    Example:

    readgroup1 single 100
    readgroup2 single 150
    readgroup5 paired
    readgroup6 mixed 150
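
Before running the task, it can help to check the settings file and the read group types yourself. The sketch below is a hypothetical helper, not part of ATLAS: `parse_settings` encodes the format shown in the example above (name, type, and a maximum cycle number for single and mixed read groups, which is inferred from the example), and `infer_type` guesses the type from a list of SAM flag values, for instance taken from the second column of `samtools view` output for one read group.

```python
# Sketch only: the validation rules below are inferred from this page
# (one read group per line; single and mixed need a maximum cycle number,
# paired does not), not taken from the ATLAS source code.

PAIRED_FLAG = 0x1  # SAM flag bit 0x1: read comes from a paired (multi-segment) template
VALID_TYPES = {"single", "paired", "mixed"}


def parse_settings(text):
    """Parse 'name type [maxCycle]' lines into {name: (type, maxCycle or None)}."""
    settings = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # columns may be separated by any whitespace
        if not fields:
            continue  # this sketch tolerates blank lines
        if len(fields) < 2 or len(fields) > 3:
            raise ValueError(f"line {line_no}: expected 'name type [maxCycle]'")
        name, rg_type = fields[0], fields[1]
        if rg_type not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown type {rg_type!r}")
        max_cycle = int(fields[2]) if len(fields) == 3 else None
        if rg_type in ("single", "mixed") and max_cycle is None:
            raise ValueError(f"line {line_no}: {rg_type} read group needs a maximum cycle number")
        settings[name] = (rg_type, max_cycle)
    return settings


def infer_type(flags):
    """Guess 'single', 'paired' or 'mixed' from the SAM flags of one read group's reads."""
    if not flags:
        raise ValueError("no reads given")
    paired = [bool(flag & PAIRED_FLAG) for flag in flags]
    if all(paired):
        return "paired"
    if not any(paired):
        return "single"
    return "mixed"
```

For example, `infer_type([99, 147, 83, 163])` returns `"paired"`, while a read group containing the flags `0` and `99` would be reported as `"mixed"`.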

Output

  • A BAM file with suffix _mergedReads.bam
  • A file with suffix _ignoredReads.txt.gz listing all reads that were filtered out during merging

Usage Example

./atlas task=splitMerge bam=example.bam readGroupSettings=readGroupSettings.txt

Specific Arguments

  • readGroupSettings: Path to the file containing the read group settings (see Input).
  • updateQuality: When the two mates overlap during merging, adjust the quality scores of the overlapping bases to reflect whether they agree or disagree.
  • keepRandomBase: When bases of the two mates overlap during merging, keep a randomly chosen base instead of the one with the higher quality score.
  • keepRandomRead: When two mates overlap during merging, keep the bases of one randomly chosen mate for the whole overlapping stretch.
  • keepOrphans: Keep orphan reads in the BAM file, but set their SAM flag to "improper pair" (by default, orphans are filtered out). Note that all ATLAS tasks ignore improper pairs by default.
  • acceptedDistance: Maximum accepted distance between two mates. Mates that are further apart are considered orphans and are not merged. Set this value higher than the maximum fragment length. Default = 2000
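
The overlap options and the orphan rule above can be illustrated with a small sketch. This is not the ATLAS implementation: the function names are hypothetical, the tie-breaking rule, the precedence of keepRandomRead over keepRandomBase, and measuring mate distance between leftmost mapping positions are all assumptions. updateQuality is not modelled because its formula is not documented on this page.

```python
import random

# Sketch of two per-read decisions described above. ATLAS's real rules
# (tie-breaking, option precedence, distance definition) may differ;
# updateQuality is not modelled because its formula is not given here.


def merge_overlap(seq1, quals1, seq2, quals2,
                  keep_random_base=False, keep_random_read=False, rng=None):
    """Combine the overlapping stretch of two mates, position by position.

    Inputs cover only the overlapping stretch, already aligned base to base.
    Returns (merged_sequence, merged_qualities).
    """
    if not (len(seq1) == len(seq2) == len(quals1) == len(quals2)):
        raise ValueError("overlapping stretches must have equal length")
    rng = rng or random.Random()
    if keep_random_read:
        # keepRandomRead: one randomly chosen mate supplies the whole stretch.
        # (Precedence over keepRandomBase is an assumption.)
        if rng.random() < 0.5:
            return seq1, list(quals1)
        return seq2, list(quals2)
    bases, quals = [], []
    for b1, q1, b2, q2 in zip(seq1, quals1, seq2, quals2):
        if keep_random_base:
            # keepRandomBase: pick either mate's base at random.
            b, q = (b1, q1) if rng.random() < 0.5 else (b2, q2)
        else:
            # Default: keep the higher-quality base (ties go to mate 1, an assumption).
            b, q = (b1, q1) if q1 >= q2 else (b2, q2)
        bases.append(b)
        quals.append(q)
    return "".join(bases), quals


def mates_are_mergeable(ref1, pos1, ref2, pos2, accepted_distance=2000):
    """Mates on different references or further apart than accepted_distance
    are treated as orphans. Measuring between leftmost mapping positions is
    an assumption."""
    return ref1 == ref2 and abs(pos2 - pos1) <= accepted_distance
```

With the default rule, `merge_overlap("ACGT", [30, 10, 40, 20], "TGCA", [20, 35, 15, 25])` returns `("AGGA", [30, 35, 40, 25])`: at every position the base with the higher quality score wins.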

Engine Parameters

Engine parameters that are common to all tasks can be found here. This task does not accept the window-specific parameters.
