The probability that a nucleotide in a DNA fragment is affected by post-mortem damage (PMD) depends on its distance from each end of the fragment. In single-end sequencing, if the fragment is shorter than the maximum read length, the read covers the entire fragment and both distances are known. If the fragment is longer than the maximum read length, the read stops before reaching the end of the fragment, so the fragment length is unknown, and with it the nucleotide's distance from the 3'-end. For these reads the PMD pattern cannot be estimated as an exact function of the distance from the fragment end. It is therefore necessary to split each read group that was sequenced with a single-end protocol into two read groups: one containing the reads that are shorter than the maximum read length given by the sequencing machine, and one containing the reads that have the maximum read length. This way, the PMD patterns for the read group with the maximum-length reads will be slightly inaccurate, but they can still be estimated accurately for the read group containing the shorter reads. If no trimming was performed, the maximum read length should correspond to the maximum number of sequencing cycles used in the run.
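The reasoning above can be sketched in a few lines of Python. This is only an illustration and not ATLAS code: the function name `end_distances` is hypothetical, and it assumes that no trimming was performed, that adapters were already removed, and that a read shorter than the cycle limit covers its whole fragment.

```python
from typing import Optional, Tuple


def end_distances(pos_in_read: int, read_length: int,
                  max_cycles: int) -> Tuple[int, Optional[int]]:
    """Return (distance from 5'-end, distance from 3'-end) for a 0-based
    position in a single-end read.

    Hypothetical helper for illustration only. The 3' distance is None
    when the read reached the cycle limit, because the fragment may
    extend beyond the last sequenced base.
    """
    if not 0 <= pos_in_read < read_length:
        raise ValueError("position lies outside the read")

    dist_5prime = pos_in_read
    if read_length < max_cycles:
        # Assumption: the read covers the whole fragment,
        # so read length equals fragment length.
        dist_3prime: Optional[int] = read_length - 1 - pos_in_read
    else:
        # Fragment length unknown: the 3' distance cannot be computed.
        dist_3prime = None
    return dist_5prime, dist_3prime


short_read = end_distances(0, 50, 100)   # read shorter than the cycle limit
full_read = end_distances(10, 100, 100)  # read at the cycle limit
```

For the short read both distances are available, whereas for the read at the cycle limit only the 5' distance is, which is why the two kinds of reads are placed in separate read groups.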
A BAM file
A .txt file:
Create a .txt file containing the names of the single-end read groups and their maximum cycle numbers (separated by any whitespace). The read group names can be found in the header of your BAM file, but note that 'ID:' is not part of the read group name. If you do not know the maximum cycle number at which the genome was sequenced, you can find the maximum read length in the BAM file with our task BAMDiagnostics.
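As a rough sketch of how such a file could be prepared, the Python snippet below extracts read group names from BAM header text (for example the output of `samtools view -H example.bam`) and formats the file content. The helper names, the example header, the read group names and the cycle numbers are all made up for illustration, and writing one read group per line is an assumption rather than something stated above.

```python
from typing import Dict, List


def read_group_ids(header_text: str) -> List[str]:
    """Collect read group names from the @RG lines of a SAM/BAM header.

    The 'ID:' prefix is stripped, since it is not part of the name.
    """
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("ID:"):
                ids.append(field[len("ID:"):])
    return ids


def format_read_group_file(max_cycles: Dict[str, int]) -> str:
    """Format 'name<whitespace>maxCycle' entries, one per line (assumed layout)."""
    return "".join(f"{name}\t{cycles}\n" for name, cycles in max_cycles.items())


# Made-up header text and cycle numbers, for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:ERR0001\tSM:sampleA\tPL:ILLUMINA\n"
    "@RG\tID:ERR0002\tSM:sampleA\n"
)
names = read_group_ids(header)
content = format_read_group_file({"ERR0001": 100, "ERR0002": 75})
# The content could then be written to singleEndReadgroups.txt.
```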
A BAM file with suffix "_splitRG.bam":
It contains the original read group and a new read group. The original read group contains the reads that have the specified maximum read length, while the new read group, named with the suffix _truncated, contains the reads that are shorter than the specified maximum.
./atlas task=splitRGbyLength bam=example.bam readGroups=singleEndReadgroups.txt verbose
- readGroups : Specify the file listing the single-end read groups that should be split according to their maximum read length.
- allowForLarger : Allow reads that are longer than the maximum length to be added to the "_truncated" read group. This is useful if you suspect adapter contamination.
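The splitting rule and the effect of allowForLarger can be summarised in a small Python sketch. This is not the ATLAS implementation: the function `target_read_group` is hypothetical, appending "_truncated" directly to the original name is an assumption about the naming, and raising an error for overlong reads when allowForLarger is not set is likewise an assumption about behaviour that is not described here.

```python
from typing import Dict


def target_read_group(read_group: str, read_length: int,
                      max_lengths: Dict[str, int],
                      allow_for_larger: bool = False) -> str:
    """Return the read group a read would be assigned to after splitting.

    Hypothetical illustration of the rule described above.
    """
    if read_group not in max_lengths:
        # Assumption: read groups not listed in the file are left untouched.
        return read_group

    max_len = max_lengths[read_group]
    if read_length < max_len:
        return read_group + "_truncated"
    if read_length == max_len:
        return read_group
    if allow_for_larger:
        # Longer than the maximum: added to the "_truncated" read group.
        return read_group + "_truncated"
    # Assumption: without allowForLarger an overlong read is treated as an error.
    raise ValueError(
        f"read of length {read_length} exceeds maximum {max_len} for {read_group}"
    )


max_lengths = {"ERR0001": 100}  # made-up read group name and cycle number
shorter = target_read_group("ERR0001", 60, max_lengths)
full = target_read_group("ERR0001", 100, max_lengths)
longer = target_read_group("ERR0001", 120, max_lengths, allow_for_larger=True)
```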
Engine parameters that are common to all tasks can be found here.