ATLAS / Auxiliary: thetaQC

Overview

When sequencing depth is high, the data provide strong evidence for the true genotype. When depth is low, however, the likelihoods of the different genotypes are more similar, and the error rates of the bases have a large influence on which genotype has the highest likelihood. Correct recalibration of the base quality scores is thus more important at low depth than at high depth.

The task thetaQC uses this concept to assess how well the recalibration performs, i.e. how well the estimated error rates correspond to the true error rates. thetaQC uses the recalibration parameters estimated from the full genome to estimate heterozygosity (see task theta) on downsampled versions of the genome. If the estimates of heterozygosity stay constant across the downsampled versions, we consider the recalibration parameters to be correct. If they decrease, the error rates were overestimated during recalibration, e.g. because there were true variants at sites considered to be monomorphic. If they increase, the error rates were underestimated during recalibration, e.g. because there was too little data to observe sequencing errors.
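To make the depth argument concrete, here is a minimal Python sketch of a toy diploid genotype likelihood. It is not the model ATLAS uses: it assumes a biallelic site, a single error rate shared by all bases, and errors that simply swap the reference and alternative allele. The function names `genotype_log_likelihoods` and `best_genotype` are made up for this illustration.

```python
import math


def genotype_log_likelihoods(n_ref, n_alt, error_rate):
    """Log-likelihoods of the three diploid genotypes under a toy model.

    Assumptions (illustration only, not the ATLAS model): biallelic site,
    one error rate for all bases, an error turns ref into alt or alt into ref.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    # Probability of observing the alternative allele given each genotype.
    p_alt = {
        "hom_ref": error_rate,        # alt reads can only be errors
        "het": 0.5,                   # 0.5*(1-e) + 0.5*e = 0.5
        "hom_alt": 1.0 - error_rate,  # ref reads can only be errors
    }
    return {
        genotype: n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for genotype, p in p_alt.items()
    }


def best_genotype(n_ref, n_alt, error_rate):
    """Return the genotype with the highest log-likelihood."""
    log_liks = genotype_log_likelihoods(n_ref, n_alt, error_rate)
    return max(log_liks, key=log_liks.get)


if __name__ == "__main__":
    for n_ref, n_alt in [(3, 1), (26, 14)]:
        for error_rate in (0.01, 0.1):
            call = best_genotype(n_ref, n_alt, error_rate)
            print(f"ref={n_ref} alt={n_alt} e={error_rate}: {call}")
```

Under this toy model, a site with 3 reference and 1 alternative read is called heterozygous when the assumed error rate is 0.01 but homozygous reference when it is 0.1, whereas a site with 26 reference and 14 alternative reads is called heterozygous under both error rates.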

Input

  • A BAM file
  • Recalibration parameters produced with recal
  • A PMD input file produced with PMD (optional)

Output

  • heterozygosity estimates for different downsampling probabilities
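
The interpretation described in the overview could be applied to these estimates roughly as follows. This is only a sketch: the output file format is not described here, so the function takes a plain dictionary mapping each downsampling probability to its heterozygosity estimate. The function name `classify_theta_trend` and the 5% relative tolerance are arbitrary choices for illustration, and the sketch assumes that "decrease" and "increase" refer to the estimate at the lowest probability compared with the estimate at the highest probability.

```python
def classify_theta_trend(estimates, rel_tol=0.05):
    """Compare heterozygosity at the lowest and highest keep probability.

    estimates: dict mapping downsampling probability -> heterozygosity estimate
               (hypothetical input structure, not the ATLAS output format).
    rel_tol:   relative change treated as "constant" (arbitrary assumption).
    """
    if len(estimates) < 2:
        raise ValueError("need estimates for at least two probabilities")
    theta_full = estimates[max(estimates)]  # least downsampled data
    theta_low = estimates[min(estimates)]   # most downsampled data
    if theta_full <= 0:
        raise ValueError("estimate at the highest probability must be positive")

    rel_change = (theta_low - theta_full) / theta_full
    if abs(rel_change) <= rel_tol:
        return "constant"   # recalibration parameters look consistent
    if rel_change < 0:
        return "decrease"   # error rates possibly overestimated
    return "increase"       # error rates possibly underestimated
```

Comparing only the two extreme probabilities keeps the sketch short; in practice it is worth inspecting the estimates at all probabilities, together with their uncertainty, before drawing conclusions.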

Usage example

./atlas task=thetaQC bam=example.bam prob=1,0.5,0.25,0.1 recal=example_recalibrationEM.txt pmdFile=example_PMD_input_Empiric.txt

Specific arguments

prob: A comma-separated list of downsampling probabilities, i.e. the probabilities with which a read is kept. A new BAM file is created for every probability.
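
As an illustration of what a downsampling probability means, the following Python sketch keeps each read independently with the given probability. It is not the ATLAS implementation and does not read or write BAM files: reads are represented by arbitrary Python objects, and the names `parse_probs` and `downsample_reads` as well as the `seed` parameter are hypothetical.

```python
import random


def parse_probs(text):
    """Parse a comma-separated list such as '1,0.5,0.25,0.1' into floats."""
    probs = [float(item) for item in text.split(",")]
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return probs


def downsample_reads(reads, keep_prob, seed=None):
    """Keep each read independently with probability keep_prob."""
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must lie between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible sketches
    # rng.random() is in [0, 1), so keep_prob=1 keeps every read
    # and keep_prob=0 keeps none.
    return [read for read in reads if rng.random() < keep_prob]


if __name__ == "__main__":
    reads = [f"read_{i}" for i in range(10000)]  # stand-ins for BAM records
    for prob in parse_probs("1,0.5,0.25,0.1"):
        kept = downsample_reads(reads, prob, seed=42)
        print(f"prob={prob}: kept {len(kept)} of {len(reads)} reads")
```

Because each read is kept independently, the number of reads retained varies randomly around the expected fraction rather than matching it exactly.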

Engine Parameters

Engine parameters that are common to all tasks can be found here.
