ATLAS / Methods: Genotyping Model

Genotyping model accounting for post-mortem damage

Following commonly used approaches (e.g. Li, 2011), we assume that a sequencing read is equally likely to cover either of the two alleles of an individual and that a sequencing error results in any of the three alternative bases with equal probability \(\frac{\epsilon_{ij}}{3}\), where \(\epsilon_{ij}\) is the sequencing error rate at site \(i\) in read \(j\). In the absence of post-mortem damage (PMD), the probability of observing a base \(d_{ij}\) given the underlying genotype \(g_i=kl\) is then given by

\begin{equation*}
\mathbb{P}(d_{ij}|g_i=kl, \epsilon_{ij}) = \begin{cases}
1-\epsilon_{ij} &\mbox{if } k=l=d_{ij}\\
\frac{\epsilon_{ij}}{3} &\mbox{if } k \neq d_{ij}, l \neq d_{ij}\\
\frac{1}{2}-\frac{\epsilon_{ij}}{3} &\mbox{if } k \neq l \mbox{ and either } k=d_{ij} \mbox{ or } l=d_{ij}
\end{cases}.
\end{equation*}
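
The three cases follow from averaging over which allele the read covers. As a minimal sketch of that reasoning (the function name `emission_no_pmd` and its argument names are chosen here for illustration and are not taken from ATLAS), the probability can be computed as:

```python
def emission_no_pmd(d, genotype, eps):
    """P(d | g = kl, eps) for one observed base, without post-mortem damage.

    d        -- observed base, one of "A", "C", "G", "T"
    genotype -- two-letter string such as "AG" (alleles k and l)
    eps      -- sequencing error probability for this base
    """
    k, l = genotype

    def read(allele):
        # A covered allele is read correctly with probability 1 - eps and
        # as one specific other base with probability eps / 3.
        return 1.0 - eps if allele == d else eps / 3.0

    # The read covers either allele with probability 1/2.
    return 0.5 * read(k) + 0.5 * read(l)
```

For a heterozygous genotype with one allele matching the observed base, the two terms are \(\frac{1-\epsilon_{ij}}{2}\) and \(\frac{\epsilon_{ij}}{6}\), which sum to \(\frac{1}{2}-\frac{\epsilon_{ij}}{3}\).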

In ancient DNA, differences between the base observed within a read and the underlying alleles may also be the result of PMD. Following Hofmanová et al. (2016), let us denote by \(D_{C\rightarrow T}(q_{ij})\) and \(D_{G\rightarrow A}(q_{ij})\) the known probabilities that a \(C\rightarrow T\) or a \(G\rightarrow A\) PMD, respectively, occurred at the read position \(q_{ij}\) covering site \(i\) in read \(j\). In the presence of PMD, the probability of observing a base \(d_{ij}\) given the underlying genotype \(g_i=kl\) is given by

\begin{equation*}
\mathbb{P}(d_{ij}|g_i=kl, \epsilon_{ij}, q_{ij}) = \begin{cases}
(1-D_{G\rightarrow A}(q_{ij}))\dfrac{\epsilon_{ij}}{3} + D_{G\rightarrow A}(q_{ij})(1-\epsilon_{ij}) &\mbox{if } d_{ij}=A, g_i=GG \\
\dfrac{(1 + D_{G\rightarrow A}(q_{ij}))(1-\epsilon_{ij})}{2} + \dfrac{(1-D_{G\rightarrow A}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=A, g_i=AG \\
\dfrac{D_{G\rightarrow A}(q_{ij})(1-\epsilon_{ij})}{2} + \dfrac{(2-D_{G\rightarrow A}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=A, g_i=CG, GT \\
(1-D_{C\rightarrow T}(q_{ij}))(1-\epsilon_{ij}) + D_{C\rightarrow T}(q_{ij})\dfrac{\epsilon_{ij}}{3} &\mbox{if } d_{ij}=C, g_i=CC \\
\dfrac{(1-D_{C\rightarrow T}(q_{ij}))(1-\epsilon_{ij})}{2} + \dfrac{(1+D_{C\rightarrow T}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=C, g_i=AC, CG, CT \\
(1-D_{G\rightarrow A}(q_{ij}))(1-\epsilon_{ij}) + D_{G\rightarrow A}(q_{ij})\dfrac{\epsilon_{ij}}{3} &\mbox{if } d_{ij}=G, g_i=GG \\
\dfrac{(1-D_{G\rightarrow A}(q_{ij}))(1-\epsilon_{ij})}{2} + \dfrac{(1+D_{G\rightarrow A}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=G, g_i=AG, CG, GT \\
(1-D_{C\rightarrow T}(q_{ij}))\dfrac{\epsilon_{ij}}{3} + D_{C\rightarrow T}(q_{ij})(1-\epsilon_{ij}) &\mbox{if } d_{ij}=T, g_i=CC \\
\dfrac{D_{C\rightarrow T}(q_{ij})(1-\epsilon_{ij})}{2} + \dfrac{(2-D_{C\rightarrow T}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=T, g_i=AC, CG \\
\dfrac{(1 + D_{C\rightarrow T}(q_{ij}))(1-\epsilon_{ij})}{2} + \dfrac{(1-D_{C\rightarrow T}(q_{ij}))\epsilon_{ij}}{6} &\mbox{if } d_{ij}=T, g_i=CT \\
1-\epsilon_{ij} &\mbox{if } d_{ij}=A, g_i=AA \mbox{ or } d_{ij}=T, g_i=TT \\
\dfrac{1}{2}-\dfrac{\epsilon_{ij}}{3} &\mbox{if } d_{ij}=A, g_i=AC, AT \mbox{ or } d_{ij}=T, g_i=AT, GT \\
\dfrac{\epsilon_{ij}}{3} &\mbox{otherwise}
\end{cases}.
\end{equation*}
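
Each case can be obtained by marginalizing over three events: which allele the read covers, whether that allele was damaged (a C turning into T, or a G turning into A), and whether the resulting base was read with a sequencing error. The sketch below computes the probability this way rather than by enumerating the cases. It is an illustration under the assumptions stated in the text, not the ATLAS implementation; the names `emission_pmd`, `d_ct` and `d_ga` are hypothetical, with `d_ct` and `d_ga` standing for \(D_{C\rightarrow T}(q_{ij})\) and \(D_{G\rightarrow A}(q_{ij})\).

```python
def emission_pmd(d, genotype, eps, d_ct, d_ga):
    """P(d | g = kl, eps, q) for one observed base, with post-mortem damage.

    d        -- observed base, one of "A", "C", "G", "T"
    genotype -- two-letter string such as "CG" (alleles k and l)
    eps      -- sequencing error probability for this base
    d_ct     -- probability of C->T damage at this read position
    d_ga     -- probability of G->A damage at this read position
    """

    def after_damage(allele):
        # Distribution of the base present in the molecule after PMD.
        if allele == "C":
            return {"C": 1.0 - d_ct, "T": d_ct}
        if allele == "G":
            return {"G": 1.0 - d_ga, "A": d_ga}
        return {allele: 1.0}  # A and T are not affected by these damage types

    def read(base):
        # Sequencing step: correct with 1 - eps, a specific wrong base with eps / 3.
        return 1.0 - eps if base == d else eps / 3.0

    def allele_prob(allele):
        # Sum over the possible post-damage bases of the covered allele.
        return sum(p * read(base) for base, p in after_damage(allele).items())

    k, l = genotype
    # The read covers either allele with probability 1/2.
    return 0.5 * allele_prob(k) + 0.5 * allele_prob(l)
```

Setting both damage probabilities to zero recovers the model without PMD, and for any genotype the probabilities of the four possible observed bases sum to one.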

References

  • Hofmanová et al. (2016). Early farmers from across Europe directly descended from Neolithic Aegeans. Proceedings of the National Academy of Sciences of the United States of America, 113(25), 6886–91.
  • Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987–2993.
