non-NA sigma mean returned by BASELINe local test for codons with no sequence data during codon-by-codon analysis

Issue #80 resolved
Julian Zhou created an issue

Codons for which there is no input sequence data should have a sigma mean of NA. With no data, the observed number of replacement mutations and the observed total number of mutations are both 0, so the observed frequency of replacement mutations is 0/0, which yields NA.
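
A minimal sketch of the expected arithmetic, assuming two hypothetical count variables (obs_R and obs_total are illustrative names, not taken from the package). In R, 0/0 evaluates to NaN, which is.na() also reports as missing:

```r
# Hypothetical counts for a codon with no input sequence data.
obs_R <- 0        # observed replacement mutations
obs_total <- 0    # observed total mutations

# 0/0 is NaN in R; is.na() returns TRUE for NaN as well as NA.
obs_freq <- obs_R / obs_total
is.na(obs_freq)
```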

Comments (1)

  1. Julian Zhou reporter

    Bug in calcBaselineHelper():

    obsN_Index <- grep( paste0("OBSERVED_", region),  names(observed) )
    

    The following example illustrates what happens.

    Say region=codon_1. grep is expected to find only codon_1_S and codon_1_R. Instead, because the pattern matches anywhere in a name and nothing follows the codon number, codon_10_S, codon_10_R, codon_101_S, codon_101_R, etc. also match and are found by grep.

    As a result, the total number of observed mutations at codon 1, for which there is no input sequence data and which should therefore be 0, also includes mutations at codons 10 and 101, for which there is input sequence data, making it non-0. A non-0 total observed mutation count leads to a non-NA value being returned by calcBaselineBinomialPdf().
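
    The over-matching can be reproduced with a small sketch. The vector below and its names are hypothetical, modeled on the example above, and the anchored pattern is only one possible way to tighten the match; it is not necessarily the change made in the fix commit.

    ```r
    # Hypothetical observed counts; names are assumed to follow
    # the OBSERVED_<region>_<S|R> pattern implied by the grep call.
    observed <- c(OBSERVED_codon_1_S = 0, OBSERVED_codon_1_R = 0,
                  OBSERVED_codon_10_S = 2, OBSERVED_codon_10_R = 3,
                  OBSERVED_codon_101_S = 1, OBSERVED_codon_101_R = 4)
    region <- "codon_1"

    # Original pattern: matches anywhere in the name, so codon_10 and
    # codon_101 are picked up along with codon_1.
    buggy_idx <- grep(paste0("OBSERVED_", region), names(observed))

    # One possible tightening (an assumption, not the confirmed fix):
    # anchor both ends and require the _S or _R suffix right after the region.
    strict_idx <- grep(paste0("^OBSERVED_", region, "_[SR]$"), names(observed))

    buggy_total <- sum(observed[buggy_idx])    # includes codons 1, 10 and 101
    strict_total <- sum(observed[strict_idx])  # codon 1 only
    ```

    With these made-up counts, the unanchored pattern selects all six entries and gives a non-0 total, while the anchored pattern selects only the two codon_1 entries and gives 0.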

    Fixed in commit 6755167
