calcObservedMutations isn't handling germline gaps correctly
Issue #138
resolved
These should all work, but calcObservedMutations fails when there are gap characters in the germline.
> calcObservedMutations("ATGCTT", "ATGCCC")
seq_r seq_s
1 1
> calcObservedMutations("ATG-TT", "ATGCCC")
seq_r seq_s
1 1
> calcObservedMutations("ATGGTT", "ATG-CC")
Error in calcObservedMutations("ATGGTT", "ATG-CC") :
ncol(mutations_array_raw) == length(mutations_pos) is not TRUE
> calcObservedMutations("ATG-TT", "ATG-CC")
Error in calcObservedMutations("ATG-TT", "ATG-CC") :
ncol(mutations_array_raw) == length(mutations_pos) is not TRUE
Comments (4)
-
reporter -
reporter -
assigned issue to
-
assigned issue to
-
Around Line 1840 in
mutationProfiling.R
, a character vector in which each entry is a single string, such as:c_germlineSeq_codons [1] "-" "C" "C" "-" "C" "C"
needs to be converted into a character vector in which each entry is a triplet string, such as:
> c_germlineSeq_codons [1] "-CC" "-CC"
Previously, the code was
c_germlineSeq_codons <- strsplit(gsub("([[:alnum:]]{3})", "\\1 ", c2s(c_germlineSeq_codons)), " ")[[1]]
However,
[[:alnum:]]{3}
failed to capture non-ATGC characters such as.
and-
, such as the gap in-CC
.New regex in [commit 2c7e782](https://bitbucket.org/kleinstein/shazam/commits/2c7e782ddf32d5e9bf5f49120740aae1d20cbedd) fixed this bug.
Good catch!
-
- changed status to resolved
Resolved in commit 2c7e782
- Log in to comment
I suspect this is the same bug, so I’m going to put this into this ticket:
These clonal consensus sequences, output by
collapseClones
withbreakTiesStochastic=TRUE
, produce the same error:This weird consensus bit starting at position 90 looks awfully strange, but it alone doesn’t produce the error:
Test code: