Filter SNPs in consensus sequence

I’ve mapped some reads from an influenza-infected cell against an influenza reference gene which I know is closely related to the one present in my sample.

Part of the .aln output looks like this:

template:       ATTGTACATTTGGGGGGTTCACCACCCGGGTACGGACAAAGACCAAATCTTCCTGTATGC
                |||||||||||||||_|||||||||||||||_|||||||_|||||||||_||||||||||
query:          ATTGTACATTTGGGGTGTTCACCACCCGGGTGCGGACAAGGACCAAATc-tccTGTATGC

The deletion in the query does not make biological sense, since it would cause a frameshift mutation and ruin the gene. Furthermore, the deletion is reported in a region where the query sequence is written in lowercase. Doesn't lowercase mean that there is no coverage in that region? If so, it seems unreasonable for KMA to report a deletion there.
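To make the differences explicit, here is a minimal Python sketch that compares the two strings pasted above and lists every mismatch and gap, flagging those that sit in a lowercase stretch of the query. It does not use KMA itself; the helper names (`in_lowercase_region`, `list_differences`) are made up for illustration, and treating lowercase as "poorly supported" is only my assumption, not confirmed KMA behavior.

```python
# Template and query lines copied from the .aln excerpt above.
template = "ATTGTACATTTGGGGGGTTCACCACCCGGGTACGGACAAAGACCAAATCTTCCTGTATGC"
query = "ATTGTACATTTGGGGTGTTCACCACCCGGGTGCGGACAAGGACCAAATc-tccTGTATGC"


def in_lowercase_region(query, i):
    """True if the query base at index i is lowercase, or, for a gap,
    if a directly neighbouring query base is lowercase (hypothetical rule)."""
    if query[i] != "-":
        return query[i].islower()
    neighbours = query[max(i - 1, 0):i] + query[i + 1:i + 2]
    return any(c.islower() for c in neighbours)


def list_differences(template, query):
    """Return (position, template_base, query_base, kind, lowercase_flag)
    for every column where the aligned strings differ. Positions are
    1-based columns of the alignment excerpt, not gene coordinates."""
    if len(template) != len(query):
        raise ValueError("aligned strings must have equal length")
    diffs = []
    for i, (t, q) in enumerate(zip(template, query)):
        if q == "-":
            kind = "deletion"
        elif t == "-":
            kind = "insertion"
        elif t.upper() != q.upper():
            kind = "substitution"
        else:
            continue
        diffs.append((i + 1, t, q, kind, in_lowercase_region(query, i)))
    return diffs


diffs = list_differences(template, query)
for d in diffs:
    print(d)

# Keep only the differences outside lowercase stretches
# (assumption: lowercase marks bases that should not be trusted).
confident = [d for d in diffs if not d[4]]
```

On this excerpt the sketch reports three substitutions in uppercase regions and the single deletion in the lowercase stretch, which is the one I would like to filter out.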

Comments (4)