Is jgi_summarize_bam_contig_depths using the secondary alignment information
Dear developers,
I runned into the question whether BAM files containg both primary and secondary alignments and those containing only the primary ones could produce different results. Does the "jgi_summarize_bam_contig_depths" script uses the secondary alignment information for calculating the coverage of the contigs or it just filter those hits?
Thanks for your time, best,
Aitor
Comments (4)
-
-
Hi Rob,
Thanks a lot for the answer, it is really helpful.
However, I do not understand the last question about the length of the reads that are being filtered. Do you mean the length of the filtered secondary alignments in comparison with the primary ones? Because the reads are not being filtered but their secondary alignments, so the read length should be the same.Best,
Aitor -
Hi Aitor,
So that last comment is that while I was looking through the code to explain the behavior, I think there might be a bug whereby the avg read length is being calculated including the secondary / filtered reads (which are ignored for the purpose of the depth).
…. So iff the secondary read alignments have a significantly different read length distribution than the primary alignments (possibly because they are trimmed in the SAM file) and if they compose a large fraction of the total number of reads, then the avg read length that metabat uses could be incorrect which has an tertiary effect on the coverage depth because we ignore edge bases on the contigs up to half the read length…
I don’t expect any of the above to have a measurable effect but I do plan on fixing that possible bug once I verify and track it down.
-
- changed status to invalid
- Log in to comment
Hello Aitor,
The code consults the flag field for every record and should be skipping any mapping where any of these 3 flags are set:
BAM_FSECONDARY|BAM_FQCFAIL|BAM_FDUP
additionally it requires that the record’s mapping quality >= --minMapQual (default=0)
… along with records in the bam that have the BAM_FUNMAP flag set too, obviously.
So, if the only difference between the two BAM files is the inclusion of secondary mappings in one of them, the depths should be identical.
However, I do see that the reads that would be filtered can affect the the average read length calculation and hence the length of the edges on a contig which would be discounted for the final coverage depth. Can you easily tell if the secondary reads included in one of your BAM files are significantly shorter than the original read length, and if there are a large percentage of secondary reads in the file?
Best,
Rob