Is jgi_summarize_bam_contig_depths using the secondary alignment information

Issue #130 invalid
Former user created an issue

Dear developers,

I runned into the question whether BAM files containg both primary and secondary alignments and those containing only the primary ones could produce different results. Does the "jgi_summarize_bam_contig_depths" script uses the secondary alignment information for calculating the coverage of the contigs or it just filter those hits?

Thanks for your time, best,

Aitor

Comments (4)

  1. Rob Egan

    Hello Aitor,

    The code consults the flag field for every record and should be skipping any mapping where any of these 3 flags are set:

    BAM_FSECONDARY|BAM_FQCFAIL|BAM_FDUP

    additionally it requires that the record’s mapping quality >= --minMapQual (default=0)

    … along with records in the bam that have the BAM_FUNMAP flag set too, obviously.

    So, if the only difference between the two BAM files is the inclusion of secondary mappings in one of them, the depths should be identical.

    However, I do see that the reads that would be filtered can affect the the average read length calculation and hence the length of the edges on a contig which would be discounted for the final coverage depth. Can you easily tell if the secondary reads included in one of your BAM files are significantly shorter than the original read length, and if there are a large percentage of secondary reads in the file?

    Best,

    Rob

  2. Aitor Blanco Miguez

    Hi Rob,
    Thanks a lot for the answer, it is really helpful.
    However, I do not understand the last question about the length of the reads that are being filtered. Do you mean the length of the filtered secondary alignments in comparison with the primary ones? Because the reads are not being filtered but their secondary alignments, so the read length should be the same.

    Best,
    Aitor

  3. Rob Egan

    Hi Aitor,

    So that last comment is that while I was looking through the code to explain the behavior, I think there might be a bug whereby the avg read length is being calculated including the secondary / filtered reads (which are ignored for the purpose of the depth).

    …. So iff the secondary read alignments have a significantly different read length distribution than the primary alignments (possibly because they are trimmed in the SAM file) and if they compose a large fraction of the total number of reads, then the avg read length that metabat uses could be incorrect which has an tertiary effect on the coverage depth because we ignore edge bases on the contigs up to half the read length…

    I don’t expect any of the above to have a measurable effect but I do plan on fixing that possible bug once I verify and track it down.

  4. Log in to comment