A question on coverage variance in the coverage.tsv file from jgi_summarize_bam_contig_depths

Issue #78 resolved
Zhiping Zhong created an issue

Hi, I got a coverage.tsv file after running jgi_summarize_bam_contig_depths. It has two columns named .bam and .bam-var. The values in the .bam-var column are much higher than those in the .bam column (see attached file). My question is: what do the values in the second column (i.e., .bam-var) mean?

My understanding: the values in the .bam column are the average coverage of each contig, while the values in the .bam-var column are the variance of the per-base read coverage around that average. Coverage is very high in some places and very low in others, so the coverage variance is very high. Is this right? Thanks for your patience with my naive question.

Best,

Ping
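A minimal Python sketch of the two quantities asked about above: it simulates reads on a hypothetical contig, builds per-base depth, and reports the mean (analogous to the .bam column) and the variance (analogous to .bam-var). All function names and parameters here are illustrative assumptions. The real jgi_summarize_bam_contig_depths applies its own read filtering and edge handling, so its numbers will not match this exactly. Trimming one read length from each contig end is also an assumption, made to avoid the artificial coverage drop at the ends.

```python
import random
from statistics import fmean, pvariance


def per_base_depth(contig_len, read_starts, read_len):
    """Per-base depth from read start positions (difference-array trick)."""
    diff = [0] * (contig_len + 1)
    for start in read_starts:
        diff[start] += 1             # read begins covering here
        diff[start + read_len] -= 1  # ...and stops covering here
    depth, running = [], 0
    for i in range(contig_len):
        running += diff[i]
        depth.append(running)
    return depth


def mean_and_variance(depth, trim):
    """Mean and population variance of depth, ignoring `trim` bases at each end.

    The edge trimming is an assumption of this sketch, not the tool's exact rule.
    """
    core = depth[trim:len(depth) - trim]
    return fmean(core), pvariance(core)


def simulate(contig_len=50_000, read_len=100, n_reads=10_000,
             extra_reads_first_half=0, seed=1):
    """Hypothetical simulation: uniform reads, plus optional extra reads on half the contig."""
    rng = random.Random(seed)
    # Uniformly placed reads give roughly Poisson per-base coverage.
    starts = [rng.randrange(contig_len - read_len + 1) for _ in range(n_reads)]
    # Optional systematic bias: pile extra reads onto the first half only.
    half = contig_len // 2
    starts += [rng.randrange(half - read_len + 1) for _ in range(extra_reads_first_half)]
    depth = per_base_depth(contig_len, starts, read_len)
    return mean_and_variance(depth, trim=read_len)


for label, extra in [("even", 0), ("biased", 5_000)]:
    m, v = simulate(extra_reads_first_half=extra)
    print(f"{label}: mean={m:.1f} var={v:.1f} var/mean={v / m:.2f}")
```

For evenly placed reads, var/mean should come out close to 1. Piling extra reads onto half of the contig pushes the variance well above the mean, even though the mean itself only rises moderately.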

Comments

  1. Rob Egan

    Hi Ping, yes. Variance is a statistical measure of how narrow or wide the distribution of coverage is around the mean for a given contig. A high variance indicates uneven coverage, whereas a variance close to the mean indicates the even coverage expected from a Poisson distribution (which coverage should follow, at least in theory, if there are no systematic biases). MetaBAT uses this variance along with the mean to help calculate how similar the coverage distributions of two contigs are.
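To illustrate how a mean and a variance together can say how alike two contigs' coverage is, here is a minimal sketch. It is not MetaBAT's own formula: it simply models each contig's depth as a normal distribution, using the .bam value as the mean and the .bam-var value as the variance, and scores dissimilarity as one minus the overlap of the two distributions via Python's statistics.NormalDist.overlap. The function name coverage_distance and the min_var floor are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def coverage_distance(mean_a, var_a, mean_b, var_b, min_var=1.0):
    """Hypothetical dissimilarity between two contigs' coverage profiles.

    Models each contig's depth as Normal(mean, variance), taking the values
    from the .bam and .bam-var columns, and returns 1 - overlap: 0.0 for
    identical distributions, approaching 1.0 for clearly separated ones.
    This is an illustrative stand-in, not MetaBAT's actual distance.
    """
    # NormalDist.overlap() is undefined for zero spread, so floor the variance
    # (the floor value is an arbitrary assumption of this sketch).
    dist_a = NormalDist(mean_a, sqrt(max(var_a, min_var)))
    dist_b = NormalDist(mean_b, sqrt(max(var_b, min_var)))
    return 1.0 - dist_a.overlap(dist_b)


# Same mean but very different variance: only partially similar.
print(coverage_distance(20.0, 20.0, 20.0, 200.0))
# Very different means: almost no overlap.
print(coverage_distance(20.0, 20.0, 80.0, 90.0))
```

The first call shows why the variance matters: two contigs with the same average coverage can still look dissimilar if one is evenly covered and the other is not.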
