How the depth of contig was calculated?

Rob Egan

Hi,

There are a couple of factors that are used (by default) to improve the depth coverage estimate. The depth per contig calculations are calculated by the jgi_summarize_bam_contigs_depth program in the MetaBAT distribution that can be invoked independently.

Here are the options to that:

Usage: jgi_summarize_bam_contig_depths <options> sortedBam1 [ sortedBam2 ...]
where options include:
    --outputDepth       arg  The file to put the contig by bam depth matrix (default: STDOUT)
    --percentIdentity   arg  The minimum end-to-end % identity of qualifying reads (default: 97)
    --pairedContigs     arg  The file to output the sparse matrix of contigs which paired reads span (default: none)
    --unmappedFastq     arg  The prefix to output unmapped reads from each bam file suffixed by 'bamfile.bam.fastq.gz'
    --noIntraDepthVariance   Do not include variance from mean depth along the contig
    --showDepth              Output a .depth file per bam for each contig base
    --minMapQual        arg  The minimum mapping quality necessary to count the read as mapped (default: 0)
    --weightMapQual     arg  Weight per-base depth based on the MQ of the read (i.e uniqueness) (default: 0.0 (disabled))
    --includeEdgeBases       When calculating depth & variance, include the 1-readlength edges (off by default)
    --maxEdgeBases           When calculating depth & variance, and not --includeEdgeBases, the maximum length (default:75)
    --referenceFasta    arg  The reference file.  (It must be the same fasta that bams used)

Options that require a --referenceFasta
    --outputGC          arg  The file to print the gc coverage histogram
    --gcWindow          arg  The sliding window size for GC calculations
    --outputReadStats   arg  The file to print the per read statistics
    --outputKmers       arg  The file to print the perfect kmer counts

Options to control shredding contigs that are under represented by the reads
    --shredLength       arg  The maximum length of the shreds
    --shredDepth        arg  The depth to generate overlapping shreds
    --minContigLength   arg  The mimimum length of contig to include for mapping and shredding
    --minContigDepth    arg  The minimum depth along contig at which to break the contig

But basically the depth is calculated for every read that maps with >=97% identity to the assembly scaffold : (i.e. matches / (matches + mismatches + insertions + deletions) >= 0.97)

And the edges of a contig are excluded (1 average-read-length from either end)

-Rob

2017-09-20T22:01:53+00:00

Rob Egan

changed status to resolved

2017-09-20T22:01:58+00:00

Yun

Hi thanks to your prompt reply. I am some confused about what's the formula used to calculate the contig depth. In samtools, I can use command:

samtools depth k119_4_sorted.bam >d1.txt

to generate the depth of each site. and if I use metabat, I used the following command to generate the contig depth:

jgi_summarize_bam_contig_depths --outputDepth d2.txt k119_4_sorted.bam

Initially I thought the average depth of each site(from samtool) will be the contig depth.

awk '{sum+=$3}END{print sum/329}' d1.txt

However, it's not true. Can you help me to figure out where it goes wrong?

Thanks, Yun

2017-09-21T15:00:44+00:00

Comments (3)