How the depth of contig was calculated?

Issue #35 resolved
Former user created an issue

Hi, Is there any where I can know how the depth of contig coverage was calculated

Comments (3)

  1. Rob Egan

    Hi,

    There are a couple of factors that are used (by default) to improve the depth coverage estimate. The depth per contig calculations are calculated by the jgi_summarize_bam_contigs_depth program in the MetaBAT distribution that can be invoked independently.

    Here are the options to that:

    Usage: jgi_summarize_bam_contig_depths <options> sortedBam1 [ sortedBam2 ...]
    where options include:
        --outputDepth       arg  The file to put the contig by bam depth matrix (default: STDOUT)
        --percentIdentity   arg  The minimum end-to-end % identity of qualifying reads (default: 97)
        --pairedContigs     arg  The file to output the sparse matrix of contigs which paired reads span (default: none)
        --unmappedFastq     arg  The prefix to output unmapped reads from each bam file suffixed by 'bamfile.bam.fastq.gz'
        --noIntraDepthVariance   Do not include variance from mean depth along the contig
        --showDepth              Output a .depth file per bam for each contig base
        --minMapQual        arg  The minimum mapping quality necessary to count the read as mapped (default: 0)
        --weightMapQual     arg  Weight per-base depth based on the MQ of the read (i.e uniqueness) (default: 0.0 (disabled))
        --includeEdgeBases       When calculating depth & variance, include the 1-readlength edges (off by default)
        --maxEdgeBases           When calculating depth & variance, and not --includeEdgeBases, the maximum length (default:75)
        --referenceFasta    arg  The reference file.  (It must be the same fasta that bams used)
    
    Options that require a --referenceFasta
        --outputGC          arg  The file to print the gc coverage histogram
        --gcWindow          arg  The sliding window size for GC calculations
        --outputReadStats   arg  The file to print the per read statistics
        --outputKmers       arg  The file to print the perfect kmer counts
    
    Options to control shredding contigs that are under represented by the reads
        --shredLength       arg  The maximum length of the shreds
        --shredDepth        arg  The depth to generate overlapping shreds
        --minContigLength   arg  The mimimum length of contig to include for mapping and shredding
        --minContigDepth    arg  The minimum depth along contig at which to break the contig
    

    But basically the depth is calculated for every read that maps with >=97% identity to the assembly scaffold : (i.e. matches / (matches + mismatches + insertions + deletions) >= 0.97)

    And the edges of a contig are excluded (1 average-read-length from either end)

    -Rob

  2. Yun

    Hi thanks to your prompt reply. I am some confused about what's the formula used to calculate the contig depth. In samtools, I can use command:

    samtools depth k119_4_sorted.bam >d1.txt
    

    to generate the depth of each site. and if I use metabat, I used the following command to generate the contig depth:

    jgi_summarize_bam_contig_depths --outputDepth d2.txt k119_4_sorted.bam
    

    Initially I thought the average depth of each site(from samtool) will be the contig depth.

    awk '{sum+=$3}END{print sum/329}' d1.txt
    

    However, it's not true. Can you help me to figure out where it goes wrong?

    Thanks, Yun

  3. Log in to comment