Present Abundance of Bins in output

Issue #90 resolved
Rob Egan created an issue

An unknown user commented on issues #25 and #74 because the suggested R script within them was not working.

It is a simple matter to aggregate the coverage of the sequences within a bin, so add the R script, and instructions for using it, to the metabat code base.
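As a rough illustration of the aggregation described above, here is a minimal Python sketch (not the R script from issues #25 and #74). It assumes each sequence is available as a (name, length, depth) tuple and that bin membership is a plain name-to-bin mapping; the function name `bin_mean_depth`, the tuple layout, and the example values are all hypothetical.

```python
from collections import defaultdict

def bin_mean_depth(contigs, bin_of):
    """Length-weighted mean depth per bin.

    contigs: iterable of (name, length, depth) tuples (hypothetical layout).
    bin_of:  dict mapping sequence name -> bin label (hypothetical).
    """
    covered_bases = defaultdict(float)  # sum of length * depth per bin
    total_length = defaultdict(int)     # sum of length per bin
    for name, length, depth in contigs:
        label = bin_of.get(name)
        if label is None:               # sequence was not assigned to a bin
            continue
        covered_bases[label] += length * depth
        total_length[label] += length
    return {b: covered_bases[b] / total_length[b] for b in total_length}

# Made-up example data, for illustration only.
contigs = [("c1", 1000, 4.0), ("c2", 3000, 8.0), ("c3", 2000, 5.0), ("c4", 500, 9.0)]
bin_of = {"c1": "bin.1", "c2": "bin.1", "c3": "bin.2"}
result = bin_mean_depth(contigs, bin_of)
```

Weighting by length means each base counts equally, so one long sequence influences the bin's value more than several short ones.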

Comments (4)

  1. Antti Rissanen

    Hi,

    I have used the R-script described here:

    https://bitbucket.org/berkeleylab/metabat/issues/25/calculating-abundance-for-each-bin

    to calculate the mean and median coverage for the bins in my lake sediment metagenomic dataset. I would like to use the calculated values as abundances of the bins, to compare abundances between samples. My question is about normalization between samples with this approach (e.g. based on the number of raw reads, the number of assembled reads in each sample…?). Let's say that, according to my calculations using this R script, a particular bin has an average depth of 4 in sample A and an average depth of 6 in sample B. Can I then say that it is more abundant, or that its relative abundance is higher, in sample B?

    Thank you and take care.

    Antti J Rissanen, PhD

    Tampere University, Tampere, Finland

  2. Rob Egan reporter

    Hi,

    So that script calculates the median depths for a given bin, and I'd say it is generally okay to use. This measure, however, can be skewed if, say, there are 5 short scaffolds at depth 4 and one long scaffold (longer than the sum of the short scaffolds) at depth 7. In that case the median reported would be 4, but the mean would be closer to 6. So it wouldn't hurt to double-check all the scaffolds within a bin if you need to draw relative abundance conclusions such as this.
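The skew described above can be checked with a few lines of Python. This is only a sketch: the scaffold lengths are made up (five scaffolds of 1,000 bp and one of 10,000 bp), and it assumes that the "mean" in question is weighted by scaffold length, since the unweighted mean of these depths is only 4.5.

```python
from statistics import median

# Hypothetical bin: five short scaffolds at depth 4, one long scaffold at depth 7.
lengths = [1000] * 5 + [10000]   # the long scaffold exceeds the 5000 bp of short ones
depths = [4.0] * 5 + [7.0]

median_depth = median(depths)                  # ignores scaffold length entirely
plain_mean = sum(depths) / len(depths)         # every scaffold counts equally
weighted_mean = (
    sum(l * d for l, d in zip(lengths, depths)) / sum(lengths)
)                                              # every base counts equally

print(median_depth, plain_mean, weighted_mean)
```

With these assumed lengths the median stays at 4 while the length-weighted mean is 6, so the two summaries can lead to different conclusions about the same bin.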

    While abundance is one metric we use to bin scaffolds together, it is not the only metric and if the assembly itself collapsed a repetitive region into a single scaffold (a common error pattern of metagenome assemblers), its depth would not accurately reflect the true abundance of the entire genome (or bin).

    -Rob
