Present Abundance of Bins in output
Comments (4)
-
Hi,
So that script calculates the median depth for a given bin, and I’d say it is generally okay to use. This measure, however, can be skewed if, say, there are 5 short scaffolds at depth 4 and one long scaffold (longer than the sum of the short scaffolds) at depth 7. In that case the median reported would be 4, but the mean weighted by scaffold length would be closer to 6. So it wouldn’t hurt to double-check all the scaffolds within a bin if you need to draw relative abundance conclusions such as this.
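As a rough illustration of the scenario just described, here is a minimal Python sketch (not the R script from the linked issue). The scaffold lengths are hypothetical, chosen only so that the long scaffold is longer than the five short ones combined, and the variable names are made up for this example.

```python
from statistics import median

# Hypothetical bin: five short scaffolds at depth 4 and one long scaffold at depth 7.
# Lengths (in bp) are invented; the long scaffold exceeds the short ones combined.
scaffolds = [(1_000, 4.0)] * 5 + [(10_000, 7.0)]  # (length_bp, depth)

depths = [depth for _, depth in scaffolds]
total_length = sum(length for length, _ in scaffolds)

# Median over scaffolds: every scaffold counts equally, regardless of length.
median_depth = median(depths)

# Unweighted mean over scaffolds, shown for comparison.
plain_mean = sum(depths) / len(depths)

# Length-weighted mean: each scaffold contributes in proportion to its length.
weighted_mean = sum(length * depth for length, depth in scaffolds) / total_length

print(f"median depth:         {median_depth}")
print(f"unweighted mean:      {plain_mean}")
print(f"length-weighted mean: {weighted_mean}")
```

With these made-up lengths the median is 4 while the length-weighted mean is 6, which is why looking at the individual scaffolds in a bin can be worthwhile.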
While abundance is one metric we use to bin scaffolds together, it is not the only metric and if the assembly itself collapsed a repetitive region into a single scaffold (a common error pattern of metagenome assemblers), its depth would not accurately reflect the true abundance of the entire genome (or bin).
-Rob
-
Thank you for the answer!
Antti
-
reporter - changed status to resolved
Hi,
I have used the R-script described here:
https://bitbucket.org/berkeleylab/metabat/issues/25/calculating-abundance-for-each-bin
to calculate the mean and median coverage for the bins in my lake sediment metagenomic dataset. I would like to use the calculated values as bin abundances in order to compare abundances between samples. My question is about normalization between samples when using this approach (e.g. based on the number of raw reads, the number of assembled reads in each sample…?). Let's say that, according to my calculations using this R script, a particular bin has an average depth of 4 in sample A and an average depth of 6 in sample B. Can I then say that it is more abundant, or that its relative abundance is higher, in sample B?
Thank you and take care.
Antti J Rissanen, PhD
Tampere University, Tampere, Finland
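To make the normalization question concrete, here is a small Python sketch of one possible convention: scaling a bin's depth by the total number of reads in each sample. This is an assumption for illustration only, not something the thread or MetaBAT prescribes. The read totals are hypothetical and `depth_per_million_reads` is a made-up helper name.

```python
def depth_per_million_reads(depth: float, total_reads: int) -> float:
    """Scale a bin's depth to a per-million-reads value (illustrative convention only)."""
    return depth * 1_000_000 / total_reads

# Depths 4 and 6 come from the question; the read totals are hypothetical.
samples = {
    "A": {"depth": 4.0, "total_reads": 20_000_000},
    "B": {"depth": 6.0, "total_reads": 30_000_000},
}

normalized = {
    name: depth_per_million_reads(s["depth"], s["total_reads"])
    for name, s in samples.items()
}

for name, value in normalized.items():
    print(f"sample {name}: {value:.3f} depth per million reads")
```

With these invented read totals, the raw depths of 4 and 6 scale to the same value, so a higher raw depth in sample B would not by itself show a higher relative abundance.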