berkeleylab / MetaBAT / issues / #90 - Present Abundance of Bins in output — Bitbucket

Issue #90 resolved

Rob Egan created an issue 2020-03-05

An unknown user commented on issues ~~#25~~ and ~~#74~~ because the suggested R script within them was not working.

It is a simple matter to aggregate the coverage of the sequences within a bin, so add the R script, along with usage instructions, to the MetaBAT code base.
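As a rough illustration of what "aggregating the coverage of the sequences within a bin" can look like, here is a minimal Python sketch. It is not the R script from issues #25 or #74. The function name `bin_depths`, the input layout (tuples of contig name, length, and mean depth, plus a contig-to-bin mapping), and all values are hypothetical, and the choice of a length-weighted mean is an assumption.

```python
from collections import defaultdict


def bin_depths(contigs, bin_of):
    """Return {bin label: length-weighted mean depth}.

    contigs: iterable of (contig_name, length_bp, mean_depth) tuples
    bin_of:  dict mapping contig_name -> bin label (hypothetical input)
    """
    covered = defaultdict(float)  # sum of depth * length per bin
    total_bp = defaultdict(int)   # total contig length per bin
    for name, length_bp, depth in contigs:
        label = bin_of.get(name)
        if label is None:         # contig was not assigned to any bin
            continue
        covered[label] += depth * length_bp
        total_bp[label] += length_bp
    return {label: covered[label] / total_bp[label] for label in covered}


# Made-up values, for illustration only.
contigs = [("c1", 1000, 4.0), ("c2", 3000, 8.0), ("c3", 2000, 5.0), ("c4", 500, 9.0)]
bin_of = {"c1": "bin.1", "c2": "bin.1", "c3": "bin.2"}
result = bin_depths(contigs, bin_of)
print(result)
```

Weighting by contig length keeps many short contigs from dominating the per-bin value; a plain mean or median over contigs is the simpler alternative.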

Comments (4)

Antti Rissanen
Hi,

I have used the R-script described here:

https://bitbucket.org/berkeleylab/metabat/issues/25/calculating-abundance-for-each-bin

to calculate the mean and median coverage for the bins in my lake sediment metagenomic dataset. I would like to use the calculated values as bin abundances, to compare abundances between samples. My question is about normalization between samples with this approach (e.g. based on number of raw reads, number of assembled reads in each sample…). Let's say that, according to my calculations using this R script, a particular bin has an average depth of 4 in sample A and an average depth of 6 in sample B. Can I then say that it is more abundant, or that its relative abundance is higher, in sample B?

Thank you and take care.

Antti J Rissanen, PhD

Tampere University, Tampere, Finland
- 2020-04-08T07:53:56+00:00
Rob Egan reporter
Hi,

So that script calculates the median depths for a given bin, and I’d say it is generally okay to use. This measure, however, can be skewed if, say, there are 5 short scaffolds at depth 4 and one long scaffold (longer than the sum of the short scaffolds) at depth 7. In that case the median reported would be 4, but the length-weighted mean would be closer to 6. So it wouldn’t hurt to double check all the scaffolds within a bin if you need to draw relative abundance conclusions such as this.
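The arithmetic in that scenario can be checked with a short Python sketch. The scaffold lengths are made up (the comment only says the long scaffold is longer than the short ones combined), and reading "the mean" as a mean weighted by scaffold length is an assumption.

```python
from statistics import mean, median

# (length_bp, depth) pairs: five short scaffolds at depth 4 and one long
# scaffold at depth 7. Lengths are hypothetical; only the depths come
# from the scenario described above.
scaffolds = [(1000, 4.0)] * 5 + [(15000, 7.0)]

depths = [depth for _, depth in scaffolds]
per_scaffold_median = median(depths)  # every scaffold counts equally
per_scaffold_mean = mean(depths)      # unweighted mean over scaffolds

total_bp = sum(length for length, _ in scaffolds)
weighted_mean = sum(length * depth for length, depth in scaffolds) / total_bp

print(per_scaffold_median, per_scaffold_mean, weighted_mean)
```

With these lengths the per-scaffold median is 4.0 and the unweighted mean is 4.5, while the length-weighted mean is 6.25, because the long scaffold accounts for most of the bases in the bin.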

While abundance is one metric we use to bin scaffolds together, it is not the only one. If the assembly itself collapsed a repetitive region into a single scaffold (a common error pattern of metagenome assemblers), that scaffold's depth would not accurately reflect the true abundance of the entire genome (or bin).

-Rob
- 2020-04-09T22:03:36+00:00
Antti Rissanen
Thank you for the answer!

Antti
- 2020-04-15T07:13:41+00:00
Rob Egan reporter
- changed status to resolved
- 2024-05-22T18:49:26+00:00

Assignee: Rob Egan

Type: enhancement

Priority: trivial

Status: resolved

Votes: 0

Watchers: 1
