Support for multi-sample contig binning

Issue #127 wontfix
Former user created an issue

I'm working towards building a database of de novo assembled genomes present in a collection of 128 MGS samples I have. I ran per-sample metaSPAdes assemblies on each of the 128 samples, then merged all resulting contigs into a single pool. I then mapped each sample's reads against that pool and fed the resulting alignments to MetaBAT2, which generated ~8k genome bins. That number seems reasonable to me, but now that I'm looking at the output I am wondering if I need to somehow 'flatten' the contigs into a single representation of each binned genome. By 'flatten' I mean either building a consensus of the contigs, or picking representative contigs in each bin that form a 'golden path' through the assembly.

Is that something I would need to do? Or does MetaBAT2 handle that issue? This is my first time using a contig binning tool and I want to be sure I'm using it correctly.

Comments (2)

  1. Rob Egan

    A primary assumption that MetaBAT makes is that the assembly provided consists of unique contigs and that its duplication ratio is low. Simply concatenating contigs from multiple assemblies does not give you that starting point, and the result will be a case of garbage in, garbage out.

    Our current recommendation is to perform a single co-assembly of all the samples together and then bin that single result. If you have sufficient computing hardware, MHM2 or metaSPAdes will be your best bets for co-assembling all the data.

    An alternative that gives lower-quality results is to deduplicate your many single-sample assemblies and try MetaBAT on the deduplicated set. Deduplication is far from perfect and generally yields chimeric sequences across strains and species. I do not know if it will work well across 128 different single-sample assemblies. This is the tool we recommend if you pursue that strategy: https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/dedupe-guide/
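
    To get a rough feel for how redundant a pooled set of contigs is before binning, something like the following Python sketch could help. It is not how MetaBAT or Dedupe work: it only counts contigs that are exactly identical (in either orientation), so the ratio it reports is a lower bound on the real duplication, and near-identical contigs from related strains will not be detected. The function names and the toy contig names (`s1_c1` and so on) are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Translation table for complementing DNA (assumes A/C/G/T/N only).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the contig name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def canonical(seq: str) -> str:
    """Return the lexicographically smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def exact_duplicate_report(
    records: Iterable[Tuple[str, str]],
) -> Tuple[float, Dict[str, List[str]]]:
    """Return (total bp / unique bp, {first contig name: [exact duplicates]}).

    Only exact matches are counted, so this underestimates true duplication.
    """
    groups: Dict[str, List[str]] = {}
    total_bp = 0
    for name, seq in records:
        total_bp += len(seq)
        groups.setdefault(canonical(seq), []).append(name)
    unique_bp = sum(len(seq) for seq in groups)
    duplicates = {
        names[0]: names[1:] for names in groups.values() if len(names) > 1
    }
    ratio = total_bp / unique_bp if unique_bp else 0.0
    return ratio, duplicates


if __name__ == "__main__":
    # Toy pool: s2_c1 is the reverse complement of s1_c1.
    pooled = """>s1_c1
ACGTAC
>s2_c1
GTACGT
>s2_c2
TTTTGG
""".splitlines()
    ratio, duplicates = exact_duplicate_report(parse_fasta(pooled))
    print(f"exact duplication ratio: {ratio:.2f}")
    print(f"duplicate groups: {duplicates}")
```

    A ratio well above 1 on a real pool would suggest that the pool violates the unique-contig assumption described above. A ratio close to 1 does not show the pool is clean, because only exact duplicates are counted.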
