Depth input for multi-sample binning

Issue #146 resolved
Marcus Wenne created an issue

Dear developer,

I am trying to bin samples from a large soil metagenomic dataset but am unsure what strategy to use. If I understand correctly, the preferred way of running MetaBAT 2 is to co-assemble all samples and then map the reads from each individual sample to the co-assembly to generate the depth file. My dataset is, however, too large to co-assemble, so I instead chose to assemble each treatment individually. My question, then, is whether I should generate the depth file for each single assembly by mapping the reads from all treatments to it, or only the reads that were used to generate that assembly.

Example: let’s say that I have treatments X, Y and Z, which are assembled individually. Should I map all reads from X, Y and Z to assembly Z to generate the depth file for that assembly, or only the reads from sample Z?
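(For concreteness, here is a minimal sketch of the multi-sample depth workflow being asked about, mapping reads from all three treatments against one assembly. minimap2 and samtools are assumed choices for the mapping step, while jgi_summarize_bam_contig_depths and metabat2 ship with MetaBAT 2; all file names are placeholders.)

```bash
# Map reads from every treatment (X, Y, Z) against one assembly (here, Z),
# producing one sorted BAM per sample.
for sample in X Y Z; do
    minimap2 -ax sr assembly_Z.fa "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
        | samtools sort -o "${sample}_vs_Z.bam"
    samtools index "${sample}_vs_Z.bam"
done

# Summarize per-sample coverage into a single depth file; the script ships
# with MetaBAT 2 and adds one depth/variance column pair per BAM.
jgi_summarize_bam_contig_depths --outputDepth depth_Z.txt *_vs_Z.bam

# Bin assembly Z using the multi-sample depth file.
metabat2 -i assembly_Z.fa -a depth_Z.txt -o bins_Z/bin
```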

Regards,

Marcus

Comments (2)

  1. Rob Egan

    If your dataset is too large to co-assemble, then you can treat each individual assembly as a different co-assembly: either choose one and map every sample to it, or repeat the process for each of the single assemblies. Either way you will be dealing with a lot of duplication and extra work, so my recommendation is to try your best to get a single, better co-assembly, perhaps by using MHM2 and/or by co-assembling a fraction of the fastq data from each sample.

    For example, take 1/3 of X + 1/3 of Y + 1/3 of Z and co-assemble. Then map the entire X, Y and Z read sets to that.
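    (A minimal sketch of that subsample-and-co-assemble recipe, assuming seqtk for the subsampling and MEGAHIT as the co-assembler; MHM2 has its own invocation, and every path and fraction below is a placeholder.)

    ```bash
    # Subsample ~1/3 of the read pairs from each treatment; reusing the same
    # seed (-s100) for R1 and R2 keeps the pairs in sync.
    for sample in X Y Z; do
        seqtk sample -s100 "${sample}_R1.fastq.gz" 0.33 > "${sample}_sub_R1.fastq"
        seqtk sample -s100 "${sample}_R2.fastq.gz" 0.33 > "${sample}_sub_R2.fastq"
    done

    # Co-assemble the subsampled reads (MEGAHIT shown as one option).
    megahit -1 X_sub_R1.fastq,Y_sub_R1.fastq,Z_sub_R1.fastq \
            -2 X_sub_R2.fastq,Y_sub_R2.fastq,Z_sub_R2.fastq \
            -o coassembly

    # Map the FULL read sets of X, Y and Z back to the co-assembly,
    # then build the depth file and bin as usual.
    for sample in X Y Z; do
        minimap2 -ax sr coassembly/final.contigs.fa \
            "${sample}_R1.fastq.gz" "${sample}_R2.fastq.gz" \
            | samtools sort -o "${sample}_vs_co.bam"
    done
    jgi_summarize_bam_contig_depths --outputDepth depth_co.txt *_vs_co.bam
    metabat2 -i coassembly/final.contigs.fa -a depth_co.txt -o bins_co/bin
    ```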
