Depth input for multi sample binning
Dear developer,
I am trying to bin samples from a large soil metagenomic dataset but am unsure which strategy to use. If I understand correctly, the preferred way of running MetaBAT2 is to co-assemble all samples and then map the reads from each individual sample back to the co-assembly to generate the depth file. My dataset is, however, too large to co-assemble, so I instead chose to assemble each treatment individually. My question is whether I should generate the depth file by mapping the reads from all treatments to each single assembly, or whether I should only map the reads that were used to generate that assembly.
Example: say I have treatments X, Y and Z, which are assembled individually. Should I map all reads from X, Y and Z to assembly Z to generate the depth file for that assembly, or should I only map the reads from sample Z to assembly Z?
Regards,
Marcus
Comments (2)
- changed status to resolved
If your data set is too large to co-assemble, then you can treat each individual assembly as if it were a co-assembly: either choose one assembly and map every one of the samples to it, or repeat that for each of the single assemblies. You will be dealing with a lot of duplication and extra work, so my recommendation is to try your best to get a single co-assembly, perhaps by using MHM2 and/or taking a fraction of the fastq data from each sample to co-assemble.
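As a sketch of the per-assembly route described above (file names, thread counts and paths are placeholders, not from the original post; bwa, samtools and MetaBAT2 are assumed to be installed), mapping all three samples to one assembly and summarizing coverage might look like:

```shell
# Sketch: map reads from all treatments (X, Y, Z) against a single
# assembly (assembly_Z.fa) and build a multi-sample depth file.
# All file names here are hypothetical placeholders.

bwa index assembly_Z.fa

for s in X Y Z; do
    bwa mem -t 8 assembly_Z.fa "${s}_R1.fastq.gz" "${s}_R2.fastq.gz" \
        | samtools sort -@ 8 -o "${s}_vs_Z.sorted.bam"
    samtools index "${s}_vs_Z.sorted.bam"
done

# One depth file with one coverage column (plus variance) per sample
jgi_summarize_bam_contig_depths --outputDepth depth_Z.txt *_vs_Z.sorted.bam

# Bin assembly Z using differential coverage across all three samples
metabat2 -i assembly_Z.fa -a depth_Z.txt -o bins_Z/bin
```

If you take the per-assembly route, the same loop would be repeated for assemblies X and Y; the extra coverage columns from the other samples are what give MetaBAT2 its differential-coverage signal.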
For example, take 1/3 of X + 1/3 of Y + 1/3 of Z and co-assemble. Then map the entire X, Y and Z read sets to that co-assembly.
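A hedged sketch of that subsampling step, assuming seqtk is available (the fraction, seed and file names are illustrative, not prescribed by the reply):

```shell
# Sketch: take ~1/3 of the reads from each treatment with seqtk.
# The same seed (-s) must be used for R1 and R2 so pairs stay in sync.
for s in X Y Z; do
    seqtk sample -s100 "${s}_R1.fastq.gz" 0.33 | gzip > "${s}_sub_R1.fastq.gz"
    seqtk sample -s100 "${s}_R2.fastq.gz" 0.33 | gzip > "${s}_sub_R2.fastq.gz"
done

# Co-assemble the subsampled reads (assembler choice is yours), then map
# the FULL X, Y and Z read sets back to the co-assembly to generate the
# depth file for binning, as described above.
```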