Running with 11 million contigs

Issue #55 resolved
Fred Meng created an issue

Hello! I'm running metabat2 version 2.12.1 on my data: about 1.2 terabases of sequencing reads, assembled by MEGAHIT into 11 million contigs (14GB in base length). metabat2 seems to run endlessly. Any advice about the run and its time consumption? The command is "metabat2 -i assembl.fasta -a /output/work_files/metabat_depth.txt -o /output/metabat2_bins/bin -m 1500 -t 16 --unbinned"

Comments (4)

  1. Feng Li

    Hi, Fred.

    We don't have a really effective way to help for now. Still, I want to mention that "-m 1500" sets the minimum size of a contig for binning (default 2500). Lowering it below the default makes the computation much larger. If possible, I advise you to remove this option.

    Besides, the -v option will give you verbose output, which may help you monitor the progress of binning. I think it's better than nothing.

    good luck~

  2. Rob Egan

    I concur with this. 11 million contigs will result in upwards of 250 trillion calculations before the clustering starts, which will take some time on even the most powerful computers. We are looking into approaches more efficient than N squared, and into performing the calculations as a cluster / MPI job. If time is a constraint for you, increasing the minimum contig size to 2500 would considerably reduce the total number of contigs that need to be compared, and should improve the accuracy of the resulting clusters (at the cost of completeness, obviously).
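
To make the scaling argument in the comments above concrete, here is a minimal Python sketch that counts how many contigs survive a given minimum length and how many unique contig pairs that leaves. It is only an illustration: the function names (contig_lengths, pairwise_comparisons, summarize) are made up for this example, and the pair count n * (n - 1) / 2 is a simple proxy for quadratic growth, not MetaBAT's actual operation count, which may involve more work per pair.

```python
import io


def contig_lengths(handle):
    """Yield the length of each sequence in a FASTA text stream."""
    length = None  # None until the first header line is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


def pairwise_comparisons(n):
    """Number of unique unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def summarize(handle, thresholds=(1500, 2500)):
    """Map each minimum length to (contigs kept, unique pairs among them)."""
    kept = {t: 0 for t in thresholds}
    for length in contig_lengths(handle):
        for t in thresholds:
            if length >= t:
                kept[t] += 1
    return {t: (n, pairwise_comparisons(n)) for t, n in kept.items()}


# Tiny made-up FASTA, for illustration only (hypothetical data).
demo = io.StringIO(
    ">a\n" + "A" * 3000 + "\n>b\n" + "C" * 2000 + "\n>c\n" + "G" * 1000 + "\n"
)
print(summarize(demo))

# Unique pairs among 11 million contigs, the size reported in this issue.
print(pairwise_comparisons(11_000_000))
```

On a real assembly, the same summary could be obtained by opening the FASTA file (for example assembl.fasta from the command above) and passing the file handle to summarize. Because the pair count grows with the square of the number of contigs, halving the number of contigs cuts the pairs to roughly a quarter.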
