There were no large target contigs. How can I fix it?

Issue #168 resolved
Yicheng Yang created an issue

I ran MetaBAT 2 with this command:

docker run -i --privileged --rm --workdir $(pwd) -v/calculate:/calculate metabat/metabat:latest sh -c "metabat2 -m 1500 -t 32 -i SLJ22.contigs.fa -a depth.txt -o bins_dir/bin -d "

and got this output:

MetaBAT 2 (v2.17-22-gfc29e6e) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000. with random seed=1705332463
[00:00:00] Executing with 32 threads
[00:00:00] Parsing abundance file header [253.0Gb / 502.4Gb]
[00:00:00] Parsing assembly file [253.0Gb / 502.4Gb]
[00:00:00] Number of large contigs >= 1500 bp are 9559, and small contigs >= 1000 bp are 7878
[00:00:00] Allocating 9559 contigs by 1 samples abundances [253.9Gb / 502.4Gb]
[00:00:00] Allocating 9559 contigs by 1 samples variances [253.9Gb / 502.4Gb]
[00:00:00] Allocating 7878 small contigs by 1 samples abundances [253.9Gb / 502.4Gb]
[00:00:00] Reading 0.0Gb abundance file [253.9Gb / 502.4Gb]
[00:00:00] Finished reading 0 contigs and 1 coverages from depth.txt [253.9Gb / 502.4Gb]
[00:00:00] nobs = 0
[00:00:00] r = 0 (num = 0), (nskip = 0)
[00:00:00] seqs.size = 9559, contig_names.size = 9559
[00:00:00] Number of target contigs: 0 of large (>= 1500) and 0 of small ones (>=1000 & <1500).
[Error!] There were no large target contigs. Cannot proceed. Rerun with the '-d' option for more details.

SLJ22.contigs.fa was assembled with MEGAHIT.

file format type num_seqs sum_len min_len avg_len max_len
SLJ22.contigs.fa FASTA DNA 48,327 82,214,654 500 1,701.2 363,116

How can I fix this error?

Thanks.

Comments (4)

  1. Rob Egan

    Hello Yicheng,

    Thank you for this report, and I can confirm this is absolutely a bug in this latest, unofficial release of MetaBAT. The bug was introduced in early December and affects assemblies that include contigs below the minimum length that MetaBAT considers (<1000bp). I did not observe this bug in my testing data because all the contigs used there are above 1000bp.

    I just pushed a fix that I hope will work for you in about an hour, once it works through the deployment to Docker.

    Best,

    Rob
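Until the rebuilt image lands, one workaround consistent with Rob's diagnosis would be to drop the sub-1000 bp contigs (which MetaBAT 2 never bins anyway) before running it. A minimal sketch with a toy FASTA standing in for SLJ22.contigs.fa (file and contig names here are placeholders, not from the thread):

```shell
# Toy FASTA standing in for SLJ22.contigs.fa (placeholder names):
# one 1200 bp contig and one 8 bp contig.
printf '>long_contig\n%s\n>short_contig\nACGTACGT\n' \
    "$(head -c 1200 /dev/zero | tr '\0' 'A')" > contigs.fa

# Keep only contigs >= 1000 bp; handles multi-line sequences.
awk '/^>/ { if (seq != "" && length(seq) >= 1000) print hdr "\n" seq
            hdr = $0; seq = ""; next }
          { seq = seq $0 }
     END  { if (seq != "" && length(seq) >= 1000) print hdr "\n" seq }' \
    contigs.fa > contigs.ge1000.fa

grep -c '^>' contigs.ge1000.fa    # prints 1: only the long contig survives
```

Once the fixed image deploys, `docker pull metabat/metabat:latest` refreshes it. If you do filter the assembly, the depth file should be regenerated against the filtered FASTA so the two stay consistent.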

  2. Yicheng Yang (reporter)

    Thank you for your reply and your fix.

    I'll pull the image again and try it.

    But I found something interesting:

    I reassembled my clean data with metaSPAdes.

    Then I ran MetaBAT 2 in Docker, before your fix, on the metaSPAdes contigs.

    Unexpectedly, MetaBAT ran smoothly and produced results.

    Is this due to differences between the output of the two assemblers?

    Here are the results of their assembly :

    MEGAHIT :

    file format type num_seqs sum_len min_len avg_len max_len Q1 Q2 Q3 sum_gap N50 Q20(%) Q30(%)
    SLJ22.contigs.fa FASTA DNA 48,327 82,214,654 500 1,701.2 363,116 613 801 1,277 0 2,684 0

    #contigs more than 200000bp (MEGAHIT)
    file  format  type  num_seqs    sum_len  min_len    avg_len  max_len
    -     FASTA   DNA         10  2,401,663  205,752  240,166.3  363,116
    

    METASPADES :

    file format type num_seqs sum_len min_len avg_len max_len Q1 Q2 Q3 sum_gap N50 Q20(%) Q30(%)
    scaffolds.fasta FASTA DNA 190,377 122,750,666 100 644.8 779,602 269 336 507 0 862 0 0

    #contigs more than 200000bp (METASPADES)
    file  format  type  num_seqs    sum_len  min_len    avg_len  max_len
    -     FASTA   DNA         15  4,606,941  206,236  307,129.4  779,602
    

    As you can see, the overall assembly differences are not large, but metaSPAdes produced more long contigs.

    Thanks again for your prompt reply and clarification.

    Best,

    Yicheng

  3. Rob Egan

    So, I wouldn’t trust the results from the buggy version… I’d try either the last release, 2.17, or the latest version, v2.17-23-g10d6f87. The bug triggers if the assembly and depth file contain any contigs shorter than 1000 bases, and it has no effect only if those short contigs are all ordered at the very end of the assembly file. So if the assembler sorted the contigs by length, it should be okay; if not, the bug could adversely affect the results.

    -Rob
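Upgrading is the real fix, but Rob's note implies the pre-fix build behaves correctly when every sub-1000 bp contig sits at the very end of the file. A sketch of reordering a FASTA longest-first with awk and sort (toy input, placeholder file names; assumes each sequence fits in an awk string, which is fine for typical contigs):

```shell
# Toy FASTA with a short contig listed first (placeholder names).
printf '>short\nACGT\n>long\n%s\n' \
    "$(head -c 2000 /dev/zero | tr '\0' 'G')" > contigs.fa

# Linearize to "length<TAB>header<TAB>sequence", sort longest-first,
# then rebuild the FASTA so short contigs fall at the very end.
awk '/^>/ { if (hdr != "") print length(seq) "\t" hdr "\t" seq
            hdr = $0; seq = ""; next }
          { seq = seq $0 }
     END  { if (hdr != "") print length(seq) "\t" hdr "\t" seq }' contigs.fa \
  | sort -k1,1nr \
  | awk -F '\t' '{ print $2 "\n" $3 }' > contigs.sorted.fa

head -n 1 contigs.sorted.fa    # prints ">long"
```

This may explain the metaSPAdes observation above: if an assembler happens to emit contigs sorted by length, the short ones land last and the bug never fires.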
