MetaBAT is not estimating contig size correctly

Issue #58 resolved
Former user created an issue

Hello, I'm running MetaBAT 2 to bin 3 samples from a soil metagenome and it doesn't create any bins. When I checked the log, I noticed that it estimates the number of contigs >= 2500 bp to be zero, even though I know I have large contigs! The command I ran was:

runMetaBat.sh -d -v -t 12 BulkC.Allcontigs.sl.rn.fasta Bulk*C.mergedpaired.sorted.bam

It output 0 bins. I'll attach the log.

What could be causing the problem? Could it be the format of my FASTA file?

The names follow this structure:

Filtrate1C.1 NODE_1_length_451967_cov_10.4532 <SEQUENCE>

Thanks in advance

Comments (8)

  1. Alinne Lima

    Hello Rob,

    The FASTA files follow the standard FASTA structure. I believe a typo caused the example to appear on one line:

    Filtrate1C.1 NODE_1_length_451967_cov_10.4532

    SEQUENCE

    I attached the first two sequences of the FASTA file. Each sequence is on a single line, so I don't know if that would affect anything.

    Thanks

  2. Rob Egan

    There is a space between the '>' and the rest of the identifier. That would leave no name and just a comment for each sequence. I believe that every sequence name needs to be unique and >0 length. If you take the space out, then it should work fine.

  3. Rob Egan

    Hi Alinne, did fixing the assembly FASTA resolve your issue with MetaBAT? And what assembler did you use to generate this file in the first place? Thanks, Rob

  4. Alinne Lima

    Hi Rob, it worked just fine. I’m glad the problem was due to such a minor issue. I use the SPAdes assembler, but the additional contig identifier is added by me for some in-house applications. I’ll be mindful of spacing issues from now on. Again, thank you very much!

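The fix described in comment 2 (removing the space after '>' so that every record has a non-empty, unique name) can be sketched in a few lines of Python. This is only an illustration, not part of MetaBAT: the function names `fix_fasta_headers` and `check_names` are hypothetical, the second header and both sequences are made-up examples, and the sketch assumes the common convention that a sequence name is the text between '>' and the first whitespace.

```python
import re
from collections import Counter


def fix_fasta_headers(lines):
    """Remove spaces or tabs between '>' and the sequence name in header lines."""
    fixed = []
    for line in lines:
        if line.startswith(">"):
            # "> Filtrate1C.1 NODE_1..." becomes ">Filtrate1C.1 NODE_1..."
            line = ">" + line[1:].lstrip(" \t")
        fixed.append(line)
    return fixed


def check_names(lines):
    """Return (number of empty names, sorted list of duplicated names).

    Assumption: the name is the text after '>' up to the first whitespace,
    so a header that starts with "> " has an empty name.
    """
    names = [
        re.split(r"\s", line[1:], maxsplit=1)[0]
        for line in lines
        if line.startswith(">")
    ]
    empty = sum(1 for name in names if not name)
    duplicates = sorted(n for n, count in Counter(names).items() if n and count > 1)
    return empty, duplicates


# First header taken from the thread, with the leading space from comment 2;
# the second header and the sequences are hypothetical.
records = [
    "> Filtrate1C.1 NODE_1_length_451967_cov_10.4532",
    "ACGT",
    "> Filtrate1C.2 NODE_2_length_1000_cov_5.0",
    "GGCC",
]

print(check_names(records))                     # before the fix
print(check_names(fix_fasta_headers(records)))  # after the fix
```

Running `check_names` on an assembly before binning is a cheap way to catch empty or duplicated names; the corrected lines would then be written to a new FASTA file and passed to `runMetaBat.sh` in place of the original.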