MetaBAT is not estimating contig size correctly
Hello, I'm running MetaBAT 2 to bin 3 samples from a soil metagenome, and it doesn't create any bins. When I checked the log, I noticed that it estimates the number of contigs >= 2500 bp to be zero, even though I know I have large contigs. The command I ran was:
runMetaBat.sh -d -v -t 12 BulkC.Allcontigs.sl.rn.fasta Bulk*C.mergedpaired.sorted.bam
It output 0 bins. I'll attach the log.
What could be causing the problem? Could it be the format of my FASTA file?
The names follow this structure:
> Filtrate1C.1 NODE_1_length_451967_cov_10.4532 <SEQUENCE>
Thanks in advance
Comments (8)
- attached BulkAll.contigs.sl.rn.sample.fasta
-
Hello Rob,
The FASTA files follow the standard FASTA structure; I believe a typo caused the record to appear on one line:
> Filtrate1C.1 NODE_1_length_451967_cov_10.4532
SEQUENCE
I attached the first two sequences of the FASTA file. Each sequence is on a single line, so I don't know if that would affect anything.
Thanks
-
There is a space between the '>' and the rest of the identifier. That leaves each sequence with no name, only a comment. I believe that every sequence name needs to be unique and non-empty. If you take the space out, it should work fine.
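The fix described above can be sketched in a few lines of Python. This is a minimal sketch, not part of MetaBAT: the function names `find_empty_names` and `fix_fasta_headers` are hypothetical, and it assumes the usual convention that a sequence name is the text directly after '>' up to the first whitespace.

```python
def find_empty_names(lines):
    """Return 1-based line numbers of header lines whose name is empty.

    Assumption: the name is whatever follows '>' up to the first whitespace,
    so '> id' (space right after '>') has an empty name.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">") and (len(line) == 1 or line[1].isspace()):
            bad.append(number)
    return bad


def fix_fasta_headers(lines):
    """Yield the lines unchanged, except that spaces or tabs directly
    after '>' are removed from header lines."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + line[1:].lstrip(" \t")
        else:
            yield line
```

To apply it to a file, one could open the FASTA for reading, pass the file object to `fix_fasta_headers`, and write the yielded lines to a new file with `writelines`. Running `find_empty_names` on the result should then return an empty list.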
-
Hi Alinne, did fixing the assembly FASTA resolve your issue with MetaBAT? And what assembler did you use to generate this file in the first place? Thanks, Rob
-
Thanks a million Rob, I hope it solves the error
-
Hi Rob, it worked just fine. I'm glad the problem was due to such a minor issue. I use the SPAdes assembler, but I add the additional contig identifier myself for some in-house applications. I'll be mindful of spacing issues from now on. Again, thank you very much!
- changed status to resolved
The structure you quoted does not look like a FASTA file. The sequence should not be on the same line as the header. Please attach the first few lines of BulkC.Allcontigs.sl.rn.fasta so I can verify what it is.
A FASTA record has at least two lines, a header line starting with '>' followed by one or more lines of sequence:
https://en.wikipedia.org/wiki/FASTA_format
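To check a file against this structure before binning, a short Python sketch like the one below can list each record's name and length and count how many contigs reach the 2500 bp threshold mentioned in the log. The function names `read_contigs` and `count_long` are hypothetical, and the name parsing (text directly after '>' up to the first whitespace) is an assumption about the usual convention, not a description of MetaBAT's own parser.

```python
def read_contigs(lines):
    """Return a list of (name, length) tuples, one per FASTA record.

    A record is a header line starting with '>' followed by one or more
    sequence lines. A header such as '> id' yields an empty name.
    """
    records = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            header = line[1:]
            if not header or header[0].isspace():
                name = ""  # nothing directly after '>'
            else:
                name = header.split()[0]
            records.append([name, 0])
        elif line and records:
            # Sequence may be wrapped over several lines; sum them up.
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


def count_long(records, min_len=2500):
    """Count records whose sequence length is at least min_len."""
    return sum(1 for _, length in records if length >= min_len)
```

Called on an open file handle for the assembly, `read_contigs` makes empty or duplicated names easy to spot, and `count_long` gives a number to compare against what the binner reports.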