Too many bins with test data
I ran metabat2 with the assembly and the two mock communities provided, and I got 121 bins instead of the 28 indicated on the web page. Any advice?
This is the output on the screen.
Executing: 'jgi_summarize_bam_contig_depths --outputDepth assembly.fa.depth.txt --percentIdentity 97 --minContigLength 1000 --minContigDepth 1.0 --referenceFasta assembly.fa library1.sorted.bam library2.sorted.bam' at vie dic 27 11:57:15 -03 2019
Output depth matrix to assembly.fa.depth.txt
Minimum percent identity for a mapped read: 0.97
minContigLength: 1000
minContigDepth: 1
Reference fasta file assembly.fa
jgi_summarize_bam_contig_depths v2.14-7-g2b4b398 2019-12-27T11:49:07
Output matrix to assembly.fa.depth.txt
Reading reference fasta file: assembly.fa
... 1893 sequences
0: Opening bam: library1.sorted.bam
1: Opening bam: library2.sorted.bam
Processing bam files
Thread 0 finished: library1.sorted.bam with 82747556 reads and 82230198 readsWellMapped
Thread 1 finished: library2.sorted.bam with 175361525 reads and 165063079 readsWellMapped
Creating depth matrix file: assembly.fa.depth.txt
Closing most bam files
Closing last bam file
Finished
Finished jgi_summarize_bam_contig_depths at vie dic 27 12:01:46 -03 2019
Creating depth file for metabat at vie dic 27 12:01:46 -03 2019
Executing: 'metabat2 --inFile assembly.fa --outFile assembly.fa.metabat-bins-20191227_120146/bin --abdFile assembly.fa.depth.txt' at vie dic 27 12:01:46 -03 2019
MetaBAT 2 (v2.14-7-g2b4b398) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000.
121 bins (85393251 bases in total) formed.
Finished metabat2 at vie dic 27 12:01:47 -03 2019
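To cross-check the "121 bins (85393251 bases in total)" line against what was actually written to disk, the bins in the output directory can be tallied with a short script. This is only a minimal sketch: it assumes the bins are plain FASTA files named `bin.<n>.fa` inside the output directory (which matches the `--outFile .../bin` prefix used above), and `summarize_bins` is a hypothetical helper name, not part of MetaBAT.

```python
from pathlib import Path


def summarize_bins(bin_dir, pattern="bin.*.fa"):
    """Return (number_of_bins, total_bases) for the FASTA files in bin_dir.

    Assumption: each bin is an uncompressed FASTA file whose name matches
    `pattern`; adjust the pattern if your output prefix differs.
    """
    n_bins = 0
    total_bases = 0
    for fasta in sorted(Path(bin_dir).glob(pattern)):
        n_bins += 1
        with open(fasta) as handle:
            for line in handle:
                # Header lines start with '>'; every other line is sequence.
                if not line.startswith(">"):
                    total_bases += len(line.strip())
    return n_bins, total_bases
```

Calling `summarize_bins("assembly.fa.metabat-bins-20191227_120146")` returns the bin count and the summed sequence length, which makes it easier to compare runs from different MetaBAT versions side by side.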
Comments (7)
-
reporter -
Hi Victor,
Thanks for your post.
Upon investigation, I see a large discrepancy between the bins generated from the master branch and the results documented in our wiki for version 2.10.2, and I’m investigating it now. I believe metabat2 started misbehaving only in the last commit on the master branch (v2.14-7-g2b4b398), and I am currently testing and validating the fix.
I’ll hopefully have a working version by tomorrow.
Best,
Rob
-
Hello Victor,
Thank you again for bringing this to my attention.
There were two bugs in the code, which resulted in a significant and a minor performance degradation of the resulting bins, at least with respect to the “Best Binning Practices” wiki page for metabat2 (https://bitbucket.org/berkeleylab/metabat/wiki/Best Binning Practices). These are now fixed in v2.15, and I encourage you to try again with this new version.
If you still see a problem, please indicate which “web page” you are referring to, and I can see if there are any further discrepancies.
Best,
Rob
-
- marked as major
-
reporter Hello Rob,
Thanks a lot for answering my question. I installed the new version, v2.15, and ran metabat2 again with the files downloaded from http://portal.nersc.gov/dna/RD/Metagenome_RD/MetaBAT/Software/Mockup/. I was following the “Example with real data” section of the web page https://bitbucket.org/berkeleylab/metabat/src/master/.
This time I obtained 38 bins instead of the 28 indicated on the web page (https://bitbucket.org/berkeleylab/metabat/src/master/, line “MetaBAT forms about 28 bins. In this example ….”). So it is probably working fine now. Later I will try my own samples and see whether I also get fewer bins.
Please let me know if you need further information.
All the best,
Victor
-
Hi Victor,
Yes, the discrepancy in the Mockup dataset originates between v2.11.2 and v2.11.3, where a bug was fixed. For that dataset, the fix now results in more precise bins when species and strains are closely related and the number of samples used is <=2. Unfortunately, there is almost always a tradeoff between sensitivity and specificity, so fixing that bug resulted in more bins, some of which are less complete.
Best,
Rob
-
- changed status to resolved