test data :: no bins?

Issue #109 resolved
Eric Deveaud created an issue

Hello,

I did a fresh compile of MetaBAT from the 2.15 tagged archive (https://bitbucket.org/berkeleylab/metabat/get/v2.15.tar.gz),

built on CentOS 8.2 using:

gcc/9.2.0
htslib/1.10
boost/1.72.0
zlib/1.2.11
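
For reference, the v2.15 tarball is set up for a standard out-of-source CMake build, roughly along these lines (a sketch; the install prefix and the extracted directory name below are illustrative):

tar xzf v2.15.tar.gz
cd berkeleylab-metabat-*            # the Bitbucket archive unpacks into a commit-named directory
mkdir build && cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/metabat/2.15   # prefix is an example; boost/htslib/zlib must be discoverable
make
make install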

When I run it on the test data, here is the output I get:

[gensoft@01956cf442ce 2.15]$ runMetaBat.sh -v test/contigs.fa test/contigs-1000.fastq.bam
Executing: 'jgi_summarize_bam_contig_depths --outputDepth contigs.fa.depth.txt --percentIdentity 97 --minContigLength 1000 --minContigDepth 1.0 --referenceFasta test/contigs.fa test/contigs-1000.fastq.bam' at Wed Oct 28 18:41:14 UTC 2020
Output depth matrix to contigs.fa.depth.txt
Minimum percent identity for a mapped read: 0.97
minContigLength: 1000
minContigDepth: 1
Reference fasta file test/contigs.fa
jgi_summarize_bam_contig_depths 2.15 2020-10-28T16:30:18
Output matrix to contigs.fa.depth.txt
Reading reference fasta file: test/contigs.fa
... 1 sequences
0: Opening bam: test/contigs-1000.fastq.bam
Processing bam files
Thread 0 finished: contigs-1000.fastq.bam with 1000 reads and 1000 readsWellMapped
Creating depth matrix file: contigs.fa.depth.txt
Closing most bam files
Closing last bam file
Finished
Finished jgi_summarize_bam_contig_depths at Wed Oct 28 18:41:14 UTC 2020
Creating depth file for metabat at Wed Oct 28 18:41:14 UTC 2020
Executing: 'metabat2 -v --inFile test/contigs.fa --outFile contigs.fa.metabat-bins-v-20201028_184114/bin --abdFile contigs.fa.depth.txt' at Wed Oct 28 18:41:14 UTC 2020
MetaBAT 2 (2.15) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000. with random seed=1603910474
[00:00:00] Executing with 56 threads
[00:00:00] Parsing abundance file
[00:00:00] Parsing assembly file
[00:00:00] Number of large contigs >= 2500 are 1.
[00:00:00] Reading abundance file
[00:00:00] Finished reading 1 contigs and 1 coverages from contigs.fa.depth.txt
[00:00:00] Number of target contigs: 1 of large (>= 2500) and 0 of small ones (>=1000 & <2500).
[00:00:00] Start TNF calculation. nobs = 1
[00:00:00] Finished TNF calculation.
[00:00:00] Finished Preparing TNF Graph Building [pTNF = 69.60]
[00:00:00] Building TNF Graph 4400.0% (44 of 1), ETA 0:00:00 [28.1Gb / 251.5Gb]
[00:00:00] Finished Building TNF Graph (0 edges) [28.1Gb / 251.5Gb]
No edges were formed by TNF.
[00:00:00] Rescuing singleton large contigs
[00:00:00] There are 0 bins already
[00:00:00] Outputting bins
[00:00:00] -nan% (0 bases) of large (>=2500) and 0.00% (0 bases) of small (<2500) contigs were binned.
0 bins (0 bases in total) formed.
[00:00:00] Finished
Finished metabat2 at Wed Oct 28 18:41:14 UTC 2020

Is this expected and/or normal?

regards

Eric

Comments (3)

  1. Rob Egan

    If you look at the output:

    Number of large contigs >= 2500 are 1

    So only 1 contig was >= 2500 bases in length, and I presume it is shorter than the 200 kb default minimum bin size, so there are no bins that MetaBAT can generate with any confidence.

    If your assembly is too small or too fragmented, there is not much hope of generating reliable bins from it.
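
    A quick way to confirm that is to look at the contig lengths in the test FASTA, for example with a small awk sketch like this (assuming plain FASTA input):

    awk '/^>/{if(n)print n, len; n=$1; len=0; next}{len+=length($0)}END{if(n)print n, len}' test/contigs.fa

    Contigs shorter than the minContig default (2500) are skipped entirely, and a cluster also has to reach the minClsSize default (200000 bases, as shown in the log above) before it is written out as a bin.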

  2. Eric Deveaud reporter

    Indeed, BUT the assembly is not mine, it is YOURS 😉 it is the one provided in the test folder.

    I was expecting at least some results to look at, and to play with the accessory Perl scripts.
    NB: I'm not a biologist, I just install stuff.

    regards

    Eric

  3. Rob Egan

    Ah yes, sorry, I missed that. The test passed, though, right? We are really just looking for an exit code of 0.

    We use that data set with several different options, and the default options do result in 0 bins, which is a valid result. The packaged data set is not intended to be a full sweep of functionality or accuracy testing, which is much more manually involved, and the data sets for those tests are not suited to being embedded in the git repo.
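
    For example, on the packaged data the check boils down to something like:

    runMetaBat.sh -v test/contigs.fa test/contigs-1000.fastq.bam
    echo $?    # 0 means the run completed cleanly, even when 0 bins are formed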

    -Rob
