Missing Contigs from metabat2 output

Issue #68 resolved
Former user created an issue

Hi, I am using metabat2 with the --unbinned parameter on, but my unbinned contigs output file is missing the majority of the input contigs (~93% of them). When I run metabat1, all unbinned contigs are written to the unbinned file. What is happening in metabat2 that results in these "missing contigs" in the unbinned output file? Could it be that contigs that don't meet the -m threshold are simply discarded? I ask because I would like to look at some stats on the unbinned contigs from metabat2. This is the metabat2 command I am using: "metabat2 -i megahit_pooled_out/final.contigs.fa -o ./metabat2_bins -t 16 --unbinned --seed 100 mapped_assembly/*sorted.bam" Thanks


  1. Rob Egan

    Only contigs that pass the initial size filter (-m) are eligible for binning, so the filtered sequences never make it into the unbinned file. The -m parameter affects the contents of the unbinned file as well as all the other cluster bins, so setting it lower may not give you the results you want.

    I am changing this ticket to a proposal request, and in a future version we will make an additional file that contains all the sequences that are filtered out.
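    Until that file exists, one workaround is to compare the input assembly with everything metabat2 wrote and collect the contigs that appear in none of the outputs. The sketch below is a minimal, unofficial helper (the function names `fasta_ids` and `missing_contigs` are hypothetical); it assumes sequence IDs are the first whitespace-delimited token of each FASTA header in both the assembly and the output files.

    ```python
    def fasta_ids(path):
        """Return the set of sequence IDs (first token after '>') in a FASTA file."""
        ids = set()
        with open(path) as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:  # skip malformed, empty headers
                        ids.add(fields[0])
        return ids


    def missing_contigs(assembly_fasta, output_fastas):
        """Contigs present in the assembly but absent from every output FASTA."""
        seen = set()
        for path in output_fastas:
            seen |= fasta_ids(path)
        return fasta_ids(assembly_fasta) - seen
    ```

    For example, `missing_contigs("megahit_pooled_out/final.contigs.fa", glob.glob("metabat2_bins*.fa"))` would list the silently filtered contigs, assuming the bins and the unbinned file all match the `-o ./metabat2_bins` prefix; adjust the glob to whatever your version actually writes.
    
    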

  2. Rob Egan

    Looking at this more closely, there is also a silent filtering step that weeds out those contigs with low abundance from the depths file.

    If any sample has a depth below minCV, that sample is ignored, and if the sum of the depths over the counted samples is below minCVSum, the contig is not considered for binning and is filtered out. The defaults for both parameters are 1.0, and their impact on the binning results may need to be re-evaluated, especially for low-abundance contigs from a combined assembly of a widely sampled data set.
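    The filtering rule described above can be sketched as a single predicate. This is a reconstruction from the description in this thread, not metabat2's actual code: `passes_filters` is a hypothetical name, `depths` is assumed to hold the contig's mean depth in each sample, and the `min_contig=2500` default for -m is an assumption to check against `metabat2 --help`.

    ```python
    def passes_filters(length, depths, min_contig=2500, min_cv=1.0, min_cv_sum=1.0):
        """Sketch of whether a contig would reach binning (and thus the unbinned file).

        length -- contig length in bp, compared with -m (min_contig is an assumed default)
        depths -- per-sample mean depth of the contig
        """
        # Size filter: contigs shorter than -m are never eligible for binning.
        if length < min_contig:
            return False
        # Samples with depth below minCV are ignored entirely.
        counted = [d for d in depths if d >= min_cv]
        # The contig is dropped if the summed depth of the counted samples is below minCVSum.
        return sum(counted) >= min_cv_sum
    ```

    Under this sketch, a 3000 bp contig at depth 0.5 in each of 10 samples is dropped with the defaults (every sample is ignored, so the counted sum is 0), but kept with `min_cv=0.2` (the counted sum is 5.0).
    
    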

    Setting --minCV 0 and --minCVSum 0 means no contigs are filtered out because of low depth.

    My recommendation, without having tested it, would be to set --minCV 0.2 or even 0.0 and leave --minCVSum at its default of 1 to weed out contigs that are likely erroneous. Ultimately, though, the minCVSum threshold may need to be independent of the minCV threshold, which tests the abundance of a single contig in a single sample.
