Binning with nanopore data

Issue #142 resolved
Former user created an issue

Hi,

thanks for your tool. I'm currently trying to bin metagenome contigs produced from nanopore data, which are of rather poor quality. I used the jgi tool and, as you said in other threads (like here), nanopore data did not work well with the default options because of the identity threshold.

So I tried a couple of values for this option and ended up with 72% of total reads with percentIdentity = 80 and 65% of reads with percentIdentity = 85. I have no objective way of knowing which one is better. How do you think too low a percentIdentity value would negatively impact metabat2 results? Do you know of a method or another tool that would be more suitable for this?

Regards, JS

Comments (4)

  1. Rob Egan

    So the issue as I see it is that you are attempting to bin genomes out of a mixed metagenome. Since different species can differ by just 5-10% identity, long reads with high error rates give little ability to resolve this, imo. Couple that with a poor assembly where the contigs are short, many potentially shorter than the long reads themselves, and there just isn’t a whole lot of reliable information in the abundance profile to work with. The default of 97% identity is recommended in order to resolve species and have a chance at resolving strains when using short reads in the recommended workflow. We have not yet started to support or recommend using MetaBAT with long reads, though you certainly can by lowering that %ID threshold. I personally would not lower it below 85% identity unless all you wanted was higher-level binning at the class/order level.
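To make the identity threshold discussed above concrete, here is a minimal Python sketch of one common way to compute a per-alignment percent identity from a SAM record's CIGAR string and NM (edit distance) tag. This is an illustration only: the function names `percent_identity` and `passes_threshold` are hypothetical, and the formula used here (a BLAST-style identity over alignment columns) is an assumption, so the exact calculation inside jgi_summarize_bam_contig_depths may differ.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """BLAST-style identity (an assumed definition, not MetaBAT's own code).

    Alignment columns are M, =, X, I and D operations; soft/hard clips,
    skips and padding are not counted. NM is the edit distance
    (mismatches plus inserted and deleted bases).
    """
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MID=X")
    if columns == 0:
        raise ValueError("CIGAR has no aligned columns")
    return 100.0 * (columns - nm) / columns


def passes_threshold(cigar: str, nm: int, threshold: float) -> bool:
    """True if the alignment meets the given percent-identity threshold."""
    return percent_identity(cigar, nm) >= threshold


# A noisy long-read alignment: 100 alignment columns, 15 edits.
cigar, nm = "90M5I5D", 15
print(percent_identity(cigar, nm))        # 85.0
print(passes_threshold(cigar, nm, 85.0))  # True
print(passes_threshold(cigar, nm, 97.0))  # False
```

Under this definition, a read with a 15% error rate is kept at a threshold of 85 but dropped at the default of 97, which is consistent with noisy nanopore reads being filtered out under the default settings.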

  2. JS GOUNOT

    Thank you for your answer, that makes sense. In this case, I wonder if keeping only reads that do not have any secondary alignment would resolve the issue in a way. There is still a risk that a read is wrongly mapped, but the likelihood should remain low because of the read length. What do you think?

  3. Rob Egan

    Secondary alignments are always ignored, and the current recommendation for short reads is to have the aligner randomly select from the pool of contigs that align equally well, so that there is no spike or bias in the coverage across repetitive regions within the contigs/scaffolds of the assembly. For long reads, the minMapQual could be raised, but make sure you use an aligner that does that calculation properly. It gets trickier to account for long reads mapping to shorter contigs, since you would only want 1 possible region of the read to map to 1 possible contig. That kind of logic is not something that MetaBAT’s jgi_summarize_bam_contig_depths tracks, and I do not believe any of the aligners do either. For example, does the aligner leave the secondary alignment flag unset if the 5' side mapped to one contig and the 3' side mapped to a different contig (MetaBAT would work best if it did not get set), but set it if (significantly) overlapping regions map to different contigs? If we ever support long reads, all of that would have to be part of the logic.
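As a rough illustration of the flag and mapping-quality logic discussed above, the following Python sketch classifies an alignment from its SAM FLAG and MAPQ fields. The bit values (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format; the function name `keep_alignment`, the `keep_supplementary` switch, and the default `min_mapq` of 20 are hypothetical choices for this example and do not describe what jgi_summarize_bam_contig_depths does internally.

```python
# FLAG bits defined by the SAM format.
UNMAPPED = 0x4
SECONDARY = 0x100       # alternative placement of the same read region
SUPPLEMENTARY = 0x800   # another segment of a split (chimeric) alignment


def keep_alignment(flag: int, mapq: int, min_mapq: int = 20,
                   keep_supplementary: bool = True) -> bool:
    """Decide whether an alignment should count toward coverage.

    min_mapq=20 is an arbitrary illustrative cutoff, not a recommendation.
    """
    if flag & UNMAPPED:
        return False
    if flag & SECONDARY:
        # Overlapping alternative hits are dropped.
        return False
    if flag & SUPPLEMENTARY and not keep_supplementary:
        # Optionally drop the extra segments of split reads as well.
        return False
    return mapq >= min_mapq


print(keep_alignment(flag=0, mapq=60))       # True: confident primary hit
print(keep_alignment(flag=256, mapq=60))     # False: secondary alignment
print(keep_alignment(flag=0, mapq=5))        # False: ambiguous placement
print(keep_alignment(flag=2048, mapq=60))    # True: split-read segment kept
```

Whether a given aligner marks a read whose two ends hit different contigs as supplementary rather than secondary is worth checking on your own data before relying on a filter like this.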
