Question on absolute abundances of bins

Issue #19 resolved
Former user created an issue

Hi, I'm working with a metagenomic mock community to design my pipeline. I used Megahit to create the assembly after filtering the reads (removing human contamination and phiX). I used MetaBAT to split my contigs into bins. So far so good. Then I assigned taxonomy to my bins (using Kaiju) and retrieved all the expected species, although the same species was found in two different bins.

1- Is there a way to fine-tune the binning so that the contigs from that species end up in a single bin?

2- Once the species are assigned to your bins, how can you calculate the abundance of each species in the mock community? Should I map the reads back to my bins and count the number of reads assigned to each bin? For example, if I map all the reads to my bins and find 1000 million reads mapped to one bin, can I say that this species is present at 1000 copies? The goal is to be able to compare samples once the binning is done.
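The read-counting idea in question 2 can be sketched as follows. Raw read counts grow with genome length as well as with abundance, so this minimal sketch, assuming you already have a mapped-read count and a total contig length for each bin (the inputs `read_counts` and `bin_lengths` are hypothetical), divides each count by the bin length in kb and then normalizes to fractions that sum to 1. It yields relative, not absolute, abundances.

```python
def relative_abundance_from_reads(read_counts, bin_lengths):
    """Return each bin's share of total length-normalized read counts.

    read_counts: {bin name: reads mapped to that bin's contigs} (hypothetical input)
    bin_lengths: {bin name: summed contig length in bp}         (hypothetical input)
    """
    # Reads per kb of bin, so large genomes are not over-counted.
    per_kb = {
        name: read_counts[name] / (bin_lengths[name] / 1000)
        for name in read_counts
    }
    total = sum(per_kb.values())
    # Normalize so that all bins together sum to 1.
    return {name: value / total for name, value in per_kb.items()}


if __name__ == "__main__":
    # Made-up numbers for illustration, not results from the mock community.
    counts = {"bin.1": 2000, "bin.2": 1000}
    lengths = {"bin.1": 4_000_000, "bin.2": 1_000_000}
    for name, frac in relative_abundance_from_reads(counts, lengths).items():
        print(f"{name}\t{frac:.3f}")
    # bin.1	0.333
    # bin.2	0.667
```

Note that bin.1 has twice as many reads but ends up with half the relative abundance of bin.2, because its genome is four times longer.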

Thanks for your advice.

Comments (6)

  1. david vilanova

    I forgot to mention what I have done so far:

    1- Indexed my Megahit contigs with bwa
    2- Mapped the reads to my contigs and sorted the BAM files
    3- Used jgi_summarize_bam_contig_depths to compute the depth
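The depth file from step 3 is a plain tab-separated table, so it is easy to load for downstream calculations. This is a minimal sketch, assuming the layout written by `jgi_summarize_bam_contig_depths --outputDepth`: a header row with `contigName`, `contigLen` and `totalAvgDepth`, followed by a depth column and a `-var` column per BAM file. The contig names and values in the example are made up.

```python
import csv
import io


def read_depth_file(handle):
    """Parse a MetaBAT-style depth table into {contig: (length, avg_depth)}.

    Assumes the tab-separated layout written by
    jgi_summarize_bam_contig_depths --outputDepth: contigName, contigLen,
    totalAvgDepth, then a depth and a "-var" column for each BAM file.
    """
    depths = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        length = int(float(row["contigLen"]))  # tolerate "1500" or "1500.0"
        depths[row["contigName"]] = (length, float(row["totalAvgDepth"]))
    return depths


if __name__ == "__main__":
    # Made-up two-contig example; in practice pass open("depth.txt").
    example = (
        "contigName\tcontigLen\ttotalAvgDepth\tsample.bam\tsample.bam-var\n"
        "k141_1\t1500\t12.5\t12.5\t3.1\n"
        "k141_2\t2200\t40.0\t40.0\t9.8\n"
    )
    print(read_depth_file(io.StringIO(example)))
    # {'k141_1': (1500, 12.5), 'k141_2': (2200, 40.0)}
```

Only the first three columns are read here; the per-BAM columns are ignored, which is enough when a single averaged depth per contig is wanted.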

    I meant the relative abundance of each species, not the absolute abundance.

    Thanks

  2. david vilanova

    I have improved the binning by providing the pair.txt file and tuning the p1 and p2 parameters, so question 1 is resolved. However, I still don't know how to get the relative abundance from the bins (I already have taxonomic assignments for the bins).

    Thanks

  3. Don Kang

    An easy way is to use the depth file you already generated for MetaBAT. You can take the median depth of the contigs in each bin.
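The median-depth suggestion can be sketched like this, assuming each bin is a FASTA file whose header lines name the contigs it contains, and that a `{contig: depth}` mapping has already been loaded (for example from the `totalAvgDepth` column of the depth file). Function names and example values are hypothetical.

```python
import io
import statistics


def contigs_in_fasta(handle):
    """Yield contig names from the '>' header lines of one bin FASTA file."""
    for line in handle:
        if line.startswith(">"):
            # Name = first whitespace-delimited token after '>'.
            yield line[1:].split()[0]


def bin_relative_abundance(bins, depths):
    """Median contig depth per bin, then each bin's share of the total.

    bins:   {bin name: list of contig names in that bin}
    depths: {contig name: average depth, e.g. totalAvgDepth}
    Contigs that were not binned are simply ignored.
    """
    medians = {
        name: statistics.median(depths[c] for c in contigs)
        for name, contigs in bins.items()
    }
    total = sum(medians.values())
    shares = {name: m / total for name, m in medians.items()}
    return medians, shares


if __name__ == "__main__":
    # Made-up inputs; real bins would come from the MetaBAT bin FASTA files.
    fasta = io.StringIO(">c1 some description\nACGT\n>c2\nGGCC\n>c3\nTTAA\n")
    bins = {"bin.1": list(contigs_in_fasta(fasta)), "bin.2": ["c4", "c5"]}
    depths = {"c1": 10.0, "c2": 12.0, "c3": 50.0, "c4": 30.0, "c5": 34.0}
    medians, shares = bin_relative_abundance(bins, depths)
    for name in bins:
        print(f"{name}\tmedian={medians[name]:.1f}\tshare={shares[name]:.3f}")
    # bin.1	median=12.0	share=0.273
    # bin.2	median=32.0	share=0.727
```

The median makes bin.1 robust to the outlier contig c3 (depth 50). Because depth is already coverage per base, no extra length normalization is applied; the shares approximate the proportion of genome copies rather than the proportion of reads.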

  4. david vilanova

    Great. Are unmapped reads filtered out when the depth file is created, or does that depend on the BAM file you provide? In my case, for instance, the BAM file has been filtered with "-F 4" to remove unmapped reads.
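For reference, filtering with `-F 4` (as in `samtools view -F 4`) drops every record whose SAM FLAG has bit 0x4 ("segment unmapped") set. The bit test behind that filter is sketched below on a few example FLAG values; the function names are hypothetical.

```python
# SAM FLAG bit 0x4 means "segment unmapped"; -F 4 drops records with it set.
UNMAPPED = 0x4


def is_unmapped(flag):
    """Return True if the SAM FLAG value has the 0x4 (unmapped) bit set."""
    return bool(flag & UNMAPPED)


def keep_mapped(flags):
    """Mimic the -F 4 filter on a list of FLAG values."""
    return [flag for flag in flags if not is_unmapped(flag)]


if __name__ == "__main__":
    # 99 and 163: mapped read pairs; 77 and 141: both mates unmapped.
    print(keep_mapped([99, 77, 141, 163]))  # [99, 163]
```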

  5. Rob Egan

    Unmapped reads never count toward the depth of a contig. In fact, even mapped reads that align poorly (i.e. with a low %ID) are excluded from the count, because we do not want multiple similar species in a metagenome to influence the metrics.

    --percentIdentity arg The minimum end-to-end % identity of qualifying reads (default: 97)
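To illustrate what a 97% end-to-end identity cutoff means for individual reads, here is a rough sketch that estimates identity from a CIGAR string and an NM value. The formula is an assumption for illustration and is not necessarily what jgi_summarize_bam_contig_depths computes internally: NM is taken to count mismatches plus inserted and deleted bases, and soft-clipped ends are counted as non-matching so that a partially aligned read scores lower.

```python
import re

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def end_to_end_identity(cigar, nm):
    """Approximate end-to-end % identity of a single alignment.

    Assumptions (illustrative, not the tool's exact formula): NM counts
    mismatches plus inserted and deleted bases, and soft-clipped read ends
    count as non-matching so a partially aligned read scores lower.
    """
    counts = {}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    aligned = counts.get("M", 0) + counts.get("=", 0) + counts.get("X", 0)
    ins, dels, clipped = counts.get("I", 0), counts.get("D", 0), counts.get("S", 0)
    matches = aligned - (nm - ins - dels)
    columns = aligned + ins + dels + clipped
    return 100.0 * matches / columns


def qualifies(cigar, nm, min_identity=97.0):
    """Keep a read only if its identity reaches the threshold (default 97)."""
    return end_to_end_identity(cigar, nm) >= min_identity


if __name__ == "__main__":
    for cigar, nm in [("150M", 3), ("140M10S", 0), ("100M2I48M", 4)]:
        pid = end_to_end_identity(cigar, nm)
        print(f"{cigar}\tNM={nm}\t{pid:.1f}%\tkept={qualifies(cigar, nm)}")
```

Under these assumptions, a read with 10 soft-clipped bases falls below 97% (about 93.3%) even with no mismatches, which shows why the cutoff is described as end-to-end rather than over the aligned part only.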
