reads mapped only one time?

Issue #49 resolved
Annika Seidel created an issue

I have a large dataset for which jgi_summarize_bam_contig_depths creates a lot of 0 rows in the depth file.

As a toy example I did the following:

  1. Created two samples with coverage 5 using bbmap's random_reads on a complete E. coli genome --> both read files were pretty much identical, only the sequence headers differed
  2. Assembled both read files into contigs --> now have two very similar sets of contigs
  3. Used bbmap to create sam files --> as could be expected, the reads from both samples mapped to both sets of contigs. Then ran samtools view and sort to create sorted bam files
  4. Called jgi_summarize_bam_contig_depths using the two bam files as input

The result was that pretty much all contigs from sample 1 had only 0 entries in the matrix, while all contigs from sample 0 had non-0 entries.

Is this an issue based on the fact that each read maps equally well to contigs from sample 0 and 1 and is therefore only used for the depth calculation of one of them?

Thanks, Annika

Comments (4)

  1. Rob Egan

    Reads are only counted once for coverage purposes in jgi_summarize_bam_contig_depths. If the secondary flag is set in the bam file, then the coverage for that secondary read alignment is ignored. Check your settings in bbmap and you will likely find that the primary mapping is to sample 0 and the secondary mappings are to sample 1.
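
To check whether this is what happened, one option is to tally primary and secondary alignments per contig from the SAM output. The following is a minimal sketch, not part of jgi_summarize_bam_contig_depths: it assumes tab-separated SAM text, relies on the SAM flag bits 0x100 (secondary alignment) and 0x4 (unmapped), and uses made-up contig and read names purely for illustration.

```python
from collections import Counter

SECONDARY = 0x100  # SAM flag bit: not the primary alignment
UNMAPPED = 0x4     # SAM flag bit: read is unmapped


def count_alignments(sam_lines):
    """Count primary and secondary alignments per reference contig."""
    primary = Counter()
    secondary = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (those start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        contig = fields[2]      # column 3: reference sequence name
        if flag & UNMAPPED:
            continue
        if flag & SECONDARY:
            secondary[contig] += 1
        else:
            primary[contig] += 1
    return primary, secondary


# Hypothetical records: each read has a primary hit on a sample 0 contig
# and a secondary hit (flag 256) on the near-identical sample 1 contig.
example = [
    "@SQ\tSN:s0_contig1\tLN:1000",
    "@SQ\tSN:s1_contig1\tLN:1000",
    "read1\t0\ts0_contig1\t1\t40\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read1\t256\ts1_contig1\t1\t0\t10M\t*\t0\t0\t*\t*",
    "read2\t16\ts0_contig1\t50\t40\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t272\ts1_contig1\t50\t0\t10M\t*\t0\t0\t*\t*",
]

primary, secondary = count_alignments(example)
for contig in sorted(set(primary) | set(secondary)):
    print(contig, "primary:", primary[contig], "secondary:", secondary[contig])
```

If the contigs from one sample show only secondary alignments, that would be consistent with their depth being reported as 0. For a real run, the lines could come from an open SAM file or from `samtools view -h` on the sorted bam file.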

  2. Annika Seidel reporter

    Thanks. So basically, if two contigs cover the same region of a genome, only one of them will be considered in the subsequent processing, right?

  3. Rob Egan

    That would depend on your mapper. If you are using bbmap to map the reads to your combined assembly, ambiguous=random is what you would want. However, in general it is not advised to have a high amount of duplication in your assembly.
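
As a rough illustration of the suggestion above, the sketch below assembles a bbmap command line with ambiguous=random. The file names are hypothetical placeholders, and ref=, in=, and out= are assumed to be the only other parameters needed; a real run may need more settings.

```python
def bbmap_command(reference, reads, out_sam, ambiguous="random"):
    """Build a bbmap.sh argument list; ambiguous controls how multi-mapping reads are placed."""
    return [
        "bbmap.sh",
        f"ref={reference}",        # combined assembly to map against
        f"in={reads}",             # reads from one sample
        f"out={out_sam}",          # SAM output file
        f"ambiguous={ambiguous}",  # 'random' assigns ambiguous reads to one top site at random
    ]


# Hypothetical file names, for illustration only.
cmd = bbmap_command("combined_contigs.fa", "sample0_reads.fq", "sample0.sam")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once bbmap.sh is on the PATH. With random assignment, reads that match both contig sets equally well should be split between them rather than all counted toward one set.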
