Quantification question

Issue #82 resolved
Sebastien created an issue

Hi Eduardo, it's me again,

I need your help to understand some results of miARma-seq.

I made a run with 1 mature miRNA, I just converted it to fastq:

@mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p TGAGGTAGTAGGTTGTATAGTT + IIIIIIIIIIIIIIIIIIIIII

My problem is that the quantification with featurecounts give me 2 readcounts for this miRNA.

I have those lines in my GFF:

grep "let-7a-5p" /data/Genomes/mirbase/genomes/mmu.gff3 chr13 . miRNA 48538239 48538260 . - . ID=MIMAT0000521;Alias=MIMAT0000521;Name=mmu-let-7a-5p;Derives_from=MI0000556 chr9 . miRNA 41536732 41536753 . + . ID=MIMAT0000521_1;Alias=MIMAT0000521;Name=mmu-let-7a-5p;Derives_from=MI0000557

My bowtie command line is:

bowtie --sam -p 10 -m 5 -a --best --strata /data/Genomes/mm10_GRCm38.68/bowtie1/mm10 /data/STAGE_snin/Results/miARma_un_read/fasta/let-7a-5p.fastq /data/STAGE_snin/Results/miARma_un_read//Bowtie1_results/let-7a-5p_nat_bw1.bam 2>> /data/STAGE_snin/Results/miARma_un_read//miARma_stat.47989.log

My bam file contains the following lines for this read:

samtools view Bowtie1_results/let-7a-5p_nat_bw1.bam mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p 0 chr9 41536732 255 22M * 0 0 TGAGGTAGTAGGTTGTATAGTT IIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:22 NM:i:0 mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p 16 chr13 48538239 255 22M * 0 0 AACTATACAACCTACTACCTCA IIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:22 NM:i:0

The featurecounts command line is:

featureCounts -s 1 -t miRNA -g Name -M --fraction -T 10 -a /data/Genomes/mirbase/genomes/mmu.gff3 -o /data/STAGE_snin/Results/miARma_un_read//nat_bw1_readcount_results/let-7a-5p_nat_bw1.tab /data/STAGE_snin/Results/miARma_un_read//Bowtie1_results/let-7a-5p_nat_bw1.bam 2>> /data/STAGE_snin/Results/miARma_un_read//miARma_logfile.47989.log

My readcount file contains this line for this miRNA:

awk -F "\t" '{if($2>0) print}' Readcount_results/nat_bw1-ReadCount.tab mmu-let-7a-5p 2

So the sequence is mapped two times on the genomes, and it's count two times. But I only have one read, I think it will make a biais in the DE analysis. I would like count one read for mmu-let-7a-5p but 2 lines in the bam file. Could you advise me?

Thank you

Sebastien

Comments (9)

  1. Eduardo Andres Leon

    Hi Sebastian, What you are obtaining is the standard behaviour. Is quite strange for miRNAs but if you are working with exons shared by different transcripts, this is what you need. In fact let-7 miNAs are usually coded for more than one genes. You can play with the different options by feature counts. As I see you are using the -M option to take into account multi mapping reads (reads in different location: as you are seeing in you bam file). You can also use -C ...

    Regards

  2. Sebastien reporter

    Hi Eduardo, I'm checking the manual of featurecounts and I see this for the -C option:

    If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be counted. This option should be used together with -p (or isPairedEnd in Rsubread featureCounts).

    They advise the use of the -p option, but it's:

    If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads.

    I'm using SE data so this options are not usable in my case I think.

    Do you think using the -m 5 option to detect all expressed miRNAs is usefull? I see in your example files that you do not use the -m option with bowtie. The fact that one only read can map to more than one position and will be count more than one time can induce some biais isn't it?

    Regards

  3. Eduardo Andres Leon

    Yes, is for paired-end reads.

    I prefer not to use the -m option in bowtie (aligners are very time consuming) and play with parameters in featurecounts (really fast). Did you try tu remove the -M option?. This should not take into account multimapping reads. Regarding the bias, any change induce a bias. If the miRNA is expressed you will never know if it comes form the chr9 or the chr13, so its up to you if you prefer to take into account both, none or just one

  4. Sebastien reporter

    Hi Eduardo, I'll close this issue. I use --best --strata -m 1 for Bowtie and default parameters for featurecounts and it works fine ;)

    Thanks

  5. Eduardo Andres Leon

    Hi, As you have mention --best --strata -m1 are the best options. In fact some of them are included in miARma by default. Other not because bowtie1 is used by miRdeep, by the known miRNA pipeline and by the mRNA pipeline, so only compatible parameters are included. But you can include all parameters you want using the bowtie1parameters value

    Cheers

  6. Eduardo Andres Leon

    Thank you!! As I'm updating miArma, I'll try to include this option by default just for miRNAs

  7. Log in to comment