Quantification question

Issue #82 resolved

Sebastien created an issue 2017-06-13

Hi Eduardo, it's me again,

I need your help to understand some results of miARma-seq.

I made a run with 1 mature miRNA, I just converted it to fastq:

@mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p TGAGGTAGTAGGTTGTATAGTT + IIIIIIIIIIIIIIIIIIIIII

My problem is that the quantification with featurecounts give me 2 readcounts for this miRNA.

I have those lines in my GFF:

grep "let-7a-5p" /data/Genomes/mirbase/genomes/mmu.gff3 chr13 . miRNA 48538239 48538260 . - . ID=MIMAT0000521;Alias=MIMAT0000521;Name=mmu-let-7a-5p;Derives_from=MI0000556 chr9 . miRNA 41536732 41536753 . + . ID=MIMAT0000521_1;Alias=MIMAT0000521;Name=mmu-let-7a-5p;Derives_from=MI0000557

My bowtie command line is:

bowtie --sam -p 10 -m 5 -a --best --strata /data/Genomes/mm10_GRCm38.68/bowtie1/mm10 /data/STAGE_snin/Results/miARma_un_read/fasta/let-7a-5p.fastq /data/STAGE_snin/Results/miARma_un_read//Bowtie1_results/let-7a-5p_nat_bw1.bam 2>> /data/STAGE_snin/Results/miARma_un_read//miARma_stat.47989.log

My bam file contains the following lines for this read:

samtools view Bowtie1_results/let-7a-5p_nat_bw1.bam mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p 0 chr9 41536732 255 22M * 0 0 TGAGGTAGTAGGTTGTATAGTT IIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:22 NM:i:0 mmu-let-7a-5p_MIMAT0000521_Mus_musculus_let-7a-5p 16 chr13 48538239 255 22M * 0 0 AACTATACAACCTACTACCTCA IIIIIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:22 NM:i:0

The featurecounts command line is:

featureCounts -s 1 -t miRNA -g Name -M --fraction -T 10 -a /data/Genomes/mirbase/genomes/mmu.gff3 -o /data/STAGE_snin/Results/miARma_un_read//nat_bw1_readcount_results/let-7a-5p_nat_bw1.tab /data/STAGE_snin/Results/miARma_un_read//Bowtie1_results/let-7a-5p_nat_bw1.bam 2>> /data/STAGE_snin/Results/miARma_un_read//miARma_logfile.47989.log

My readcount file contains this line for this miRNA:

awk -F "\t" '{if($2>0) print}' Readcount_results/nat_bw1-ReadCount.tab mmu-let-7a-5p 2

So the sequence is mapped two times on the genomes, and it's count two times. But I only have one read, I think it will make a biais in the DE analysis. I would like count one read for mmu-let-7a-5p but 2 lines in the bam file. Could you advise me?

Thank you

Sebastien

Comments (9)

Eduardo Andres Leon
Hi Sebastian, What you are obtaining is the standard behaviour. Is quite strange for miRNAs but if you are working with exons shared by different transcripts, this is what you need. In fact let-7 miNAs are usually coded for more than one genes. You can play with the different options by feature counts. As I see you are using the -M option to take into account multi mapping reads (reads in different location: as you are seeing in you bam file). You can also use -C ...

Regards
- 2017-06-14T06:11:42+00:00
Sebastien reporter
Hi Eduardo, I'm checking the manual of featurecounts and I see this for the -C option:

If specified, the chimeric fragments (those fragments that have their two ends aligned to different chromosomes) will NOT be counted. This option should be used together with -p (or isPairedEnd in Rsubread featureCounts).

They advise the use of the -p option, but it's:

If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads.

I'm using SE data so this options are not usable in my case I think.

Do you think using the -m 5 option to detect all expressed miRNAs is usefull? I see in your example files that you do not use the -m option with bowtie. The fact that one only read can map to more than one position and will be count more than one time can induce some biais isn't it?

Regards
- 2017-06-14T06:26:03+00:00
Eduardo Andres Leon
Yes, is for paired-end reads.

I prefer not to use the -m option in bowtie (aligners are very time consuming) and play with parameters in featurecounts (really fast). Did you try tu remove the -M option?. This should not take into account multimapping reads. Regarding the bias, any change induce a bias. If the miRNA is expressed you will never know if it comes form the chr9 or the chr13, so its up to you if you prefer to take into account both, none or just one
- 2017-06-14T06:34:02+00:00
Sebastien reporter
Hi Eduardo, I'll close this issue. I use --best --strata -m 1 for Bowtie and default parameters for featurecounts and it works fine ;)

Thanks
- 2017-08-01T07:15:46+00:00
Sebastien reporter
- changed status to resolved
- 2017-08-01T07:15:51+00:00
Eduardo Andres Leon
Hi, As you have mention --best --strata -m1 are the best options. In fact some of them are included in miARma by default. Other not because bowtie1 is used by miRdeep, by the known miRNA pipeline and by the mRNA pipeline, so only compatible parameters are included. But you can include all parameters you want using the bowtie1parameters value

Cheers
- 2017-08-01T07:25:39+00:00
Sebastien reporter
I found an article that can maybe interesting for other people: https://www.ncbi.nlm.nih.gov/pubmed/27284164 from Mark Ziemann et al. 2016. They compare alignement methods for miRNA and Bowtie 1 with the parameters --best --strata -m1 is the best option for bowtie1.
- 2017-08-01T07:33:28+00:00
Eduardo Andres Leon
Thank you!! As I'm updating miArma, I'll try to include this option by default just for miRNAs
- 2017-08-01T07:34:54+00:00
Sebastien reporter
I'm glad to help you :D I really like your tool!
- 2017-08-01T07:36:45+00:00
Log in to comment

Assignee: –

Type: bug

Priority: major

Status: resolved

Votes: 0

Watchers: 1