readcounts are not consistent with BAM.

Issue #56 resolved

Hoon Kim created an issue 2016-08-08

Hello,

I was able to run miArmaseq with mouse data, but I found that my readcount results seem not consistent with BAM, as an example is shown in the attached pdf. I wonder if you can explain me about why I have this inconsistent result. If you need additional information, please, let me know. Your comments are highly appreciated.

Thank you in advance,

Hoon

Comments (23)

Eduardo Andres Leon
Dear Hoon, Let me start by the most obvious, could you send me the ini file and both logs ? (miARma-log and miARma-stats?)

Regards
- 2016-08-09T06:20:51+00:00
Eduardo Andres Leon
- assigned issue to
  
  Eduardo Andres Leon
- 2016-08-09T09:24:40+00:00
Hoon Kim reporter
- attached Known_miRNAs_pipeline.ini
- 2016-08-09T14:02:54+00:00
Hoon Kim reporter
- attached miARma_logfile.3999.log
- 2016-08-09T14:03:08+00:00
Hoon Kim reporter
Thank you very much for your immediate reply. Attached above are the log and init files. If you need other files, feel free to let me know.

Hoon
- 2016-08-09T14:04:13+00:00

Eduardo Andres Leon

It's a pleasure for me to help you. Please I'll need (to be sure) the following files :

The stat file (miARma_stat)

Also:

summary_results_SH_miARma.xls

Finally

/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab (or similar name)

and

/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab.summary ?

There are several strange results in your log file and I guess it is related with the adapter. The number of aligned files are extremelly low

Thanks in advance

2016-08-09T15:04:55+00:00

Hoon Kim reporter
- attached summary_results_SH_miARma.xls
- 2016-08-09T15:54:51+00:00
Hoon Kim reporter
- attached miARma_stat.3999.log
- 2016-08-09T15:55:21+00:00
Hoon Kim reporter
- attached nat_bw1-ReadCount.tab
- 2016-08-09T15:56:00+00:00
Hoon Kim reporter
- attached S177_ATCGTT_nat_bw1.tab.summary
- attached S177_ATCGTT_nat_bw1.tab
- 2016-08-09T15:57:10+00:00
Hoon Kim reporter
Thank you so much for your help. I just attached the files you requested. And, you are correct; only a few of reads have been aligned, which was strange to me as well.

If you need other files/request, feel free to let me know.

Best,

Hoon
- 2016-08-09T15:59:35+00:00
Eduardo Andres Leon
Thank you. As I see there are only 2 possible errors. If you open the xls file, the 75% (in average) of the reads are mapped. Which is a "good" value. But then, from this 75%, the 0% of the mapped reads are correctly detected. According with the tab.summary file, the vast majority of reads has no feature. So I see only two possible errors:
1. Your reads are not following a stranded protocol (you have stranded=Yes in the ini file). So altoguh you see reads int he BAM files, the program can't find them in the correct strand.
2. The GTF your are using is not from the same build/organism/whatever than the index file (the index is from mm9)
So I'd start by changing the strand=no

Regards
- 2016-08-09T16:11:49+00:00
Hoon Kim reporter
- attached summary_results_SH_miARma.xls
- attached miARma_logfile.3999.log
- attached Known_miRNAs_pipeline.ini
I tried with strand=no, but the result seems not so different; only a few reads are aligned on the transcriptome.

Any help will be highly appreciated.

Hoon
- 2016-08-09T16:46:14+00:00
Eduardo Andres Leon
At least now you have reached a 4% (in average) of detected reads. Could you send me a bam file ?
- 2016-08-09T16:58:12+00:00
Hoon Kim reporter
Sure, I would like to send you a (box.com) line to the bam. Can you provide your email account that you are using?

thanks
- 2016-08-09T17:10:04+00:00
Hoon Kim reporter
FYI: I need your email account to invite you to the box link to BAM.

thanks,

Hoon
- 2016-08-09T17:15:30+00:00
Hoon Kim reporter
Thank you. I just sent you an invitation to the box folder..

Hoon
- 2016-08-10T14:55:07+00:00

Eduardo Andres Leon

Dear Hoon,

As I see everythin is working correclty. If I use strand=no, I obtain 20 reads for miR-let-7i. That is the exact number of reads that you see in the IGV.

I've create a ini file with the following information:

;General parameters
[General]
; type of analysis (miRNA, mRNA or circRNA)
type=miRNA
; Folder for miRNA reads
#read_dir=Examples/basic_examples/miRNAs/reads/
read_dir=.
; label for the analsysis
label=SH
; Folder where miARma has been instaled
#miARmaPath=.
miARmaPath=/Users/eandres/bin/miARma/
; Folder to store results
#output_dir=Examples/basic_examples/miRNAs/Known_miRNAs/results/
output_dir=results/
; organism used
organism=mouse
; Number of process to run at the same time
threads=10
; Whether the data is from a strand-specific assay (yes, no or reverse, yes by default) for featureCounts analysis
strand=no
stats_file=results//miARma_stat.12097.log
logfile=results//miARma_logfile.12097.log

[ReadCount]
database=mmu.gff
seqid=Name
featuretype=miRNA_primary_transcript

And then:

$>grep "let-7i" results/Readcount_results/nat_bw1-ReadCount.tab 
mmu-let-7i  20

I've uploaded all needed files for you to test it. By the way I recommend you to use mature miRNAs instead of pre-miRNAs. It is commented at the end of the ini file

Best

2016-08-10T17:32:13+00:00

Hoon Kim reporter
Thank yo so much for checking it. I will take a look at the script and files in details.

By the way, if I want to use all features in a given GFF, how should I indicate in the ini. file? In the init file shown below, featuretype=miRNA_primary_transcript, but I would like to include both miRNA and miRNA_primary_transcript.

Thank you in advance,

Hoon
- 2016-08-10T18:35:05+00:00
Eduardo Andres Leon
It was a pleasure. Regarding your other question, the short answer is : not possible. The software in charge of the reads summarizing is not able to mix different features ( and at the moment I don't know any software able to do it.) My recommendation is to use different ini files and copy the Readcount_results folder for each feature (to avoid overwritting issues)

Regards
- 2016-08-12T12:01:17+00:00
Eduardo Andres Leon
Dear Hoon, Should I close the issue ?

Best
- 2016-08-13T08:30:48+00:00
Hoon Kim reporter
Yes, please. Thanks a lot.

HK
- 2016-08-13T10:40:05+00:00
Eduardo Andres Leon
- changed status to resolved
You're welcome
- 2016-08-13T16:05:25+00:00
Log in to comment

Assignee: Eduardo Andres Leon

Type: bug

Priority: major

Status: resolved

Votes: 0

Watchers: 1