readcounts are not consistent with BAM.
Hello,
I was able to run miArmaseq with mouse data, but I found that my readcount results seem not consistent with BAM, as an example is shown in the attached pdf. I wonder if you can explain me about why I have this inconsistent result. If you need additional information, please, let me know. Your comments are highly appreciated.
Thank you in advance,
Hoon
Comments (23)
-
-
-
assigned issue to
-
assigned issue to
-
reporter - attached Known_miRNAs_pipeline.ini
-
reporter - attached miARma_logfile.3999.log
-
reporter Thank you very much for your immediate reply. Attached above are the log and init files. If you need other files, feel free to let me know.
Hoon
-
It's a pleasure for me to help you. Please I'll need (to be sure) the following files :
The stat file (miARma_stat)
Also:
summary_results_SH_miARma.xls
Finally
/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab (or similar name)
and
/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab.summary ?
There are several strange results in your log file and I guess it is related with the adapter. The number of aligned files are extremelly low
Thanks in advance
-
reporter - attached summary_results_SH_miARma.xls
-
reporter - attached miARma_stat.3999.log
-
reporter - attached nat_bw1-ReadCount.tab
-
reporter - attached S177_ATCGTT_nat_bw1.tab.summary
- attached S177_ATCGTT_nat_bw1.tab
-
reporter Thank you so much for your help. I just attached the files you requested. And, you are correct; only a few of reads have been aligned, which was strange to me as well.
If you need other files/request, feel free to let me know.
Best,
Hoon
-
Thank you. As I see there are only 2 possible errors. If you open the xls file, the 75% (in average) of the reads are mapped. Which is a "good" value. But then, from this 75%, the 0% of the mapped reads are correctly detected. According with the tab.summary file, the vast majority of reads has no feature. So I see only two possible errors:
-
Your reads are not following a stranded protocol (you have stranded=Yes in the ini file). So altoguh you see reads int he BAM files, the program can't find them in the correct strand.
-
The GTF your are using is not from the same build/organism/whatever than the index file (the index is from mm9)
So I'd start by changing the strand=no
Regards
-
-
reporter - attached summary_results_SH_miARma.xls
- attached miARma_logfile.3999.log
- attached Known_miRNAs_pipeline.ini
I tried with strand=no, but the result seems not so different; only a few reads are aligned on the transcriptome.
Any help will be highly appreciated.
Hoon
-
At least now you have reached a 4% (in average) of detected reads. Could you send me a bam file ?
-
reporter Sure, I would like to send you a (box.com) line to the bam. Can you provide your email account that you are using?
thanks
-
reporter FYI: I need your email account to invite you to the box link to BAM.
thanks,
Hoon
-
reporter Thank you. I just sent you an invitation to the box folder..
Hoon
-
Dear Hoon,
As I see everythin is working correclty. If I use strand=no, I obtain 20 reads for miR-let-7i. That is the exact number of reads that you see in the IGV.
I've create a ini file with the following information:
;General parameters [General] ; type of analysis (miRNA, mRNA or circRNA) type=miRNA ; Folder for miRNA reads #read_dir=Examples/basic_examples/miRNAs/reads/ read_dir=. ; label for the analsysis label=SH ; Folder where miARma has been instaled #miARmaPath=. miARmaPath=/Users/eandres/bin/miARma/ ; Folder to store results #output_dir=Examples/basic_examples/miRNAs/Known_miRNAs/results/ output_dir=results/ ; organism used organism=mouse ; Number of process to run at the same time threads=10 ; Whether the data is from a strand-specific assay (yes, no or reverse, yes by default) for featureCounts analysis strand=no stats_file=results//miARma_stat.12097.log logfile=results//miARma_logfile.12097.log [ReadCount] database=mmu.gff seqid=Name featuretype=miRNA_primary_transcript
And then:
$>grep "let-7i" results/Readcount_results/nat_bw1-ReadCount.tab mmu-let-7i 20
I've uploaded all needed files for you to test it. By the way I recommend you to use mature miRNAs instead of pre-miRNAs. It is commented at the end of the ini file
Best
-
reporter Thank yo so much for checking it. I will take a look at the script and files in details.
By the way, if I want to use all features in a given GFF, how should I indicate in the ini. file? In the init file shown below, featuretype=miRNA_primary_transcript, but I would like to include both miRNA and miRNA_primary_transcript.
Thank you in advance,
Hoon
-
It was a pleasure. Regarding your other question, the short answer is : not possible. The software in charge of the reads summarizing is not able to mix different features ( and at the moment I don't know any software able to do it.) My recommendation is to use different ini files and copy the Readcount_results folder for each feature (to avoid overwritting issues)
Regards
-
Dear Hoon, Should I close the issue ?
Best
-
reporter Yes, please. Thanks a lot.
HK
-
- changed status to resolved
You're welcome
- Log in to comment
Dear Hoon, Let me start by the most obvious, could you send me the ini file and both logs ? (miARma-log and miARma-stats?)
Regards