readcounts are not consistent with BAM.

Issue #56 resolved
Hoon Kim created an issue

Hello,

I was able to run miArmaseq with mouse data, but I found that my readcount results seem not consistent with BAM, as an example is shown in the attached pdf. I wonder if you can explain me about why I have this inconsistent result. If you need additional information, please, let me know. Your comments are highly appreciated.

Thank you in advance,

Hoon

Comments (23)

  1. Eduardo Andres Leon

    Dear Hoon, Let me start by the most obvious, could you send me the ini file and both logs ? (miARma-log and miARma-stats?)

    Regards

  2. Hoon Kim reporter

    Thank you very much for your immediate reply. Attached above are the log and init files. If you need other files, feel free to let me know.

    Hoon

  3. Eduardo Andres Leon

    It's a pleasure for me to help you. Please I'll need (to be sure) the following files :

    The stat file (miARma_stat)
    

    Also:

    summary_results_SH_miARma.xls
    

    Finally

    /scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab (or similar name)
    

    and

    /scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.gff3/miRNA_primary_transcript/SH_readcount_results/S177_ATCGTT_nat_bw1.sorted.tab.summary ?
    

    There are several strange results in your log file and I guess it is related with the adapter. The number of aligned files are extremelly low

    Thanks in advance

  4. Hoon Kim reporter

    Thank you so much for your help. I just attached the files you requested. And, you are correct; only a few of reads have been aligned, which was strange to me as well.

    If you need other files/request, feel free to let me know.

    Best,

    Hoon

  5. Eduardo Andres Leon

    Thank you. As I see there are only 2 possible errors. If you open the xls file, the 75% (in average) of the reads are mapped. Which is a "good" value. But then, from this 75%, the 0% of the mapped reads are correctly detected. According with the tab.summary file, the vast majority of reads has no feature. So I see only two possible errors:

    1. Your reads are not following a stranded protocol (you have stranded=Yes in the ini file). So altoguh you see reads int he BAM files, the program can't find them in the correct strand.

    2. The GTF your are using is not from the same build/organism/whatever than the index file (the index is from mm9)

    So I'd start by changing the strand=no

    Regards

  6. Eduardo Andres Leon

    At least now you have reached a 4% (in average) of detected reads. Could you send me a bam file ?

  7. Hoon Kim reporter

    Sure, I would like to send you a (box.com) line to the bam. Can you provide your email account that you are using?

    thanks

  8. Eduardo Andres Leon

    Dear Hoon,

    As I see everythin is working correclty. If I use strand=no, I obtain 20 reads for miR-let-7i. That is the exact number of reads that you see in the IGV.

    I've create a ini file with the following information:

    ;General parameters
    [General]
    ; type of analysis (miRNA, mRNA or circRNA)
    type=miRNA
    ; Folder for miRNA reads
    #read_dir=Examples/basic_examples/miRNAs/reads/
    read_dir=.
    ; label for the analsysis
    label=SH
    ; Folder where miARma has been instaled
    #miARmaPath=.
    miARmaPath=/Users/eandres/bin/miARma/
    ; Folder to store results
    #output_dir=Examples/basic_examples/miRNAs/Known_miRNAs/results/
    output_dir=results/
    ; organism used
    organism=mouse
    ; Number of process to run at the same time
    threads=10
    ; Whether the data is from a strand-specific assay (yes, no or reverse, yes by default) for featureCounts analysis
    strand=no
    stats_file=results//miARma_stat.12097.log
    logfile=results//miARma_logfile.12097.log
    
    [ReadCount]
    database=mmu.gff
    seqid=Name
    featuretype=miRNA_primary_transcript
    

    And then:

    $>grep "let-7i" results/Readcount_results/nat_bw1-ReadCount.tab 
    mmu-let-7i  20
    

    I've uploaded all needed files for you to test it. By the way I recommend you to use mature miRNAs instead of pre-miRNAs. It is commented at the end of the ini file

    Best

  9. Hoon Kim reporter

    Thank yo so much for checking it. I will take a look at the script and files in details.

    By the way, if I want to use all features in a given GFF, how should I indicate in the ini. file? In the init file shown below, featuretype=miRNA_primary_transcript, but I would like to include both miRNA and miRNA_primary_transcript.

    Thank you in advance,

    Hoon

  10. Eduardo Andres Leon

    It was a pleasure. Regarding your other question, the short answer is : not possible. The software in charge of the reads summarizing is not able to mix different features ( and at the moment I don't know any software able to do it.) My recommendation is to use different ini files and copy the Readcount_results folder for each feature (to avoid overwritting issues)

    Regards

  11. Log in to comment