An error when running with mouse databases

Issue #54 resolved
Hoon Kim created an issue

Dear Eduardo,

After downloading 'ftp://mirbase.org/pub/mirbase/CURRENT/genomes/mmu.gff3' and 'ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/mm9.ebwt.zip', I tried to generate the 'ReadCount results', but the output is different from the one I got with your example data:

    S177_ATCGTT S178_GACGTT S179_AGAGTT S180_GTTCTT S83_ATGTTT S84_GAGTTT S85_AGCTTT S86_GTATTT S89_ACATTT S90_GGTGTT S91_AATGTT S92_GCGGTT
    5 5 23 6 41 12 71 69 7 44 15 61

It looks like my mouse mmu.gff3 may not be compatible with the miARma program. I wonder if you could comment on how to solve this problem.

thank you in advance,

Hoon

Comments (7)

  1. Eduardo Andres Leon

    Dear Hoon, all kinds of GFF files are compatible with miARma, since GFF and GTF files are standard annotation files. I wonder which parameters you used. For this mmu.gff3, you will need:

     [ReadCount]
     database=path_to_mmu.gff3
     ;GFF attribute to be used as feature ID (default: gene_id) for featureCounts analysis
     seqid=Name
     ;Feature type (3rd column in GFF file) to be used, all features of other type are ignored (default:exon) for featureCounts analysis
     featuretype=miRNA
    

    greetings
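
Before setting `seqid` and `featuretype`, it can help to look at which feature types (3rd column) and attribute keys (9th column) an annotation file actually contains. The following is a minimal sketch, not part of miARma; the function name `summarize_gff3` and the two example records are made up for illustration and only imitate the general GFF3 layout.

```python
from collections import Counter


def summarize_gff3(lines):
    """Count feature types (column 3) and attribute keys (column 9) in GFF3 lines."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_types[cols[2]] += 1
        for field in cols[8].split(";"):
            if "=" in field:
                attribute_keys[field.split("=", 1)[0].strip()] += 1
    return feature_types, attribute_keys


# Hypothetical records (identifiers and coordinates are invented).
example = [
    "##gff-version 3",
    "chr1\t.\tmiRNA_primary_transcript\t100\t180\t.\t+\t.\tID=MI0000001;Name=mmu-mir-example",
    "chr1\t.\tmiRNA\t110\t131\t.\t+\t.\tID=MIMAT0000001;Name=mmu-miR-example-5p;Derives_from=MI0000001",
]
types, keys = summarize_gff3(example)
print(dict(types))
print(dict(keys))
```

For a real file, pass an open file handle instead of the list, for example `summarize_gff3(open("mmu.gff3"))`. If `gene_id` and `exon` do not appear in the output, the defaults mentioned in the config comments cannot match anything, which is why `seqid` and `featuretype` have to be set explicitly.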

  2. Hoon Kim reporter

    Thank you very much for your comments. Now it works.

    By the way, I think your program can also count reads on other small RNAs (<100 bp) if the corresponding GTF is provided. Is that correct?

    thank you,

    Hoon

  3. Hoon Kim reporter

    Dear Eduardo,

    I have another question. After creating an annotation GTF consisting of all small mouse transcripts (<=100 bp), I ran miARma.

    a part of 'summary_results.xls' is:

    Alignment [/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.NCBIM37.67_100bp.gtf//Bowtie1_results/]
    Filename    Processed Reads Aligned reads   Failed to align
    S177_ATCGTT.fastq   1189389 893362 (75.11%) 296027 (24.89%)
    S178_GACGTT.fastq   768298  580088 (75.50%) 188210 (24.50%)
    S179_AGAGTT.fastq   1183877 866511 (73.19%) 317366 (26.81%)
    S180_GTTCTT.fastq   1378839 1039826 (75.41%)    339013 (24.59%)
    S83_ATGTTT.fastq    2264620 1633403 (72.13%)    631217 (27.87%)
    S84_GAGTTT.fastq    1422850 1108686 (77.92%)    314164 (22.08%)
    S85_AGCTTT.fastq    2364015 1656693 (70.08%)    707322 (29.92%)
    S86_GTATTT.fastq    2570765 1656320 (64.43%)    914445 (35.57%)
    S89_ACATTT.fastq    1338242 1084348 (81.03%)    253894 (18.97%)
    S90_GGTGTT.fastq    2634816 1691761 (64.21%)    943055 (35.79%)
    S91_AATGTT.fastq    1121329 828391 (73.88%) 292938 (26.12%)
    S92_GCGGTT.fastq    2574415 1778573 (69.09%)    795842 (30.91%)
    
    ReadCount [/scratch/bcb/hkim6/SH-mouse-miRNA/work/merged-fastq/miARmaSeq/Known_miRNAs/results_mmu.NCBIM37.67_100bp.gtf//Readcount_results/]
    Filename    Processed Reads Assigned reads  Strand  Number of identified entities
    S177_ATCGTT_nat_bw1 1189389 57491 (4.8%)    no  399
    S178_GACGTT_nat_bw1 768298  21267 (2.8%)    no  347
    S179_AGAGTT_nat_bw1 1183877 62632 (5.3%)    no  415
    S180_GTTCTT_nat_bw1 1378839 43788 (3.2%)    no  411
    S83_ATGTTT_nat_bw1  2264620 135825 (6.0%)   no  453
    S84_GAGTTT_nat_bw1  1422850 43822 (3.1%)    no  387
    S85_AGCTTT_nat_bw1  2364015 159310 (6.7%)   no  440
    S86_GTATTT_nat_bw1  2570765 139655 (5.4%)   no  450
    S89_ACATTT_nat_bw1  1338242 39425 (2.9%)    no  378
    S90_GGTGTT_nat_bw1  2634816 161102 (6.1%)   no  450
    S91_AATGTT_nat_bw1  1121329 53323 (4.8%)    no  393
    S92_GCGGTT_nat_bw1  2574415 176444 (6.9%)   no  472
    

    Overall, only a small fraction (about 3-7%) of the processed reads were assigned to features in the transcript GTF, and I think these fractions are too low. I wonder if you, as an expert in the analysis of miRNA sequencing, could comment on what might be causing such low assignment fractions.
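
The percentages in the ReadCount table are computed against processed reads, so they look lower than they would against aligned reads. A small sketch of that arithmetic, using three rows copied from the two tables above (the helper name `percentages` is just for illustration):

```python
# (sample, processed reads, aligned reads, assigned reads) taken from the tables above
rows = [
    ("S177_ATCGTT", 1189389, 893362, 57491),
    ("S178_GACGTT", 768298, 580088, 21267),
    ("S92_GCGGTT", 2574415, 1778573, 176444),
]


def percentages(processed, aligned, assigned):
    """Return assigned reads as a percentage of processed and of aligned reads."""
    return 100 * assigned / processed, 100 * assigned / aligned


for sample, processed, aligned, assigned in rows:
    of_processed, of_aligned = percentages(processed, aligned, assigned)
    print(f"{sample}: {of_processed:.1f}% of processed, {of_aligned:.1f}% of aligned")
```

Either way, most aligned reads fall outside the annotated features, so the question of where those reads map still stands.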

    Thank you in advance,

    Hoon

  4. Eduardo Andres Leon

    Dear Hoon, I see that you are already an expert on miARma ;). I hope you find it useful and easy-to-use.

    Regarding your question, you only have to consider whether your results make sense. Although you have changed the GTF file, that doesn't mean you will necessarily find anything. What I mean is that in some cases the wet-lab part removes larger fragments. For example, for miRNAs the protocol needs fragmentation and sonication to remove big fragments. Thus, for lncRNAs it is better to use an RNA-Seq protocol rather than a miRNA-Seq protocol.
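
One rough way to see whether size selection in the library preparation could explain the numbers is to look at the read-length distribution of a FASTQ file. This is a sketch under assumptions, not a miARma feature: it assumes an uncompressed FASTQ with exactly four lines per record, the function name `read_length_histogram` is hypothetical, and the three reads below are synthetic.

```python
import io
from collections import Counter


def read_length_histogram(handle):
    """Count sequence lengths in an uncompressed, 4-lines-per-record FASTQ."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Synthetic reads for illustration only.
seqs = ["TGAGGTAGTAGGTTGTATAGTT", "ACGT" * 9, "TAGCTTATCAGACTGATGTTGA"]
fastq_text = "".join(
    f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs, 1)
)

hist = read_length_histogram(io.StringIO(fastq_text))
for length in sorted(hist):
    print(length, hist[length])
```

With a real file, replace the `io.StringIO` object with `open("S177_ATCGTT.fastq")`. If nearly all reads are around miRNA length, few reads would be expected on longer transcripts regardless of the annotation used.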

    As a test, you can change the strand setting (you currently have strand=no; try strand=yes) to make sure it is not a quantification problem. Another way to check it visually is to load a BAM file from the Bowtie1_results folder together with your GTF file into the IGV software. That way you can see the number of reads on each small mouse transcript.

    Those are my suggestions

    Regards
