how to run miARmaSeq with bam file

Issue #65 resolved
Kyra Yang created an issue

Hello,

I am trying miARma, but the data I get is sorted bam file, could I jump to the readcount part directly? Indeed, when I try to set the file direction to the file that contain bam files, the software always said bam files not found.

Kyra

Comments (9)

  1. Eduardo Andres Leon

    Dear Kyra, Thank you for using miARma. If you start from a bam file, you can use miARma as well. In order to give you the detailed steps, could you send me (or paste) the ini file you are using ?

    Kind regards

  2. Kyra Yang reporter

    Hi Eduardo, here is what I plan to do.

    ;General parameters
    [General]
    ; type of analysis (miRNA, mRNA or circRNA)
    type=miRNA
    ; Folder for miRNA reads (only *.bam files inside)
    read_dir=bam/
    ; label for the analsysis
    label=Hypoxia
    ; Folder where miARma has been instaled
    miARmaPath=.
    ; Folder to store results
    output_dir=results/
    ; organism used
    organism=mouse
    ; Number of process to run at the same time
    threads=4
    ; Whether the data is from a strand-specific assay (yes, no or reverse, yes by default) for featureCounts analysis
    strand=no
    stats_file=results//miARma_stat.43178.log
    logfile=results//miARma_logfile.43178.log
    
    [ReadCount]
    #GFF file used to calculate the number of reads in featureCounts analysis
    database=Mus_musculus_miRNA.gtf
    ;GFF attribute to be used as feature ID (default: gene_id) for featureCounts analysis
    seqid=transcript_id
    ;Feature type (3rd column in GFF file) to be used, all features of other type are ignored (default:exon) for featureCounts analysis
    featuretype=
    
    [DEAnalysis]
    ; Complete path of the target file.
    targetfile=targets.txt
    ; Path of the contrast file.
    contrastfile=contrast.txt
    #This value refers to filter processing in the reads (Should be "yes" or "no").
    filter=yes
    ;Specific software to perform the Differential Expression Analysis (Allowed values: edger, noiseq or edger-noiseq)
    desoft=EdgeR
    ;Provide a file with normalized reads
    cpm=yes
    ;Provide a file with RPKM values
    rpkm=yes
    

    Indeed I also fail to use the aligner, always said the reference could not be open, but the reference have no problem in other programe, such as tophat2.

  3. Eduardo Andres Leon

    Hi Kyra, The way miARma works is by creating folder to store results inside. As you have your own bam files, you have to create a folder where miARma can search inside. So inside results/ you have to create a folder called Bowtie2_results. (you can creare a link among bam/ and results/Bowtie2_results/ . Then in order to avoid future problems, I encourage you to rename your bam files including _cut_bw2 at the end. So if you have a file called WTRep1.bam, this should be renamed to WTRep1_cut_bw2.bam

    Once this is done, you can use the ReadCount module. Remember that both folder must exists: bam and results/Bowtie2_results

    Regarding the index. Tophat2 uses bowtie1 and bowtie2 so you have to take care of what index are you using related with the aligner in tophat

    Let me know

    Eduardo

  4. Kyra Yang reporter

    Hi Eduardo

    I was able to do the readcount, but then fail in the DEAnaltsis

    [Thu Nov 10 21:42:39 2016] Readcount Analysis finished.
    [Thu Nov 10 21:42:39 2016] Starting a differential expression analysis using EdgeR software(s)
    Can't use string ("停止执行") as an ARRAY ref while "strict refs" in use at lib/CbBio/RNASeq/DEAnalysis.pm line 1061.
    

    The system language is Chinese. string ("停止执行") = string("stop execution")

    About the index, I wrote like this

    [Aligner]
    ; Aligner (Bowtie1, Bowtie2, BWA or miRDeep)
    aligner=Bowtie2
    ; Bowtie 2 index
    bowtie2index=/Users/labxi/Desktop/RNA seq/mm10Bowtie2Index/genome
    

    In the index file, include the genome.fa and genome.1.bt2, genome.2.bt2....

  5. Eduardo Andres Leon

    Hi Kyra, Regarding the DEAnalysis, I need some information: * perl version * R version * bioconductor version

    Then I'll need the miARma_stat.43178.log, miARma_logfile.43178.log, contrast.txt and targets.txt

    And about the bowtie2index, try to remove the space, unix machines don't like spaces in path : /Users/labxi/Desktop/RNAseq/mm10Bowtie2Index/genome

    Regards

  6. Kyra Yang reporter

    Hi Eduardo,

    I solve the DEAnalysis problem, is the system language problem. I just need to change the system language to English. Then another failure came out.

    [Fri Nov 11 19:17:57 2016] Starting a differential expression analysis using EdgeR software(s)
    Problem while running this R command:
    print(resultsfiles)
    
    Error:
    print(resultsfiles) : object 'resultsfiles' not found
    Execution halted
    

    The logfile is like this

    SEQCOUNT :: [Fri Nov 11 19:17:57 2016] Please check the folder: results//Readcount_results/.
    QC_EdgeR :: [Fri Nov 11 19:17:58 2016] Executing        source("./lib/CbBio/RNASeq/R-Scripts/QC_EdgeR.R")
            setwd("/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results")
            resultsfiles<-QC_EdgeR(projectdir="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results",dir="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results", file="cut_bw2-ReadCount.tab", targetfile="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/targets.txt", label="DKO_cut_bw2", filter="yes", cpmvalue=2, repthreshold=1, normethod="TMM")
    
    QC_EdgeR :: [Fri Nov 11 19:18:03 2016] The file /Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/DKO_cut_bw2_QC_Report.pdf has been generated 
    DE_EDGER :: [Fri Nov 11 19:18:03 2016] Executing        source("./lib/CbBio/RNASeq/R-Scripts/DE_EdgeR.R")
            setwd("/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results")
            resultsfiles<-DE_EdgeR(projectdir="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/EdgeR_results",dir="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results", file="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results/cut_bw2-ReadCount.tab", targetfile="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/targets.txt", label="DKO_cut_bw2", contrastfile="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/contrast.txt", filter="yes", cpmvalue=2, repthreshold=1, normethod="TMM", replicates="yes", rpkm="yes", file_size="/Users/labxi/Desktop/RNA_seq/Software/cbbio-miarma/results/Readcount_results/cut_bw2-Size.tab", cpm="yes")
    

    Stat.log have nothing.

    The perl version is v5.18.2, R is 3.3.1 bioconductor is 3.3 targets.tex

    Filename    Name    Condition
    1-1 Control_1   WT
    1-2 Control_2   WT
    1-3 Control_3   WT
    2-1 DKO_1   Bf-Mf-lyzcre
    2-2 DKO_2   Bf-Mf-lyzcre
    2-3 DKO_3   Bf-Mf-lyzcre
    

    contrast.txt

    "Name"
    "Compared=WT-Bf-Mf0lyzcre“
    

    Indeed, I get some result, but I think that is not I really want. My DEAnalysis part is like this, what should I add would give me a file contain p-value and tell me if the gene have significant change.

    [DEAnalysis]
    ; Complete path of the target file.
    targetfile=targets.txt
    ; Path of the contrast file.
    contrastfile=contrast.txt
    #This value refers to filter processing in the reads (Should be "yes" or "no").
    filter=yes
    ;Specific software to perform the Differential Expression Analysis (Allowed values: edger, noiseq or edger-noiseq)
    desoft=EdgeR
    ;Provide a file with normalized reads
    cpm=yes
    ;Provide a file with RPKM values
    rpkm=yes
    ;Cutoff for the counts per million value to be used in filter processing with EdgeR
    cpmvalue=2
    ;Number of replicates that have to contains at least a defined number of reads per million to perform the filter with EdgeR software
    repthreshold=1
    ;Normalization method to perform the DE analysis with EdgeR.
    edger_normethod=TMM
    ;Providing replicates
    replicates=yes
    

    Thank you for your help!

  7. Eduardo Andres Leon

    Hi, The result you want (with pvalues and FDR) should be inside EdgeR_results folder, but first the DEAnalysis must end correctly, so I see two errors in your setup:

    First. R do not support the minus symbol in a variable name, so it interprets Bf-Mf-lyzcre as Bf minus Mf minus lyzcre. So you should rename this to Bf_Mf_lyzcre

    Second I see in your targets two different condition: WT Bf-Mf-lyzcre

    But in the contrast you use WT and Bf-Mf0lyzcre with is different from Bf-Mf-lyzcre

    May be your problem is related to this. Then as a suggestion if WT is your "control", the comparison should be "Compared=Bf_Mf_lyzcre-WT“

    Let me know

    Edu

  8. Kyra Yang reporter

    Hi Edu,

    Thank you for your help, I get my result! And thank you for making this brilliant program. It saves me so much time.

  9. Log in to comment