Unexpected running result using ROSE

Issue #26 new
Jun created an issue

I was trying to use ROSE to analyze my own H3K27ac data. instead of the expected plot like this ( I ran with example data) HG18_MM1S_MED1_Plot_points.png I got something like this with my own data H3K27ac_MBDCKONCprec_peaks_Plot_points.png I wondered what went wrong. I am not such an experienced bioinformatician, so I did almost every step in a "following the cookbook" manner, here is what I did:

align the reads to reference genome with bowtie

~/Programs/bowtie2-2.2.1/bowtie2-align-s --wrapper basic-0 -p 32 -x mm9 -S H3K27ac_.sam -U H3K27ac_.trim.fastq

output file

H3K27ac_.sam

convert sam file into bam file and sort it as RANKING_BAM

samtools view -bS H3K27ac_.sam > H3K27ac_.bam samtools sort H3K27ac_.bam -o H3K27ac_.sorted.bam

convert sam file into bam file and sort it as CONTROL_BAM:

samtools view -bS TFChIP_input.sam > TFChIP_input.bam samtools sort TFChIP_input.bam -o TFChIP_input.sort.bam

index bam file

samtools index H3K27ac_MBDCKONCprec.sorted.bam H3K27ac_MBDCKONCprec.sorted.bai & samtools index H3K27ac_MBDCKOprec.sorted.bam H3K27ac_MBDCKOprec.sorted.bai &

peak calling with macs14

macs14 -t H3K27ac_.bam -c TFChIP_input.bam -f BAM -g mm -n H3K27ac_–p 1e-9,–keep-dup = auto, -w –S –space = 50

output file peak calling

H3K27ac_peaks.bed

convert peak bed file into gff

awk '{OFS="\t"; print $1, $4, ".", $2, $3, ".",".",".", $4}' H3K27ac_peaks.bed > H3K27ac_peaks.gff

output used as INPUT_CONSTITUENT_GFF

H3K27ac_peaks.gff

python ROSE_main.py -g mm9 -i H3K27ac_peaks.gff -r H3K27ac_.sorted.bam -c TFChIP_input.sort.bam -o precursortry -s 12500 -t 2500

Does any one have any idea what might cause this issue? Thanks

Comments (19)

  1. Brian Abraham

    Hi Jun,

    There are several possible items that need checking before your problem can be diagnosed properly.

    1) Have you tried running without asterisks in the parameter list 2) Do all the output files described in the instructions exist?
    3) Do the files in precursotry/mappedGFF have values at all? 4) Is your GFF formatted properly? 5) Does your GFF use the same nomenclature for chromosomes as your bams? E.g. "chr1" vs. "1" 6) What is the output of the ROSE_main.py call?

  2. Jun reporter

    Hi Brian, Thank you for your prompt reply.

    Regarding to the item you mentioned:

    1) “Asterisks in the parameter”, if this “ * “ is what you mean, I think I should say “ yes”, I copied the full command which I had tried.

    2) Yes, all this files exist. But I sort of deleted some parts of names, H3K27ac_(some alphabets deleted here).sam when I pasted above.

    3) by “value”, if you mean “ *. sorted.bam”, no, there is no value. [myname@hostname mappedGFF]$ cat H3K27ac_peaksn2_12KB_STITCHED_TSS_DISTAL_H3K27ac_.sorted.bam_MAPPED.gff GENE_ID locusLine bin_1_H3K27ac_.sorted.bam 1_MACS_peak_1907_lociStitched chr11(.):5115560-5116248 0.0 1_MACS_peak_3903_lociStitched chr13(.):8830474-8831479 0.0

    [myname@hostname mappedGFF]$ cat H3K27ac_peaksn2_H3K27ac_.sorted.bam_MAPPED.gff GENE_ID locusLine bin_1_H3K27ac_.sorted.bam MACS_peak_1 chr1(.):4775237-4776463 0.0

    4) Actually that is the part which I was not sure about.

    initially I used cgat.bed2gff.py( https://github.com/CGATOxford/cgat/blob/master/scripts/bed2gff.py) then I found format was different from the example gff file. HG18_MM1S_MED1.gff

    5) I think the answer is “yes”. if I visualize .sorted.bam with BAMseek: X64AW:05222:05391 16 chr1 3000155 42 189M * 0 0 TTCTGTCATAGATTAATCTATTTTGCAGATGTAATTGTATGTATTATAATTGTAATAGTATATACTTGTATGTACTTAAAATATTTTATCATAGTTTTCAATGCCTAAGATTTTGTCTTTCTTCTCTTGTAATCTGTTGGTAATGCTTGCTTCTGTAGTTACTGTTTGCTTACCTAGATTCTTCTTTTC ?8899999==<=8<78557907776-..::75585=;<<:8893---5;6:<>8><;::;666==8<<<<:::::4908989:.::98978444.89863::;8<<7=<:1::::==8@=<9==<<8==>9==AA<9<7==8==<<8>><7==<888=9=<<<<59:>/0/7A==>;8;;9@A7>>?= AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:189 YT:Z:UU X64AW:08770:07737 16 chr1 3000155 42 32M1I132M1I25M * 0 0 TTCTGTCATAGATTAATCTATTTTGCAGATGTAAATTGTATGTATTATAATTGTAATAATATATACTTGTATGTACTTAAAATATTTTATCATAGTTTTCAATGCCTAAGATTTTGTCTTTCTTCTCTTGTAATCTGTTGGTAATGCTTGCTTCTGTAGTTACTGTTTTGCTTACCTAGATTCTTCTTTTC 677:8:99<===85255::8-87831325781(33-88784422,111.3/322.3-(--3--4986;:<<:9955/5+44555+9993334473&333118995:96:990999:=>7>=<8<<;5068<7=====8=8==9===<9=;<7?==885<5;;;:5)8899=7=<6><===9=939/(//// AS:i:-19 XN:i:0 XM:i:1 XO:i:2 XG:i:2 NM:i:3 MD:Z:57G131 YT:Z:UU X64AW:04013:13327 16 chr1 3000174 42 170M * 0 0 ATTTTGCAGATGTAATTGTATGTATTATAATTGTAATAGTATATACTTGTATGTACTTAAAATATTTTATCATAGTTTTCAATGCCTAAGATTTTGTCTTTCTTCTCTTGTAATCTGTTGGTAATGCTTGCTTCTGTAGTTACTGTTTGCTTACCTAGATTCTTCTTTTC 3'5553666::::7<6<:<<<77769774=6;;;6::445:<988:6<665;99883:/???9919855666999199:729861662776/8554733332<<<62633-:::<606033-344:544449:9:<::6;<<==8@<<9350497/0/)995;8/88:9 AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:170 YT:Z:UU

    6) You mean what if I call ROSE_main.py?

    myname@hostname rose]$ USING H3K27ac_MBDCKONCprec_peaksn2.gff AS THE INPUT GFF USING mm9 AS THE GENOME MAKING START DICT LOADING IN GFF REGIONS CHECKING INPUT TO MAKE SURE EACH REGION HAS A UNIQUE IDENTIFIER REFERENCE COLLECTION PASSES QC STITCHING REGIONS TOGETHER PERFORMING REGION STITCHING REMOVED 8566 LOCI BECAUSE THEY WERE CONTAINED BY A TSS REMOVED 2 STITCHED LOCI BECAUSE THEY OVERLAPPED MULTIPLE TSSs ADDED BACK 4 ORIGINAL LOCI MAKING GFF FROM STITCHED COLLECTION WRITING STITCHED GFF TO DISK AS precursortry/gff/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL.gff OUTPUT WILL BE WRITTEN TO precursortry/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b H3K27ac_MBDCKONCprec.sorted.bam -i precursortry/gff/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL.gff -o precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_H3K27ac_MBDCKONCprec.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b H3K27ac_MBDCKONCprec.sorted.bam -i H3K27ac_MBDCKONCprec_peaksn2.gff -o precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_H3K27ac_MBDCKONCprec.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b TFChIP_tCD25input.sort.bam -i precursortry/gff/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL.gff -o precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_TFChIP_tCD25input.sort.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b TFChIP_tCD25input.sort.bam -i H3K27ac_MBDCKONCprec_peaksn2.gff -o precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_TFChIP_tCD25input.sort.bam_MAPPED.gff & PAUSING TO MAP {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_H3K27ac_MBDCKONCprec.sorted.bam_MAPPED.gff', 'bam': 'H3K27ac_MBDCKONCprec.sorted.bam', 'rpm': True, 'input': 'precursortry/gff/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_H3K27ac_MBDCKONCprec.sorted.bam_MAPPED.gff', 'bam': 'H3K27ac_MBDCKONCprec.sorted.bam', 'rpm': True, 'input': 'H3K27ac_MBDCKONCprec_peaksn2.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_TFChIP_tCD25input.sort.bam_MAPPED.gff', 'bam': 'TFChIP_tCD25input.sort.bam', 'rpm': True, 'input': 'precursortry/gff/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_TFChIP_tCD25input.sort.bam_MAPPED.gff', 'bam': 'TFChIP_tCD25input.sort.bam', 'rpm': True, 'input': 'H3K27ac_MBDCKONCprec_peaksn2.gff'} [] mapping to GFF and making a matrix with fixed bin number WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN): 0 MAPPING TOOK 5 MINUTES BAM MAPPING COMPLETED NOW MAPPING DATA TO REGIONS FORMATTING TABLE 1000 2000 3000 4000 5000 GETTING MAPPED DATA GETTING MAPPING DATA FOR H3K27ac_MBDCKONCprec.sorted.bam OPENING precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_H3K27ac_MBDCKONCprec.sorted.bam_MAPPED.gff MAKING SIGNAL DICT FOR H3K27ac_MBDCKONCprec.sorted.bam GETTING MAPPING DATA FOR TFChIP_tCD25input.sort.bam OPENING precursortry/mappedGFF/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_TFChIP_tCD25input.sort.bam_MAPPED.gff MAKING SIGNAL DICT FOR TFChIP_tCD25input.sort.bam CALLING AND PLOTTING SUPER-ENHANCERS R --no-save precursortry/ precursortry/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt H3K27ac_MBDCKONCprec_peaksn2 TFChIP_tCD25input.sort.bam < ROSE_callSuper.R ?? 'precursortry/' ????????

    ?? 'precursortry/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt' ????????

    ?? 'H3K27ac_MBDCKONCprec_peaksn2' ????????

    ?? 'TFChIP_tCD25input.sort.bam' ????????

    R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" Copyright (C) 2015 The R Foundation for Statistical Computing Platform: x86_64-pc-linux-gnu (64-bit)

    R は、自由なソフトウェアであり、「完全に無保証」です。 一定の条件に従えば、自由にこれを再配布することができます。 配布条件の詳細に関しては、'license()' あるいは 'licence()' と入力してください。

    R は多くの貢献者による共同プロジェクトです。 詳しくは 'contributors()' と入力してください。 また、R や R のパッケージを出版物で引用する際の形式については 'citation()' と入力してください。

    'demo()' と入力すればデモをみることができます。 'help()' とすればオンラインヘルプが出ます。 'help.start()' で HTML ブラウザによるヘルプがみられます。 'q()' と入力すれば R を終了します。

    #============================================================================ #==============SUPER-ENHANCER CALLING AND PLOTTING FUNCTIONS================= #============================================================================

    #This function calculates the cutoff by sliding a diagonal line and finding where it is tangential (or as close as possible) calculate_cutoff <- function(inputVector, drawPlot=TRUE,...){ + inputVector <- sort(inputVector) + inputVector[inputVector<0]<-0 #set those regions with more control than ranking equal to zero + slope <- (max(inputVector)-min(inputVector))/length(inputVector) #This is the slope of the line we want to slide. This is the diagonal. + xPt <- floor(optimize(numPts_below_line,lower=1,upper=length(inputVector),myVector= inputVector,slope=slope)$minimum) #Find the x-axis point where a line passing through that point has the minimum number of points below it. (ie. tangent) + y_cutoff <- inputVector[xPt] #The y-value at this x point. This is our cutoff. +
    + if(drawPlot){ #if TRUE, draw the plot + plot(1:length(inputVector), inputVector,type="l",...) + b <- y_cutoff-(slope* xPt) + abline(v= xPt,h= y_cutoff,lty=2,col=8) + points(xPt,y_cutoff,pch=16,cex=0.9,col=2) + abline(coef=c(b,slope),col=2) + title(paste("x=",xPt,"\ny=",signif(y_cutoff,3),"\nFold over Median=",signif(y_cutoff/median(inputVector),3),"x\nFold over Mean=",signif(y_cutoff/mean(inputVector),3),"x",sep="")) + axis(1,sum(inputVector==0),sum(inputVector==0),col.axis="pink",col="pink") #Number of regions with zero signal + } + return(list(absolute=y_cutoff,overMedian=y_cutoff/median(inputVector),overMean=y_cutoff/mean(inputVector))) + }

    #this is an accessory function, that determines the number of points below a diagnoal passing through [x,yPt] numPts_below_line <- function(myVector,slope,x){ + yPt <- myVector[x] + b <- yPt-(slopex) + xPts <- 1:length(myVector) + return(sum(myVector<=(xPtsslope+b))) + }

    convert_stitched_to_bed <- function(inputStitched,trackName,trackDescription,outputFile,splitSuper=TRUE,score=c(),superRows=c(),baseColor="0,0,0",superColor="255,0,0"){ + outMatrix <- matrix(data="",ncol=4+ifelse(length(score)==nrow(inputStitched),1,0),nrow=nrow(inputStitched)) +
    + outMatrix[,1] <- as.character(inputStitched$CHROM) + outMatrix[,2] <- as.character(inputStitched$START) + outMatrix[,3] <- as.character(inputStitched$STOP) + outMatrix[,4] <- as.character(inputStitched$REGION_ID) + if(length(score)==nrow(inputStitched)){ + score <- rank(score,ties.method="first") + score <- length(score)-score+1 #Stupid rank only does smallest to largest. + outMatrix[,5] <- as.character(score) + } + trackDescription <- paste(trackDescription,"\nCreated on ",format(Sys.time(), "%b %d %Y"),collapse="",sep="") + trackDescription <- gsub("\n","\t", trackDescription) + tName <- gsub(" ","",trackName) + cat('track name="', tName,'" description="', trackDescription,'" itemRGB=On color=',baseColor,"\n",sep="",file=outputFile) + write.table(file= outputFile,outMatrix,sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + if(splitSuper==TRUE){ + cat("\ntrack name=\"Super", tName,'" description="Super ', trackDescription,'" itemRGB=On color=', superColor,"\n",sep="",file=outputFile,append=TRUE) + write.table(file= outputFile,outMatrix[superRows,],sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + } + }

    convert_stitched_to_gateway_bed <- function(inputStitched,outputFileRoot,splitSuper=TRUE,score=c(),superRows=c()){ + outMatrix <- matrix(data="",ncol=6,nrow=nrow(inputStitched)) +
    + outMatrix[,1] <- as.character(inputStitched$CHROM) + outMatrix[,2] <- as.character(inputStitched$START) + outMatrix[,3] <- as.character(inputStitched$STOP) + outMatrix[,4] <- as.character(inputStitched$REGION_ID) +
    + if(length(score)==nrow(inputStitched)){ + score <- rank(score,ties.method="first") + score <- length(score)-score+1 #Stupid rank only does smallest to largest. + outMatrix[,5] <- as.character(score) + } +
    + outMatrix[,6] <- as.character(rep('.',nrow(outMatrix))) +
    +
    + outputFile1 = paste(outputFileRoot,'_Gateway_Enhancers.bed',sep='') + write.table(file= outputFile1,outMatrix,sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + if(splitSuper==TRUE){ + outputFile2 = paste(outputFileRoot,'_Gateway_SuperEnhancers.bed',sep='') + + write.table(file= outputFile2,outMatrix[superRows,],sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + } + }

    writeSuperEnhancer_table <- function(superEnhancer,description,outputFile,additionalData=NA){ + description <- paste("#",description,"\nCreated on ",format(Sys.time(), "%b %d %Y"),collapse="",sep="") + description <- gsub("\n","\n#",description) + cat(description,"\n",file=outputFile) + if(is.matrix(additionalData)){ + if(nrow(additionalData)!=nrow(superEnhancer)){ + warning("Additional data does not have the same number of rows as the number of super enhancers.\n--->>> ADDITIONAL DATA NOT INCLUDED <<<---\n") + }else{ + superEnhancer <- cbind(superEnhancer,additionalData) + superEnhancer = superEnhancer[order(superEnhancer$enhancerRank),] +
    + } + } + write.table(file=outputFile,superEnhancer,sep="\t",quote=FALSE,row.names=FALSE,append=TRUE) + }

    #============================================================================ #===================SUPER-ENHANCER CALLING AND PLOTTING====================== #============================================================================

    args <- commandArgs()

    print('THESE ARE THE ARGUMENTS') [1] "THESE ARE THE ARGUMENTS" print(args) [1] "/user/myinstitute/myname/Programme/lib64/R/bin/exec/R"
    [2] "--no-save"
    [3] "precursortry/"
    [4] "precursortry/H3K27ac_MBDCKONCprec_peaksn2_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt" [5] "H3K27ac_MBDCKONCprec_peaksn2"
    [6] "TFChIP_tCD25input.sort.bam"

    #ARGS outFolder = args[3] enhancerFile = args[4] enhancerName = args[5] wceName = args[6]

    #Read enhancer regions with closestGene columns stitched_regions <- read.delim(file= enhancerFile,sep="\t")

    #perform WCE subtraction. Using pipeline table to match samples to proper background. rankBy_factor = colnames(stitched_regions)[7]

    prefix = unlist(strsplit(rankBy_factor,'_'))[1]

    if(wceName == 'NONE'){ +
    +
    + rankBy_vector = as.numeric(stitched_regions[,7]) +
    + }else{ + wceName = colnames(stitched_regions)[8] + print('HERE IS THE WCE NAME') + print(wceName) +
    + rankBy_vector = as.numeric(stitched_regions[,7])-as.numeric(stitched_regions[,8]) + } [1] "HERE IS THE WCE NAME" [1] "TFChIP_tCD25input.sort.bam"

    #SETTING NEGATIVE VALUES IN THE rankBy_vector to 0

    rankBy_vector[rankBy_vector < 0] <- 0

    #FIGURING OUT THE CUTOFF

    cutoff_options <- calculate_cutoff(rankBy_vector, drawPlot=FALSE,xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankByFactor,' Signal','- ',wceName),lwd=2,col=4)

    #These are the super-enhancers superEnhancerRows <- which(rankBy_vector> cutoff_options$absolute) typicalEnhancers = setdiff(1:nrow(stitched_regions),superEnhancerRows) enhancerDescription <- paste(enhancerName," Enhancers\nCreated from ", enhancerFile,"\nRanked by ",rankBy_factor,"\nUsing cutoff of ",cutoff_options$absolute," for Super-Enhancers",sep="",collapse="")

    #MAKING HOCKEY STICK PLOT plotFileName = paste(outFolder,enhancerName,'_Plot_points.png',sep='') png(filename=plotFileName,height=600,width=600) signalOrder = order(rankBy_vector,decreasing=TRUE) if(wceName == 'NONE'){ + plot(length(rankBy_vector):1,rankBy_vector[signalOrder], col='red',xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankBy_factor,' Signal'),pch=19,cex=2) +
    + }else{ + plot(length(rankBy_vector):1,rankBy_vector[signalOrder], col='red',xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankBy_factor,' Signal','- ',wceName),pch=19,cex=2) + } abline(h=cutoff_options$absolute,col='grey',lty=2) abline(v=length(rankBy_vector)-length(superEnhancerRows),col='grey',lty=2) lines(length(rankBy_vector):1,rankBy_vector[signalOrder],lwd=4, col='red') text(0,0.8*max(rankBy_vector),paste(' Cutoff used: ',cutoff_options$absolute,'\n','Super-Enhancers identified: ',length(superEnhancerRows)),pos=4)

    dev.off() null device 1

    #Writing a bed file bedFileName = paste(outFolder,enhancerName,'_Enhancers_withSuper.bed',sep='') convert_stitched_to_bed(stitched_regions,paste(rankBy_factor,"Enhancers"), enhancerDescription,bedFileName,score=rankBy_vector,splitSuper=TRUE,superRows= superEnhancerRows,baseColor="0,0,0",superColor="255,0,0")

    bedFileRoot = paste(outFolder,enhancerName,sep='')

    convert_stitched_to_gateway_bed(stitched_regions,bedFileRoot,splitSuper=TRUE,score=rankBy_vector,superRows = superEnhancerRows)

    #This matrix is just the super_enhancers true_super_enhancers <- stitched_regions[superEnhancerRows,]

    additionalTableData <- matrix(data=NA,ncol=2,nrow=nrow(stitched_regions)) colnames(additionalTableData) <- c("enhancerRank","isSuper") additionalTableData[,1] <- nrow(stitched_regions)-rank(rankBy_vector,ties.method="first")+1 additionalTableData[,2] <- 0 additionalTableData[superEnhancerRows,2] <- 1

    #Writing enhancer and super-enhancer tables with enhancers ranked and super status annotated enhancerTableFile = paste(outFolder,enhancerName,'_AllEnhancers.table.txt',sep='') writeSuperEnhancer_table(stitched_regions, enhancerDescription,enhancerTableFile, additionalData= additionalTableData) 警告メッセージ: write.table(file = outputFile, superEnhancer, sep = "\t", quote = FALSE, で: appending column names to file

    superTableFile = paste(outFolder,enhancerName,'_SuperEnhancers.table.txt',sep='') writeSuperEnhancer_table(true_super_enhancers, enhancerDescription,superTableFile, additionalData= additionalTableData[superEnhancerRows,]) 警告メッセージ: write.table(file = outputFile, superEnhancer, sep = "\t", quote = FALSE, で: appending column names to file

    using a MMR value of 13.2973 has chr Number lines processed 0 using a MMR value of 13.2973 has chr Number lines processed 0 100 100 200 200 300 300 400 400 500 500 600 600 700 700 800 800 900 900 1000 1000 1100 1100 1200 1200 1300 1300 1400 1400 1500 1600 1500 1700 1600 1800 1700 1900 1800 2000 1900 2100 2000 2200 2100 2300 2200 2400 2300 2500 2 ...

    if further information is needed for diagnosis, please kindly let me know. Thanks! Best, Jun

  3. Jun reporter

    Hi Brian, I tried applying ROSE to other H3K27ac input data , and I got the "iconic" super enhancer curve, these data were dealt with the same pipeline I used for the data which I had the issue mentioned above. So I think there is nothing wrong with the ROSE and my pipeline, something is wrong with my previous data, before I figured out what went wrong with the data mentioned above, I will leave the "issue" here. Wish that would be fine. Thanks and sorry!

  4. Bharath

    When I tried running the script, I too got the same sort of output. Should you try givinf sorted file as input?

  5. Jun reporter

    @Brian Abraham. Sorry I did not include the indexing information in my previous issue report. I added that. I guess that is not the cause of the issue I met. I did index before I ran the ROSE algorithm and I could see the generated bai file. and if there is no bai file or the bam is unsorted, when I tried to run ROSE, I will have complain like " the bam file should be sorted and indexed"

  6. yan kai

    Hi Jun,

    I also encountered the same problem. I used H3K27ac peaks (in GFF format) and sorted and indexed H3K27ac bam file, but I got the same curve. Have you figured out what's happening there? Any feed would be appreciated.

  7. Bharath

    GFF format must match the format given in the example data. 1. Please have a look into the Younglab website and download the sample data and try to match the GFF format that you have with that of the example data. 2. Have both the indexed and sorted BAM file inside the same folder. When I got the same error, I checked these and I made a mistake in keeping sorted and indexed file in different folders. Please post if you still get an error!

  8. yan kai

    Hi Bharath,

    Thanks for the prompt reply! I just checked those: my GFF was in the same format as the example data, also the index file and bam file are in the same folder.

  9. Jun reporter

    @yan kai. Pay attention that the example data has three columns with just "space", which is not so obvious. if you use awk to convert your bed to gff, do not leave them out, like: awk '{OFS="\t"; print $1, $4, "", $2, $3, "",".","", $4}' yourdata_peaks.bed > yourdata_peaksn2.gff if you are pretty sure that format of your gff is allright, then check whether your input has any issue or not ( try to apply Rose without control; try to prepare input gff files from other data). these were issues I had. Wish that would help.

  10. dude_law

    Hi, I encounter this before, one of the reason is that the names of bam files for treatment and control cannot be the same. I think it is because R will rename the column if the names are same, then it cannot call the column by using its original name.

    Good luck.

  11. Deniz Törüner

    Hi! I have the same error too… It says 0 super-enhancers identified for both my own data and the example data. Any feed would be appreciated.

  12. Brian Abraham

    Hi Deniz,

    Aside from the issue highlighted above where the treatment and control BAMs cannot be the same, I suspect that your installation of ROSE is not calculating coverage in stitched enhancers properly. One way to verify this is to see whether the output in mappedGFF/* has non-zero values. If they are all zero, I suggest checking that samtools is installed in the way ROSE expects. ROSE_bamToGFF needs to be able to call “samtools” at the command line as currently shipped.

  13. Kelby Kies

    Hello Brian, I have ran the ROSE pipeline and everything appears to be fine. I have appeared to have identified Super Enhancers, but when I look at the files in /mappedGFF/, there might be an issue.

    The *_MAPPED.gff file looks like this:

    GENE_IDlocusLinebin_1_adj_D6_merged_sorted_sorted.bam.bam

    e9.5_6_peak_1chr1(.):3672031-36720320.0

    e9.5_6_peak_2chr1(.):4456315-44563160.0

    e9.5_6_peak_3chr1(.):4471111-44711120.0

    e9.5_6_peak_4chr1(.):4488696-44886970.0

    e9.5_6_peak_5chr1(.):4490001-44900020.0

    e9.5_6_peak_6chr1(.):4490713-44907140.0

    e9.5_6_peak_7chr1(.):4491981-44919820.0

    e9.5_6_peak_8chr1(.):4493344-44933450.0

    e9.5_6_peak_9chr1(.):4494935-44949360.0

    I have tried using samtools version 1.9 and 1.8 and there doesn’t appear to be a change.

    Is there anything else I can try to get this fixed?

  14. Log in to comment