A problem with ROSE execution (using the example data set provided with ROSE)

Issue #4 resolved
Former user created an issue

Dear ‘ROSE’ developers,

Thank you very much for developing ‘ROSE’! It is of a great interest for us to be able and use your software for discovering super enhancer within our system and we would be very appreciable if you could advise us what could be amend in our procedure in order to perform a successful ‘ROSE’ run.

We have recently downloaded the current ROSE package from you website: https://bitbucket.org/young_computation/rose/downloads (upon downloading we have got the file: young_computation-rose-15819b1ee94a.zip, which we then extracted). We also downloaded the example library that goes along with ‘ROSE’, which you deposited under ‘example data for ROSE’: http://younglab.wi.mit.edu/super_enhancer_code.html

We have made several attempts to execute ‘ROSE’ using the command line that was provided within the example.sh file (a file contained within the ROSE_DATA folder). This is the command line we used: python ROSE_main.py -g HG18 -i ./data/HG18_MM1S_MED1.gff -r ./data/MM1S_MED1.hg18.bwt.sorted.bam -c ./data/MM1S_WCE.hg18.bwt.sorted.bam -o example -s 12500 -t 2500

It looks like ‘ROSE’ starts to run nicely, it actually creates the ‘example’ output folder, which includes two subfolder: mappedGFF, and gff. The gff folder is also become populated by two output files: HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff and HG18_MM1S_MED1.gff - but then, according to the output summary appears on the screen it looks like the program accumulates 4 error messages: "TypeError: float() argument must be a string or a number" the program continues until it reaches to the lines: WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN): 0 At this point the program seems to get stuck for long minutes without any change.

By now we tried executing ROSE on Linux, Cygwin, and Mac (on Terminal) - but in all cases my output contained the 4 error messages: "TypeError: float() argument must be a string or a number"

Please advice what should we do in order to be able and execute a ROSE run using the example files provided with the 'example data for Rose’. Does the fact that we get the message: "TypeError: float() argument must be a string or a number” Is important at all?.. Or should we ignore it? Why does the program seems to be stuck after reaching the line: WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN) ? What is the expected time frame in which ‘ROSE’ should complete analysis of the 'example data’? On our Unix server the program got stuck for more than 60 minutes without a change.

Thanks a lot in advance!

Roy

Here are the output lines that we get after executing the command line that within the example.sh file which you provided:

Macintosh-5:ROSE royblum$ python ROSE_main.py -g HG18 -i ./data/HG18_MM1S_MED1.gff -r ./data/MM1S_MED1.hg18.bwt.sorted.bam -c ./data/MM1S_WCE.hg18.bwt.sorted.bam -o example -s 12500 -t 2500 folder example/ does not exist folder example/gff/ does not exist folder example/mappedGFF/ does not exist USING ./data/HG18_MM1S_MED1.gff AS THE INPUT GFF USING HG18 AS THE GENOME MAKING START DICT LOADING IN GFF REGIONS STITCHING REGIONS TOGETHER PERFORMING REGION STITCHING REMOVED 7865 LOCI BECAUSE THEY WERE CONTAINED BY A TSS REMOVED 16 STITCHED LOCI BECAUSE THEY OVERLAPPED MULTIPLE TSSs ADDED BACK 42 ORIGINAL LOCI MAKING GFF FROM STITCHED COLLECTION WRITING STITCHED GFF TO DISK AS example/gff/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff OUTPUT WILL BE WRITTEN TO example/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b ./data/MM1S_MED1.hg18.bwt.sorted.bam -i example/gff/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff -o example/mappedGFF/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b ./data/MM1S_MED1.hg18.bwt.sorted.bam -i ./data/HG18_MM1S_MED1.gff -o example/mappedGFF/HG18_MM1S_MED1_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b ./data/MM1S_WCE.hg18.bwt.sorted.bam -i example/gff/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff -o example/mappedGFF/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b ./data/MM1S_WCE.hg18.bwt.sorted.bam -i ./data/HG18_MM1S_MED1.gff -o example/mappedGFF/HG18_MM1S_MED1_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff & PAUSING TO MAP {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example/mappedGFF/HG18_MM1S_MED1_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': './data/MM1S_WCE.hg18.bwt.sorted.bam', 'rpm': True, 'input': './data/HG18_MM1S_MED1.gff'} {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example/mappedGFF/HG18_MM1S_MED1_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': './data/MM1S_MED1.hg18.bwt.sorted.bam', 'rpm': True, 'input': './data/HG18_MM1S_MED1.gff'} [] {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example/mappedGFF/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': './data/MM1S_MED1.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'example/gff/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff'} [] [] mapping to GFF and making a matrix with fixed bin number mapping to GFF and making a matrix with fixed bin number mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example/mappedGFF/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': './data/MM1S_WCE.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'example/gff/HG18_MM1S_MED1_12KB_STITCHED_TSS_DISTAL.gff'} [] mapping to GFF and making a matrix with fixed bin number Traceback (most recent call last): File "ROSE_bamToGFF.py", line 247, in <module> Traceback (most recent call last): main() File "ROSE_bamToGFF.py", line 247, in <module> File "ROSE_bamToGFF.py", line 238, in main newGFF = mapBamToGFF(bamFile,gffFile,options.sense,int(options.extension),options.floor,options.rpm,options.matrix) File "ROSE_bamToGFF.py", line 40, in mapBamToGFF MMR= round(float(bam.getTotalReads('mapped'))/1000000,4) TypeError: float() argument must be a string or a number main() File "ROSE_bamToGFF.py", line 238, in main newGFF = mapBamToGFF(bamFile,gffFile,options.sense,int(options.extension),options.floor,options.rpm,options.matrix) File "ROSE_bamToGFF.py", line 40, in mapBamToGFF MMR= round(float(bam.getTotalReads('mapped'))/1000000,4) TypeError: float() argument must be a string or a number Traceback (most recent call last): File "ROSE_bamToGFF.py", line 247, in <module> main() File "ROSE_bamToGFF.py", line 238, in main newGFF = mapBamToGFF(bamFile,gffFile,options.sense,int(options.extension),options.floor,options.rpm,options.matrix) File "ROSE_bamToGFF.py", line 40, in mapBamToGFF MMR= round(float(bam.getTotalReads('mapped'))/1000000,4) TypeError: float() argument must be a string or a number Traceback (most recent call last): File "ROSE_bamToGFF.py", line 247, in <module> main() File "ROSE_bamToGFF.py", line 238, in main newGFF = mapBamToGFF(bamFile,gffFile,options.sense,int(options.extension),options.floor,options.rpm,options.matrix) File "ROSE_bamToGFF.py", line 40, in mapBamToGFF MMR= round(float(bam.getTotalReads('mapped'))/1000000,4) TypeError: float() argument must be a string or a number WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN): 0

When we let the program wait it is just accumulating more minutes – the lower part of the output becomes to look like this:

WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN): 0 30 60 90

Comments (3)

  1. Roy Blum

    Dear ‘Rose’ developers,

    I would like to inform that I solved my problem after reading a former reply (from ‘chaplain’) that appears at your blog: https://bitbucket.org/young_computation/rose/issue/2/error-list-index-out-of-range

    I basically add the samtools-0.1.19 to my Terminal PATH and now the ROSE execution seems to work fine. The program's output includes now all the output files, including the .png R-based figure file!

    However, something strange still happening and I hope that you may be able to advise me what could be the reason and how could I possibly solve it:

    Here is the output from my ROSE execution - As you’ll see, the program runs and as some point seems to complete the analysis,as it reaches to a point where it releases the command line back to the user. Nevertheless, a few second after reaching to that point, it looks like some counting is continuing (!!)… This without any further command given by me. I will bold the post-analysis section here below so you’ll be able to identify it more easily. This counting basically seems to continue endlessly and I am wondering why… It is even continuing further after I re-press Control-C, to get again the command line back to the user. I get the command line for maybe two seconds but then the counting just continues!…

    Would you be able to provide an explanation please? I would like to be sure that ROSE is running well on my computer.

    Thanks a lot in advance!

    Roy

    Here is the output:

    Macintosh-5:ROSE royblum$ python ROSE_main.py -g HG18 -i HG18_MM1S_MED1_1000.gff -r MM1S_MED1.hg18.bwt.sorted.bam -c MM1S_WCE.hg18.bwt.sorted.bam -o example2 -s 12500 -t 2500 USING HG18_MM1S_MED1_1000.gff AS THE INPUT GFF USING HG18 AS THE GENOME MAKING START DICT LOADING IN GFF REGIONS STITCHING REGIONS TOGETHER PERFORMING REGION STITCHING REMOVED 405 LOCI BECAUSE THEY WERE CONTAINED BY A TSS REMOVED 1 STITCHED LOCI BECAUSE THEY OVERLAPPED MULTIPLE TSSs ADDED BACK 1 ORIGINAL LOCI MAKING GFF FROM STITCHED COLLECTION WRITING STITCHED GFF TO DISK AS example2/gff/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL.gff OUTPUT WILL BE WRITTEN TO example2/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b MM1S_MED1.hg18.bwt.sorted.bam -i example2/gff/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL.gff -o example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b MM1S_MED1.hg18.bwt.sorted.bam -i HG18_MM1S_MED1_1000.gff -o example2/mappedGFF/HG18_MM1S_MED1_1000_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b MM1S_WCE.hg18.bwt.sorted.bam -i example2/gff/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL.gff -o example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff & python ROSE_bamToGFF.py -f 1 -e 200 -r -m 1 -b MM1S_WCE.hg18.bwt.sorted.bam -i HG18_MM1S_MED1_1000.gff -o example2/mappedGFF/HG18_MM1S_MED1_1000_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff & PAUSING TO MAP {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': 'MM1S_WCE.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'example2/gff/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example2/mappedGFF/HG18_MM1S_MED1_1000_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': 'MM1S_MED1.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'HG18_MM1S_MED1_1000.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example2/mappedGFF/HG18_MM1S_MED1_1000_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': 'MM1S_WCE.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'HG18_MM1S_MED1_1000.gff'} [] mapping to GFF and making a matrix with fixed bin number {'matrix': '1', 'extension': '200', 'floor': '1', 'sense': 'both', 'output': 'example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff', 'bam': 'MM1S_MED1.hg18.bwt.sorted.bam', 'rpm': True, 'input': 'example2/gff/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL.gff'} [] mapping to GFF and making a matrix with fixed bin number WAITING FOR MAPPING TO COMPLETE. ELAPSED TIME (MIN): 0 using a MMR value of 17.4141 using a MMR value of 17.4141 using a MMR value of 20.5167 using a MMR value of 20.5167 using a MMR value of 20.5167 using a MMR value of 20.5167 using a MMR value of 17.4141 using a MMR value of 17.4141 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 has chr Number lines processed 0 100 100 100 100 100 100 200 100 100 200 200 200 200 200 200 300 300 300 200 300 300 300 400 300 400 400 300 400 400 400 500 500 500 400 400 500 500 500 600 600 500 500 600 600 600 700 700 700 600 700 700 800 800 800 900 900 700 800 800 900 1000 900 900 800 1100 1000 1000 900 1200 1100 1100 1000 1300 1200 1200 MAPPING TOOK 25 MINUTES BAM MAPPING COMPLETED NOW MAPPING DATA TO REGIONS FORMATTING TABLE GETTING MAPPED DATA GETTING MAPPING DATA FOR MM1S_MED1.hg18.bwt.sorted.bam OPENING example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_MED1.hg18.bwt.sorted.bam_MAPPED.gff MAKING SIGNAL DICT FOR MM1S_MED1.hg18.bwt.sorted.bam GETTING MAPPING DATA FOR MM1S_WCE.hg18.bwt.sorted.bam OPENING example2/mappedGFF/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_MM1S_WCE.hg18.bwt.sorted.bam_MAPPED.gff MAKING SIGNAL DICT FOR MM1S_WCE.hg18.bwt.sorted.bam CALLING AND PLOTTING SUPER-ENHANCERS R --no-save example2/ example2/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt HG18_MM1S_MED1_1000 MM1S_WCE.hg18.bwt.sorted.bam < ROSE_callSuper.R ARGUMENT 'example2/' ignored

    ARGUMENT 'example2/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt' ignored

    ARGUMENT 'HG18_MM1S_MED1_1000' ignored

    ARGUMENT 'MM1S_WCE.hg18.bwt.sorted.bam' ignored

    1400 1100

    R version 3.1.0 (2014-04-10) -- "Spring Dance" Copyright (C) 2014 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin13.1.0 (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

    Natural language support but running in an English locale

    R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

    #============================================================================ #==============SUPER-ENHANCER CALLING AND PLOTTING FUNCTIONS================= #============================================================================

    #This function calculates the cutoff by sliding a diagonal line and finding where it is tangential (or as close as possible) calculate_cutoff <- function(inputVector, drawPlot=TRUE,...){ +
    inputVector <- sort(inputVector) + inputVector[inputVector<0]<-0 #set those regions with more control than ranking equal to zero + slope <- (max(inputVector)-min(inputVector))/length(inputVector) #This is the slope of the line we want to slide. This is the diagonal. + xPt <- floor(optimize(numPts_below_line,lower=1,upper=length(inputVector),myVector= inputVector,slope=slope)$minimum) #Find the x-axis point where a line passing through that point has the minimum number of points below it. (ie. tangent) + y_cutoff <- inputVector[xPt] #The y-value at this x point. This is our cutoff. + + if(drawPlot){ #if TRUE, draw the plot + plot(1:length(inputVector), inputVector,type="l",...) + b <- y_cutoff-(slope* xPt) + abline(v= xPt,h= y_cutoff,lty=2,col=8) + points(xPt,y_cutoff,pch=16,cex=0.9,col=2) + abline(coef=c(b,slope),col=2) + title(paste("x=",xPt,"\ny=",signif(y_cutoff,3),"\nFold over Median=",signif(y_cutoff/median(inputVector),3),"x\nFold over Mean=",signif(y_cutoff/mean(inputVector),3),"x",sep="")) + axis(1,sum(inputVector==0),sum(inputVector==0),col.axis="pink",col="pink") #Number of regions with zero signal + } + return(list(absolute=y_cutoff,overMedian=y_cutoff/median(inputVector),overMean=y_cutoff/mean(inputVector))) + }

    #this is an accessory function, that determines the number of points below a diagnoal passing through [x,yPt] numPts_below_line <- function(myVector,slope,x){ + yPt <- myVector[x] + b <- yPt-(slopex) + xPts <- 1:length(myVector) + return(sum(myVector<=(xPtsslope+b))) + }

    convert_stitched_to_bed <- function(inputStitched,trackName,trackDescription,outputFile,splitSuper=TRUE,score=c(),superRows=c(),baseColor="0,0,0",superColor="255,0,0"){ + outMatrix <- matrix(data="",ncol=4+ifelse(length(score)==nrow(inputStitched),1,0),nrow=nrow(inputStitched)) + + outMatrix[,1] <- as.character(inputStitched$CHROM) + outMatrix[,2] <- as.character(inputStitched$START) + outMatrix[,3] <- as.character(inputStitched$STOP) + outMatrix[,4] <- as.character(inputStitched$REGION_ID) + if(length(score)==nrow(inputStitched)){ + score <- rank(score,ties.method="first") + score <- length(score)-score+1 #Stupid rank only does smallest to largest. + outMatrix[,5] <- as.character(score) + } + trackDescription <- paste(trackDescription,"\nCreated on ",format(Sys.time(), "%b %d %Y"),collapse="",sep="") + trackDescription <- gsub("\n","\t", trackDescription) + tName <- gsub(" ","",trackName) + cat('track name="', tName,'" description="', trackDescription,'" itemRGB=On color=',baseColor,"\n",sep="",file=outputFile) + write.table(file= outputFile,outMatrix,sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + if(splitSuper==TRUE){ + cat("\ntrack name=\"Super", tName,'" description="Super ', trackDescription,'" itemRGB=On color=', superColor,"\n",sep="",file=outputFile,append=TRUE) + write.table(file= outputFile,outMatrix[superRows,],sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + } + }

    convert_stitched_to_gateway_bed <- function(inputStitched,outputFileRoot,splitSuper=TRUE,score=c(),superRows=c()){ + outMatrix <- matrix(data="",ncol=6,nrow=nrow(inputStitched)) + + outMatrix[,1] <- as.character(inputStitched$CHROM) + outMatrix[,2] <- as.character(inputStitched$START) + outMatrix[,3] <- as.character(inputStitched$STOP) + outMatrix[,4] <- as.character(inputStitched$REGION_ID) + + if(length(score)==nrow(inputStitched)){ + score <- rank(score,ties.method="first") + score <- length(score)-score+1 #Stupid rank only does smallest to largest. + outMatrix[,5] <- as.character(score) + } + + outMatrix[,6] <- as.character(rep('.',nrow(outMatrix))) + + + outputFile1 = paste(outputFileRoot,'_Gateway_Enhancers.bed',sep='') + write.table(file= outputFile1,outMatrix,sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + if(splitSuper==TRUE){ + outputFile2 = paste(outputFileRoot,'_Gateway_SuperEnhancers.bed',sep='') + + write.table(file= outputFile2,outMatrix[superRows,],sep="\t",quote=FALSE,row.names=FALSE,col.names=FALSE,append=TRUE) + } + }

    writeSuperEnhancer_table <- function(superEnhancer,description,outputFile,additionalData=NA){ + description <- paste("#",description,"\nCreated on ",format(Sys.time(), "%b %d %Y"),collapse="",sep="") + description <- gsub("\n","\n#",description) + cat(description,"\n",file=outputFile) + if(is.matrix(additionalData)){ + if(nrow(additionalData)!=nrow(superEnhancer)){ + warning("Additional data does not have the same number of rows as the number of super enhancers.\n--->>> ADDITIONAL DATA NOT INCLUDED <<<---\n") + }else{ + superEnhancer <- cbind(superEnhancer,additionalData) + superEnhancer = superEnhancer[order(superEnhancer$enhancerRank),] + + } + } + write.table(file=outputFile,superEnhancer,sep="\t",quote=FALSE,row.names=FALSE,append=TRUE) + }

    #============================================================================ #===================SUPER-ENHANCER CALLING AND PLOTTING====================== #============================================================================

    args <- commandArgs()

    print('THESE ARE THE ARGUMENTS') [1] "THESE ARE THE ARGUMENTS" print(args) [1] "/Library/Frameworks/R.framework/Resources/bin/exec/R"
    [2] "--no-save"
    [3] "example2/"
    [4] "example2/HG18_MM1S_MED1_1000_12KB_STITCHED_TSS_DISTAL_ENHANCER_REGION_MAP.txt" [5] "HG18_MM1S_MED1_1000"
    [6] "MM1S_WCE.hg18.bwt.sorted.bam"

    #ARGS outFolder = args[3] enhancerFile = args[4] enhancerName = args[5] wceName = args[6]

    #Read enhancer regions with closestGene columns stitched_regions <- read.delim(file= enhancerFile,sep="\t")

    #perform WCE subtraction. Using pipeline table to match samples to proper background. rankBy_factor = colnames(stitched_regions)[7]

    prefix = unlist(strsplit(rankBy_factor,'_'))[1]

    if(wceName == 'NONE'){ + + + rankBy_vector = as.numeric(stitched_regions[,7]) + + }else{ + wceName = colnames(stitched_regions)[8] + print('HERE IS THE WCE NAME') + print(wceName) + + rankBy_vector = as.numeric(stitched_regions[,7])-as.numeric(stitched_regions[,8]) + } [1] "HERE IS THE WCE NAME" [1] "MM1S_WCE.hg18.bwt.sorted.bam"

    #SETTING NEGATIVE VALUES IN THE rankBy_vector to 0

    rankBy_vector[rankBy_vector < 0] <- 0

    #FIGURING OUT THE CUTOFF

    cutoff_options <- calculate_cutoff(rankBy_vector, drawPlot=FALSE,xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankByFactor,' Signal','- ',wceName),lwd=2,col=4)

    #These are the super-enhancers superEnhancerRows <- which(rankBy_vector> cutoff_options$absolute) typicalEnhancers = setdiff(1:nrow(stitched_regions),superEnhancerRows) enhancerDescription <- paste(enhancerName," Enhancers\nCreated from ", enhancerFile,"\nRanked by ",rankBy_factor,"\nUsing cutoff of ",cutoff_options$absolute," for Super-Enhancers",sep="",collapse="")

    #MAKING HOCKEY STICK PLOT plotFileName = paste(outFolder,enhancerName,'_Plot_points.png',sep='') png(filename=plotFileName,height=600,width=600) signalOrder = order(rankBy_vector,decreasing=TRUE) if(wceName == 'NONE'){ + plot(length(rankBy_vector):1,rankBy_vector[signalOrder], col='red',xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankBy_factor,' Signal'),pch=19,cex=2) + + }else{ + plot(length(rankBy_vector):1,rankBy_vector[signalOrder], col='red',xlab=paste(rankBy_factor,'_enhancers'),ylab=paste(rankBy_factor,' Signal','- ',wceName),pch=19,cex=2) + } 1300 1300 1500 1400 1200 1400 1600 1500 1300 1500 1700 1600 abline(h=cutoff_options$absolute,col='grey',lty=2) abline(v=length(rankBy_vector)-length(superEnhancerRows),col='grey',lty=2) lines(length(rankBy_vector):1,rankBy_vector[signalOrder],lwd=4, col='red') text(0,0.8*max(rankBy_vector),paste(' Cutoff used: ',cutoff_options$absolute,'\n','Super-Enhancers identified: ',length(superEnhancerRows)),pos=4)

    dev.off() null device 1

    #Writing a bed file bedFileName = paste(outFolder,enhancerName,'_Enhancers_withSuper.bed',sep='') convert_stitched_to_bed(stitched_regions,paste(rankBy_factor,"Enhancers"), enhancerDescription,bedFileName,score=rankBy_vector,splitSuper=TRUE,superRows= superEnhancerRows,baseColor="0,0,0",superColor="255,0,0")

    bedFileRoot = paste(outFolder,enhancerName,sep='')

    convert_stitched_to_gateway_bed(stitched_regions,bedFileRoot,splitSuper=TRUE,score=rankBy_vector,superRows = superEnhancerRows)

    #This matrix is just the super_enhancers true_super_enhancers <- stitched_regions[superEnhancerRows,]

    additionalTableData <- matrix(data=NA,ncol=2,nrow=nrow(stitched_regions)) colnames(additionalTableData) <- c("enhancerRank","isSuper") additionalTableData[,1] <- nrow(stitched_regions)-rank(rankBy_vector,ties.method="first")+1 additionalTableData[,2] <- 0 additionalTableData[superEnhancerRows,2] <- 1

    #Writing enhancer and super-enhancer tables with enhancers ranked and super status annotated enhancerTableFile = paste(outFolder,enhancerName,'_AllEnhancers.table.txt',sep='') writeSuperEnhancer_table(stitched_regions, enhancerDescription,enhancerTableFile, additionalData= additionalTableData) Warning message: In write.table(file = outputFile, superEnhancer, sep = "\t", quote = FALSE, : appending column names to file

    superTableFile = paste(outFolder,enhancerName,'_SuperEnhancers.table.txt',sep='') writeSuperEnhancer_table(true_super_enhancers, enhancerDescription,superTableFile, additionalData= additionalTableData[superEnhancerRows,]) Warning message: In write.table(file = outputFile, superEnhancer, sep = "\t", quote = FALSE, : appending column names to file

    1400 Macintosh-5:ROSE royblum$ 1600 1800 1700 1500 1900 1700 1800 1600 2000 1800 1900 2100 1700 1900 2000 2200 1800 2000 2100 2300 2100 1900 2200 2400 2200 2300 2000 2500 2300 2400 2600 2100 2400 2500 2700 2200 2500 2600 2800 2300 2600 2900 2700 2400 3000 2700 2800 3100 2500 2900 2800 3200 3000 2600 2900 3300 3100 2700 3000 3400 3200 3100 2800 3500 3300 3200 2900 3600 3400 3300 3700 3000 3500 3400 3800 3600 3100 3900 3500 3200 3700 4000 3600 3800 3300 4100 3700 3900 3400 4200 3800 4000 4300 3500 3900 4100 4400 3600 4000 4200 4500 3700 4100 4300 4600 3800 4200 4400 4700 3900 4300 4500 4800 4000 4400 4600 4900 4500 4700 5000 4100 5100 4600 4800 4200 5200 4900 4700 4300 5300 5000 4800 4400 5400 5100 4900 5500 4500 5200 5000 5600 4600 5300 5100 5700 4700 5400 5200 5800 5300 5500 4800 5900 5400 5600 6000 4900

    Macintosh-5:ROSE royblum$ 5500 5700 6100 5000 6200 5600 5800 5100 6300 5700 5900 5200

  2. Brian Abraham

    Hi Roy,

    This looks to be the standard error output reporting the progress of bamToGFF. This is a counter to show how long the script has been running. If your output is created successfully, you don't need to worry about this.

  3. Log in to comment