RUN-ERROR
Dear ‘ROSE’ developers, Thanks for developing ‘ROSE’! We are very interested in using it to discover super-enhancers. We downloaded the ROSE package and the example data, and example.sh ran on the example data without any errors. But when we replaced the example data with our test data, we got the following message:
Traceback (most recent call last):
  File "ROSE_main.py", line 496, in <module>
    main()
  File "ROSE_main.py", line 353, in main
    referenceCollection = ROSE_utils.gffToLocusCollection(inputGFFFile)
  File "/home/Downloads/ROSE_DATA/ROSE_utils.py", line 512, in gffToLocusCollection
    if len(line[1]) > 0:
IndexError: list index out of range
This is the command line we used:
python ROSE_main.py -g HG19 -i ./data/test.gff -r ./data/test.sorted.bam -c ./data/control.sorted.bam -o test -s 12500 -t 2500
It seems to be an issue with the format of our input GFF, but I checked the format carefully and did not find any problem. Please find our input GFF attached. We would be very grateful if you could advise us on what to change in our procedure to get a successful ‘ROSE’ run.
Comments (8)
-
-
reporter Hi Charles, Thank you for your prompt reply. I did replace \r\n with \n, but I still got "IndexError: list index out of range" :(. Is there anything else I should consider? Best, Zhongyi
-
Zhongyi,
I can't quite explain it, but something about how you formatted that original text file is breaking our parser.
Please try this file and see if it works for you.
-
reporter Hi Charles, which tool do you recommend for processing .gff files? Excel is obviously not a good choice. We used HOMER to process the ChIP-seq data and converted the peaks file to GFF format using Excel. Thanks.
-
Zhongyi,
The text processing is a little tricky. We prefer to write Python scripts to do the conversion. GenePattern from the Broad also has several tools for this:
http://www.broadinstitute.org/cancer/software/genepattern/modules?taskType=Data+Format+Conversion
I am not sure if the fixed .gff file was attached here. Please email us directly at young_computation@wi.mit.edu and we can send it by email.
-Charles
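As a rough illustration of the "write a script in Python" route, here is a minimal sketch that writes regions in the 9-column layout of the example line quoted in this thread (chromosome, ID, empty, start, end, empty, strand, empty, ID), always ending lines with a plain '\n'. The function name write_rose_gff and the shape of the input tuples are assumptions made for this example, not part of ROSE or HOMER, and the column layout is inferred only from that one example line.

```python
def write_rose_gff(regions, out_path):
    """Write (chrom, region_id, start, end, strand) tuples as tab-separated lines.

    Hypothetical helper: the 9-column layout is copied from the example
    line quoted in this thread, not from a ROSE specification.
    """
    # newline="\n" stops Python from translating line endings on Windows.
    with open(out_path, "w", newline="\n") as handle:
        for chrom, region_id, start, end, strand in regions:
            fields = [
                chrom,
                region_id,
                "",
                str(int(start)),  # int() raises early on non-numeric coordinates
                str(int(end)),
                "",
                strand,
                "",
                region_id,
            ]
            handle.write("\t".join(fields) + "\n")
```

For example, write_rose_gff([("chr1", "chr1-41961", 66791251, 66810398, ".")], "test.gff") would write the single line shown earlier in this thread, ending in '\n' only. Reading the regions out of a HOMER peaks file is left out here because it depends on the columns in that file.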
-
Pushed a fix for this parsing problem. Added checks for GFF format compliance (line length). It will still fail if the start/stop coordinates are not numeric or if the strand is not one of +, -, or .
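To check a file for those remaining failure cases before running ROSE, something like the sketch below might help. It is not the check that was pushed to ROSE. The function name check_gff_line and the column positions (start in column 4, stop in column 5, strand in column 7, 9 columns total) are assumptions taken from the example line quoted in this thread.

```python
VALID_STRANDS = {"+", "-", "."}


def check_gff_line(line, expected_columns=9):
    """Return a list of problems found in one GFF line (empty if none).

    Hypothetical pre-flight check, not ROSE's own validation.
    """
    fields = line.rstrip("\r\n").split("\t")
    problems = []
    if len(fields) != expected_columns:
        problems.append(
            "expected %d columns, found %d" % (expected_columns, len(fields))
        )
        return problems  # column positions below are meaningless otherwise
    for name, idx in (("start", 3), ("stop", 4)):
        if not fields[idx].isdigit():
            problems.append("%s coordinate is not numeric: %r" % (name, fields[idx]))
    if fields[6] not in VALID_STRANDS:
        problems.append("strand must be +, - or .: %r" % fields[6])
    return problems
```

Running it over every line of the input file and printing the line number with any non-empty result should point at blank lines, short lines, or Excel-mangled fields.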
-
Make sure you use the right version of Python.
-
- changed status to resolved
Thank you for bringing this to our attention, Zhongyi.
Was your .gff file processed in Excel?
Each of the lines seems to have multiple end-of-line characters, and this is confusing our parser. I will update ROSE to fix this issue.
For instance, line one of your file is:
'chr1\tchr1-41961\t\t66791251\t66810398\t\t.\t\tchr1-41961\r\n'
In the meantime, if you end your lines with just '\n', like this:
'chr1\tchr1-41961\t\t66791251\t66810398\t\t.\t\tchr1-41961\n'
it should fix the problem.
Best,
Charles
Charles Y. Lin, Ph.D. Dana-Farber Cancer Institute Department of Medical Oncology http://bradnerlab.com
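For anyone hitting the same problem, the line-ending change described above can be sketched in a few lines of Python. This is a standalone helper, not part of ROSE. The function names and the output file name in the usage note are made up for illustration.

```python
def normalize_line_endings(data):
    # Turn Windows (CR LF) and bare CR line endings into plain LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(src_path, dst_path):
    # Binary mode keeps Python from translating newlines on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_line_endings(data))
```

Usage would look like normalize_file("./data/test.gff", "./data/test.unix.gff"), then passing the new file to ROSE with -i. As the follow-up in this thread shows, fixing line endings alone may not be enough if the file has other formatting problems.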