RUN-ERROR

Issue #10 resolved
Zhongyi Hu created an issue

Dear ‘ROSE’ developers, Thanks for developing ‘ROSE’! We are very interested in using it to discover super-enhancers. We downloaded the ROSE package and the example data, and running example.sh on the example data produced the output without any errors. But when we replaced the example data with our own test data, we got this message:

Traceback (most recent call last):
  File "ROSE_main.py", line 496, in <module>
    main()
  File "ROSE_main.py", line 353, in main
    referenceCollection = ROSE_utils.gffToLocusCollection(inputGFFFile)
  File "/home/Downloads/ROSE_DATA/ROSE_utils.py", line 512, in gffToLocusCollection
    if len(line[1]) > 0:
IndexError: list index out of range

This is the command line we used:

python ROSE_main.py -g HG19 -i ./data/test.gff -r ./data/test.sorted.bam -c ./data/control.sorted.bam -o test -s 12500 -t 2500

It seems to be an issue with the format of our input gff, but I checked the format carefully and did not find any problem. Please find our input gff attached. We would greatly appreciate any advice on what we should change in our procedure to get a successful ‘ROSE’ run.

Comments (8)

  1. charles_lin

    Thank you for bringing this to our attention Zhongyi,

    Was your .gff file processed in Excel?

    Each of the lines seems to have multiple end of line characters, and this is confusing our parser. I will update ROSE to fix this issue.

    For instance, line one of your file is:

    'chr1\tchr1-41961\t\t66791251\t66810398\t\t.\t\tchr1-41961\r\n'

    In the meantime, if you end each line with just '\n', like this:

    'chr1\tchr1-41961\t\t66791251\t66810398\t\t.\t\tchr1-41961\n'

    it should fix the problem.

    Best,

    Charles

    Charles Y. Lin, Ph.D. Dana-Farber Cancer Institute Department of Medical Oncology http://bradnerlab.com

  2. Zhongyi Hu reporter

    Hi Charles, Thank you for your prompt reply. I did replace \r\n with \n, but I still got "IndexError: list index out of range" :(. Is there anything else I should consider? Best, Zhongyi

  3. charles_lin

    Zhongyi,

    I can't quite explain it, but something about how you formatted that original text file is breaking our parser.

    Please try this file and see if it works for you.

  4. Zhongyi Hu reporter

    Hi Charles, which tool do you recommend for processing .gff files? Excel is obviously not a good choice. We used HOMER to process our ChIP-seq data and converted the peaks file to gff format using Excel. Thanks.

  5. chazlin

    Pushed a fix for this parsing problem. Added checks for GFF format compliance (line length). It will still fail if the start/stop coordinates are not numeric or if the strand is not one of +, -, or .
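
For anyone hitting the same IndexError, the kinds of checks discussed in this thread (line endings, line length, numeric coordinates, strand) can be run on a .gff file before handing it to ROSE. The following is only a minimal Python 3 sketch, not ROSE's actual code. The function name find_gff_problems is hypothetical, and the 9-column layout (start in column 4, end in column 5, strand in column 7) is an assumption inferred from the example line quoted above.

```python
def find_gff_problems(lines):
    """Return a list of (line_number, message) for lines that look malformed.

    Assumes 9 tab-separated columns per line, with start in column 4,
    end in column 5 and strand in column 7 (inferred from the example
    line in this thread, not taken from ROSE itself).
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        # Drop any trailing mix of \r and \n, e.g. the \r\n left by Windows tools.
        text = raw.rstrip("\r\n")
        if not text.strip():
            problems.append((number, "blank line"))
            continue
        fields = text.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, "expected 9 tab-separated fields, found %d" % len(fields))
            )
            continue
        start, end, strand = fields[3], fields[4], fields[6]
        try:
            int(start)
            int(end)
        except ValueError:
            problems.append((number, "start/end coordinates are not integers"))
        if strand not in ("+", "-", "."):
            problems.append((number, "strand must be one of +, -, ."))
    return problems
```

Passing an open file handle works because iterating over a file yields one line at a time, for example find_gff_problems(open("./data/test.gff")). An empty list means none of these particular checks failed; it does not guarantee that ROSE will accept the file.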
