IntervalTree.find() is broken and/or unintuitive

Issue #53 invalid
Hussein Elgridly
created an issue

For an IntervalTree containing Interval(a,b):

  • find(a,b) is not consistent
  • find(a, a+1) is not consistent
  • find(a, b+1) is not consistent
  • find(a, a) and find(b, b) consistently return []
  • find(a-1, b+1) consistently finds Interval(a,b)

The problematic cases are b=a and b=a+1:

>>> from bx.intervals.intersection import IntervalTree, Interval
>>>
>>> tree = IntervalTree()
>>> tree.insert_interval( Interval(1,1) )
>>> tree.find(1,1)
[]
>>> tree.find(1,2)
[]
>>> tree.find(0,2)
[Interval(1, 1)]

>>> tree.insert_interval( Interval(10,11) )
>>> tree.find(10, 11)
[Interval(10, 11)]
>>> tree.find(10, 12)
[Interval(10, 11)]
>>> tree.find(10, 10)
[]
>>> tree.find(11, 11)
[]

>>> tree.insert_interval( Interval(20, 22) )
>>> tree.find(20, 22)
[Interval(20, 22)]
>>> tree.find(20, 21)
[Interval(20, 22)]
>>> tree.find(20, 23)
[Interval(20, 22)]
>>> tree.find(20, 20)
[]
>>> tree.find(22, 22)
[]

The Samtools IntervalList spec states that intervals are end-inclusive: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/samtools/util/IntervalList.html

Changing intersection.pyx line 184 to:
if ( self.end >= start ) and ( self.start =< end ):
would restore consistency across the above examples and bring bx-python in line with other tools.

Thoughts?

Comments (7)

  1. Brent Pedersen

    You've shown that it doesn't handle 0-length intervals.

    Do all tests pass with the change you made? I think it'd need quite a few tests with 0-length intervals and sets of intervals like (10, 10), (11, 11), (10, 11).

  2. Hussein Elgridly reporter

    No tests failed that weren't already failing. However, running nosetests --exe gave me the following, even before making my change:

    bx.binned_array_tests.test_file_lzo ... ERROR
    bx.binned_array_tests.test_binned_array_writer ... ERROR
    test_DNA (bx.seqmapping_tests.CharMappingTests) ... FAIL
    test_DNA_list (bx.seqmapping_tests.CharMappingTests) ... FAIL
    test_other (bx.seqmapping_tests.CharMappingTests) ... FAIL
    test_simple (bx.seqmapping_tests.IntMappingTests) ... FAIL
    test_simple (bx.seqmapping_tests.IntMappingTests) ... FAIL
    Failure: TypeError (must be string without null bytes, not str) ... ERROR
    
  3. Hussein Elgridly reporter

    Please close this issue. I have end-inclusive intervals, and I assumed that I'd be able to just dump them into the tree without converting them to end-exclusive. Doing so beforehand makes everything work as expected.

  4. Log in to comment