Issue #121 resolved

FilePostingReader/IntersectionMatcher IndexError exception

Matt Chaput
repo owner created an issue

See http://groups.google.com/group/whoosh/browse_thread/thread/e55589d7ba9c642a

I am using whoosh 1.7.6. I have a fairly large index: 2+ million entries, ~250MB. One particular search fails with an IndexError (full traceback below).

The issue is happening in the IntersectionMatcher here:

{{{
while a.is_active() and b.is_active() and aq + bq <= minquality:
    if aq < bq:
        skipped += a.skip_to_quality(minquality - bq)
    else:
        skipped += b.skip_to_quality(minquality - aq)
    if a.id() != b.id():
        self._find_next()
    aq = a.block_quality()
    bq = b.block_quality()
}}}

The problem is that the b.skip_to_quality() call reads all the way to the end of the blocks trying to find a block of better quality (I guess?). That leaves b inactive, and the following call to b.id() then fails with the index out of range error. I assume there is some underlying issue here. I tried changing the line:

{{{ if a.id() != b.id(): }}}

to:

{{{ if a.is_active() and b.is_active() and a.id() != b.id(): }}}

which eliminates the exception. This may be the solution, but I am having another issue with ANDMAYBE and search limits that is masking it.
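To make the failure mode concrete, here is a minimal sketch that reproduces it with toy matchers rather than Whoosh's real classes. `ToyMatcher`, `find_next`, `make_pair`, `demo`, and the `guarded` flag are all hypothetical names made up for this illustration, and the toy treats each posting as its own "block" and reports a quality of 0.0 once exhausted, which is an assumption and not how FilePostingReader stores data. Only the loop body mirrors the snippet quoted above.

```python
class ToyMatcher:
    """Hypothetical stand-in for a posting reader (one posting per 'block')."""

    def __init__(self, ids, qualities):
        self.ids = ids
        self.qualities = qualities
        self.i = 0

    def is_active(self):
        return self.i < len(self.ids)

    def id(self):
        # Like the real reader, this raises IndexError once exhausted.
        return self.ids[self.i]

    def block_quality(self):
        # Assumption for the toy: an exhausted matcher reports 0.0.
        return self.qualities[self.i] if self.is_active() else 0.0

    def skip_to(self, target):
        while self.is_active() and self.id() < target:
            self.i += 1

    def skip_to_quality(self, minquality):
        # Skip every block whose quality cannot beat minquality.
        skipped = 0
        while self.is_active() and self.block_quality() <= minquality:
            self.i += 1
            skipped += 1
        return skipped


def find_next(a, b):
    # Advance whichever side is behind until both agree or one runs out.
    while a.is_active() and b.is_active() and a.id() != b.id():
        if a.id() < b.id():
            a.skip_to(b.id())
        else:
            b.skip_to(a.id())


def skip_to_quality(a, b, minquality, guarded):
    skipped = 0
    aq, bq = a.block_quality(), b.block_quality()
    while a.is_active() and b.is_active() and aq + bq <= minquality:
        if aq < bq:
            skipped += a.skip_to_quality(minquality - bq)
        else:
            skipped += b.skip_to_quality(minquality - aq)
        if guarded:
            # The proposed change: only compare ids if both are still active.
            differ = a.is_active() and b.is_active() and a.id() != b.id()
        else:
            # The original check: b.id() fails if b was just exhausted.
            differ = a.id() != b.id()
        if differ:
            find_next(a, b)
        aq, bq = a.block_quality(), b.block_quality()
    return skipped


def make_pair():
    # No block in either list can reach the required quality of 5.0.
    a = ToyMatcher([1, 2, 3], [1.0, 1.0, 1.0])
    b = ToyMatcher([1, 2, 3], [1.0, 1.0, 1.0])
    return a, b


def demo():
    a, b = make_pair()
    try:
        skip_to_quality(a, b, 5.0, guarded=False)
        unguarded = "no error"
    except IndexError:
        unguarded = "IndexError"
    a, b = make_pair()
    skipped = skip_to_quality(a, b, 5.0, guarded=True)
    return unguarded, skipped, b.is_active()


print(demo())
```

In this toy, the unguarded check raises because b is run off the end of its postings inside the loop body, while the guarded check lets the loop fall through to its own `is_active()` test and exit cleanly. Whether the guard is the right fix in Whoosh itself, or only hides a deeper problem, is not something the sketch can settle.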

{{{
/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in search(self, q, limit, sortedby, reverse, groupedby, optimize, scored, filter, collector)
    481         collector.scored = scored
    482
--> 483         return collector.search(self, q, filter=filter)
    484
    485

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in search(self, searcher, q, filter)
    582                 self.add_searcher(s, q)
    583         else:
--> 584             self.add_searcher(searcher, q)
    585
    586         if self.timer:

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_searcher(self, searcher, q)
    608         """
    609
--> 610         self.add_matches(searcher, q.matcher(searcher))
    611
    612     def score(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_matches(self, searcher, matcher)
    653             return self.add_all_matches(searcher, matcher)
    654         else:
--> 655             return self.add_top_matches(searcher, matcher)
    656
    657     def add_top_matches(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in add_top_matches(self, searcher, matcher)
    669         greedy = self.greedy
    670
--> 671         for id, quality in self.pull_matches(matcher, usequality):
    672             if timelimited and not greedy and self.timesup:
    673                 raise TimeLimit

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/searching.pyc in pull_matches(self, matcher, usequality)
    759             # required quality
    760             if usequality and checkquality and self.minquality is not None:
--> 761                 matcher.skip_to_quality(self.minquality)
    762                 # Skipping ahead might have moved the matcher to the end of the
    763                 # posting list

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/matching.pyc in skip_to_quality(self, minquality)
    981             else:
    982                 skipped += b.skip_to_quality(minquality - aq)
--> 983             if a.id() != b.id():
    984                 self._find_next()
    985             aq = a.block_quality()

/Library/Python/2.6/site-packages/Whoosh-1.7.6-py2.6.egg/whoosh/filedb/filepostings.pyc in id(self)
    150
    151     def id(self):
--> 152         return self.block.ids[self.i]
    153
    154     def items_as(self, astype):

IndexError: array index out of range
}}}
