Issue #124 new

AndMaybeMatcher IndexError in quality

Anonymous created an issue

This appears to be a regression from commit 9859aa6f4bb2. The original inherited quality from AdditiveBiMatcher checks first that a and b are active before checking the ids:

{{{

!python

def quality(self):
    q = 0.0
    if self.a.is_active():
        q += self.a.quality()
    if self.b.is_active():
        q += self.b.quality()
    return q

}}}

The updated AndMaybeMatcher quality function does not check for a and b active, and may result in an IndexError as seen further below:

{{{

!python

def quality(self):
    if self.a.id() == self.b.id():
        return self.a.quality() + self.b.quality()
    else:
        return self.a.quality()

}}}

Also related; the AndMaybeMatcher is_active() function only returns the value of a.is_active(). The IntersectionMatcher returns a.is_active() and b.is_active(), should AndMaybeMatcher be the same?

There are a lot of locations where Matcher objects id() is called and therefor a number of possibilities for this same IndexError to crop up. I don't really understand how the internals work, but based on this problem, it seems like the id() call either needs to be safer, or some more is_active() calls need to be added. For AndMaybeMatcher in id(), quality(), weight() and score() for example.

{{{

!python


IndexError Traceback (most recent call last)

/Users/jasl8r/Development/sequoia/sequoia/<ipython console> in <module>()

/Users/jasl8r/Development/sequoia/sequoia/sequoia/data.py in match_media(self) 136 query += u' ANDMAYBE year:%d' % show.get('year') 137 --> 138 for result in searcher.search(qparser.parse(query), limit = 1): 139 matches[series_id] = result['id'] 140

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in search(self, q, limit, sortedby, reverse, groupedby, optimize, scored, filter, collector) 572 collector.reverse = reverse 573 --> 574 return collector.search(self, q, filter=filter) 575 576

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in search(self, searcher, q, filter) 682 break 683 else: --> 684 self.add_searcher(searcher, q) 685 686 if self.timer:

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_searcher(self, searcher, q) 708 """ 709 --> 710 return self.add_matches(searcher, q.matcher(searcher)) 711 712 def score(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_matches(self, searcher, matcher) 750 return self.add_all_matches(searcher, matcher) 751 else: --> 752 return self.add_top_matches(searcher, matcher) 753 754 def add_top_matches(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_top_matches(self, searcher, matcher) 766 greedy = self.greedy 767 --> 768 for id, quality in self.pull_matches(matcher, usequality): 769 if timelimited and not greedy and self.timesup: 770 raise TimeLimit

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in pull_matches(self, matcher, usequality) 881 # posting has higher quality than the minimum before yielding it.

882             if usequality:

--> 883 postingquality = matcher.quality() 884 if postingquality > self.minquality: 885 yield (id, postingquality)

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/matching.pyc in quality(self) 1299 1300 def quality(self): -> 1301 if self.a.id() == self.b.id(): 1302 return self.a.quality() + self.b.quality() 1303 else:

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/filedb/filepostings.pyc in id(self) 150 151 def id(self): --> 152 return self.block.ids[self.i] 153 154 def items_as(self, astype):

IndexError: array index out of range

}}}

Comments (4)

  1. Matt Chaput repo owner

    AndMaybe.is_active() only returns the value of a.is_active() because the "b" matcher is optional... even if it's inactive, the AndMaybe matcher can still be active.

    In theory most of the "informational" methods on a matcher (e.g. quality(), weight(), value(), etc.) should not need to check the active-ness of the matcher itself (they may need to check the active-ness of child matchers if it affects the return value), because by convention the caller is expected to only call those methods on an active matcher.

    I think possibly the Collector object is not honoring that agreement and is calling quality() on a matcher without making sure the matcher is active. Putting the check in the quality() method is a defensive measure while I figure that out. I also need a lot more unit tests for the matchers.

    Thanks!

  2. Andi Albrecht

    Could it be that I trap into this issue only with certain keywords? Does this matter anyhow? I can search without issues with or without results, but for some keywords and word variants of them I constantly see an exception that looks like this one.

    If so, is there a workaround?

  3. Log in to comment