AndMaybeMatcher IndexError in quality

Anonymous avatarAnonymous created an issue

This appears to be a regression from commit 9859aa6f4bb2. The original inherited quality from AdditiveBiMatcher checks first that a and b are active before checking the ids:

    def quality(self):
        q = 0.0
        if self.a.is_active():
            q += self.a.quality()
        if self.b.is_active():
            q += self.b.quality()
        return q

The updated AndMaybeMatcher quality function does not check for a and b active, and may result in an IndexError as seen further below:

    def quality(self):
        if self.a.id() == self.b.id():
            return self.a.quality() + self.b.quality()
        else:
            return self.a.quality()

Also related; the AndMaybeMatcher is_active() function only returns the value of a.is_active(). The IntersectionMatcher returns a.is_active() and b.is_active(), should AndMaybeMatcher be the same?

There are a lot of locations where Matcher objects id() is called and therefor a number of possibilities for this same IndexError to crop up. I don't really understand how the internals work, but based on this problem, it seems like the id() call either needs to be safer, or some more is_active() calls need to be added. For AndMaybeMatcher in id(), quality(), weight() and score() for example.

---------------------------------------------------------------------------                                                                                                                         
IndexError                                Traceback (most recent call last)

/Users/jasl8r/Development/sequoia/sequoia/<ipython console> in <module>()

/Users/jasl8r/Development/sequoia/sequoia/sequoia/data.py in match_media(self)
    136                 query += u' ANDMAYBE year:%d' % show.get('year')
    137 
--> 138             for result in searcher.search(qparser.parse(query), limit = 1):
    139                 matches[series_id] = result['id']
    140 

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in search(self, q, limit, sortedby, reverse, groupedby, optimize, scored, filter, collector)
    572             collector.reverse = reverse
    573 
--> 574         return collector.search(self, q, filter=filter)
    575 
    576 

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in search(self, searcher, q, filter)
    682                     break
    683         else:
--> 684             self.add_searcher(searcher, q)
    685 
    686         if self.timer:

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_searcher(self, searcher, q)
    708         """
    709 
--> 710         return self.add_matches(searcher, q.matcher(searcher))
    711 
    712     def score(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_matches(self, searcher, matcher)
    750             return self.add_all_matches(searcher, matcher)
    751         else:
--> 752             return self.add_top_matches(searcher, matcher)
    753 
    754     def add_top_matches(self, searcher, matcher):

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in add_top_matches(self, searcher, matcher)
    766         greedy = self.greedy
    767 
--> 768         for id, quality in self.pull_matches(matcher, usequality):
    769             if timelimited and not greedy and self.timesup:
    770                 raise TimeLimit

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/searching.pyc in pull_matches(self, matcher, usequality)
    881             # posting has higher quality than the minimum before yielding it.

    882             if usequality:
--> 883                 postingquality = matcher.quality()
    884                 if postingquality > self.minquality:
    885                     yield (id, postingquality)

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/matching.pyc in quality(self)
   1299 
   1300     def quality(self):
-> 1301         if self.a.id() == self.b.id():
   1302             return self.a.quality() + self.b.quality()
   1303         else:

/Library/Python/2.6/site-packages/Whoosh-1.7.7-py2.6.egg/whoosh/filedb/filepostings.pyc in id(self)
    150 
    151     def id(self):
--> 152         return self.block.ids[self.i]
    153 
    154     def items_as(self, astype):

IndexError: array index out of range

Comments (4)

  1. Matt Chaput

    AndMaybe.is_active() only returns the value of a.is_active() because the "b" matcher is optional... even if it's inactive, the AndMaybe matcher can still be active.

    In theory most of the "informational" methods on a matcher (e.g. quality(), weight(), value(), etc.) should not need to check the active-ness of the matcher itself (they may need to check the active-ness of child matchers if it affects the return value), because by convention the caller is expected to only call those methods on an active matcher.

    I think possibly the Collector object is not honoring that agreement and is calling quality() on a matcher without making sure the matcher is active. Putting the check in the quality() method is a defensive measure while I figure that out. I also need a lot more unit tests for the matchers.

    Thanks!

  2. Andi Albrecht

    Could it be that I trap into this issue only with certain keywords? Does this matter anyhow? I can search without issues with or without results, but for some keywords and word variants of them I constantly see an exception that looks like this one.

    If so, is there a workaround?

  3. Log in to comment
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.