Issue #263 resolved

If a datetime exclusion matches the first document, search always returns empty.

melinath
created an issue

This seems to be a problem with {{{InverseMatcher._find_next}}}; on {{{init}}}, {{{self._id}}} is set to 0; but if {{{child.id()}}} also returns 0 (the id of the first document) then the matching process is short-circuited and no results are returned. Here's a simple test case demonstrating the issue.

{{{
import datetime

from whoosh import fields, qparser
from whoosh.filedb.filestore import RamStorage
from nose.tools import assert_equal

u = unicode


def test_exclusion():
    schema = fields.Schema(id=fields.ID(stored=True), date=fields.DATETIME)
    ix = RamStorage().create_index(schema)
    dt1 = datetime.datetime(1950, 1, 1)
    dt2 = datetime.datetime(1960, 1, 1)
    with ix.writer() as w:
        # Make 39 documents with dates != dt1 and then make a last document
        # with feed == dt1.
        for i in xrange(40):
            w.add_document(id=u(str(i)), date=(dt2 if i > 1 else dt1))

    with ix.searcher() as s:
        qp = qparser.QueryParser("id", schema)
        # Find documents where date != dt1
        q = qp.parse("NOT (date:(19500101000000))")

        r = s.search(q, limit=None)
        assert_equal(len(r), 39)  # Total number of matched documents
        assert_equal(r.scored_length(), 39)  # Number of docs in the results
}}}

On my system, this test case fails with {{{AssertionError: 0 != 39}}}.
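
The symptom can be pictured with a toy model in plain Python. This is only a sketch and not Whoosh's implementation: `inverse_ids` and `inverse_ids_short_circuit` are hypothetical names, and the flawed variant assumes, purely for illustration, that the inverse iteration stops as soon as the child matcher has no ids left.

```python
def inverse_ids(excluded, limit):
    """Yield every doc id in range(limit) that is NOT in `excluded`."""
    child = iter(sorted(excluded))
    next_excluded = next(child, None)
    for docnum in range(limit):
        if docnum == next_excluded:
            # Skip this id and advance the child to its next match.
            next_excluded = next(child, None)
            continue
        yield docnum


def inverse_ids_short_circuit(excluded, limit):
    """Deliberately flawed variant (hypothetical): it treats the inverse
    iteration as finished once the child has run out of ids."""
    child = iter(sorted(excluded))
    next_excluded = next(child, None)
    docnum = 0
    while docnum < limit and next_excluded is not None:
        if docnum == next_excluded:
            next_excluded = next(child, None)
        else:
            yield docnum
        docnum += 1


# Docs 0 and 1 are excluded, as in the test case above.
good = list(inverse_ids([0, 1], 40))
bad = list(inverse_ids_short_circuit([0, 1], 40))
print(len(good), len(bad))
```

In this model the flawed variant yields nothing when the excluded ids come first, because the child is exhausted before any non-excluded id has been reached. If the only excluded id were, say, 5, it would still yield ids 0 through 4 and then stop.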

Comments (7)

  1. Matt Chaput repo owner

    InverseMatcher was inheriting bad behavior from WrappingMatcher.

    • WrappingMatcher.replace() assumed that if a child matcher was finished, the wrapping matcher was finished. This is obviously not true for InverseMatcher. I removed this completely; it was a bad idea in general. I think this was the specific cause of this issue.
    • InverseMatcher._copy() and _replacement() would reset the matcher because they didn't pass on the current ID. I don't know how this affected this issue, but it was a bug.
    • InverseMatcher.all_ids() tried to be "fast" using set operations, but for large indexes it could have been very wasteful, and using a completely different methodology for all_ids() masked problems with the "normal" iteration code. Now it just calls the base implementation.
  2. Thomas Waldmann
    • changed status to new

    If I understand the issue right, melinath is saying this is a general problem with InverseMatcher. So is it really related to DATETIME fields?

    Also, please note that xrange(x) gives 0 .. x-1, so if you evaluate i > 1 over xrange(40), the two values 0 and 1 give False, while the other 38 values give True. The test therefore creates 38 documents with dt2, not 39.

    For the sake of a quick unit test, maybe just use 2 or 3 documents, if that is enough to show the problem.
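
The count can be checked directly. A small sketch, using Python 3's `range` in place of the `xrange` from the test case (the variable names are only illustrative):

```python
# Split the ids 0..39 by the `i > 1` condition used in the test case.
gets_dt1 = [i for i in range(40) if not i > 1]  # condition is False
gets_dt2 = [i for i in range(40) if i > 1]      # condition is True

print(gets_dt1)       # [0, 1]
print(len(gets_dt2))  # 38
```

So a query that excludes dt1 should match 38 documents in this setup.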
