Search index files not closed if filter is used (Python 3 only)

Issue #428 resolved
Chris Mutel
created an issue

Test script is here: https://gist.github.com/cmutel/1e02ff2f9ec53e06d902

Basically, the following has no problems:

with index.searcher() as searcher:
    searcher.search(qp.parse("foo"))

But the search index files are not closed if a filter is passed:

with index.searcher() as searcher:
    searcher.search(qp.parse("foo"), filter=Term("bar", "example"))

Buggy behavior has been seen on Linux (Python 3.5) and OS X 10.11 (Python 3.4), but not on Python 2.7.

The open files cause IOErrors relatively quickly if many searches are done.

Comments (4)

  1. Chris Mutel reporter

    So, after digging around for a while, I think the problem is the following (though I definitely don't understand everything):

    with index.searcher... creates a new index reader:

    Searcher(self.reader(), fromindex=self, **kwargs)
    

    .reader() calls ._reader(), which in turn creates a SegmentReader.

    Even in our simple test cases, the passed segment is always compound, so an OverlayStorage is created.

    The storage of OverlayStorage is CompoundStorage. mmap is available, and use_mmap is True, so a memory-mapped file is opened, and the original file handle is closed. So far, everything is working as expected.

    However, a BufferError is always raised when the code tries to close the mmapped file, producing the following traceback:

    Traceback (most recent call last):
      File "/Users/cmutel/local34/whoosh-files/src/whoosh/src/whoosh/filedb/compound.py", line 116, in close
        self._source.close()
    BufferError: cannot close exported pointers exist
    

    So, someone somewhere is holding a pointer to this file object, and the C implementation of mmap won't allow it to be closed. That means that the next line, del self._source, doesn't actually do anything to release the file, and the list of open files just grows and grows.

    I added a failing test here: https://bitbucket.org/cmutel/whoosh/commits/76f3cdd361b60be4a51d55f3fd7b5ed3416623f3

    Note also small bugfix here: https://bitbucket.org/cmutel/whoosh/commits/446494a356af3aebc037253328d7155248b5bd5a

    I am not 100% convinced that filters actually have anything to do with this bug, but this behaviour doesn't seem to appear when filters aren't present...

  2. Matt Chaput repo owner

    It wasn't the filter exactly, but when you use a filter, the searcher was caching the results of converting the query into a bitset of matching documents, and somehow that cache was preventing the Searcher object from being garbage collected. Very strange :(
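
For anyone wondering how a cache can keep its owner alive, here is a minimal sketch of one possible mechanism, a reference cycle. It is not Whoosh code and may not be what actually happened there: FakeSearcher, cache_filter and survives_del are hypothetical names made up for illustration. The collector is switched off inside the helper so that only reference counting is in play.

```python
import gc
import weakref


class FakeSearcher:
    """Hypothetical stand-in for a searcher that owns open files."""

    def __init__(self):
        self.filter_cache = {}

    def matches(self, docnum):
        return True

    def cache_filter(self, key):
        # The cached closure captures `self`, which creates a cycle:
        # searcher -> filter_cache -> closure -> searcher.
        self.filter_cache[key] = lambda docnum: self.matches(docnum)


def survives_del(use_filter):
    """Return True if the object is still alive after `del` (CPython)."""
    was_enabled = gc.isenabled()
    gc.disable()  # rely on reference counting only
    try:
        searcher = FakeSearcher()
        if use_filter:
            searcher.cache_filter("bar:example")
        ref = weakref.ref(searcher)
        del searcher
        return ref() is not None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up the cycle left behind, if any


print(survives_del(use_filter=False))
print(survives_del(use_filter=True))
```

On CPython the object without a cached filter goes away as soon as the last name is deleted, while the one with a cached filter lingers until the cyclic collector runs. Anything it owns (open files, mmaps) stays open for that long too.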

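
The BufferError from the traceback in the first comment can be reproduced with the standard library alone. This is only a sketch: it uses an anonymous mmap and a memoryview as the "exported pointer", which is an assumption. The thread does not say what kind of export Whoosh was holding. close_with_export is a made-up helper name.

```python
import mmap


def close_with_export():
    """Try to close an mmap while a buffer export is still alive."""
    m = mmap.mmap(-1, 4096)   # anonymous mapping, no file on disk needed
    view = memoryview(m)      # this counts as an exported pointer

    try:
        m.close()             # refused while the export exists
    except BufferError as exc:
        error = exc
    else:
        error = None

    view.release()            # drop the export...
    m.close()                 # ...and now close() goes through
    return error, m.closed


error, closed = close_with_export()
print(type(error).__name__, closed)
```

The point is that close() fails for as long as any object still exports the buffer, so whatever holds that export has to be released (or collected) before the mapping and its file descriptor can go away.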