bug in filestore?

Issue #16 resolved
Alexander Clausen
created an issue

I'm using Whoosh 0.3.18 together with django-haystack, deployed on mod_wsgi. When searching, I get errors that look suspiciously like the ones Whoosh produced back when it was not thread-safe:

{{{
Traceback (most recent call last):
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/handlers/base.py", line 101, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 131, in search_view
    return view_class(*args, **kwargs)(request)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 45, in __call__
    return self.create_response()
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 117, in create_response
    (paginator, page) = self.build_page()
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 99, in build_page
    page = paginator.page(self.request.GET.get('page', 1))
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 37, in page
    number = self.validate_number(number)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 28, in validate_number
    if number > self.num_pages:
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 60, in _get_num_pages
    if self.count == 0 and not self.allow_empty_first_page:
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 48, in _get_count
    self._count = self.object_list.count()
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/query.py", line 377, in count
    return len(clone)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/query.py", line 53, in __len__
    self._result_count = self.query.get_count()
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 408, in get_count
    self.run()
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 363, in run
    results = self.backend.search(final_query, **kwargs)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 52, in wrapper
    return func(obj, query_string, *args, **kwargs)
  File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/whoosh_backend.py", line 298, in search
    narrow_searcher = self.index.searcher()
  File "build/bdist.linux-x86_64/egg/whoosh/index.py", line 329, in searcher
    return Searcher(self.reader(), **kwargs)
  File "build/bdist.linux-x86_64/egg/whoosh/filedb/fileindex.py", line 291, in reader
    return self.segments.reader(self.storage, self.schema)
  File "build/bdist.linux-x86_64/egg/whoosh/filedb/fileindex.py", line 422, in reader
    for segment in segments]
  File "build/bdist.linux-x86_64/egg/whoosh/filedb/filereading.py", line 73, in __init__
    self.termtable = open_terms(storage, segment)
  File "build/bdist.linux-x86_64/egg/whoosh/filedb/filereading.py", line 34, in open_terms
    termfile = storage.open_file(segment.term_filename)
  File "build/bdist.linux-x86_64/egg/whoosh/filedb/filestore.py", line 56, in open_file
    f = StructFile(open(self._fpath(name), "rb"), *args, **kwargs)
IOError: [Errno 2] No such file or directory: u'/usr/local/pythonenv/flussinfo/share/flussinfo/whoosh_index/_MAIN_7.tiz'
}}}

And yes, the errors seem to go away when I switch to threads=1 in the WSGIDaemonProcess directive. Strangely, the site worked fine for almost a month with threads enabled.
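Dropping to threads=1 trades away concurrency. If the error is a transient race (a segment file that was listed when the index was read gets deleted by a concurrent commit before the searcher opens it), retrying searcher creation is one possible alternative. A minimal sketch, where `with_retry` and `open_fn` are hypothetical names rather than Whoosh or haystack APIs, and `open_fn` is assumed to re-read the index's file list from disk on every call (retrying against a cached file list would just hit the same missing file):

```python
import errno
import time


def with_retry(open_fn, retries=3, delay=0.05):
    """Call open_fn(), retrying when a file it needs has vanished (ENOENT).

    open_fn is a hypothetical zero-argument callable; it is assumed to
    re-read the index's file list from disk each time it is called.
    """
    for attempt in range(retries):
        try:
            return open_fn()
        except (IOError, OSError) as exc:
            # Re-raise anything other than "No such file or directory",
            # and give up after the last attempt.
            if exc.errno != errno.ENOENT or attempt == retries - 1:
                raise
            time.sleep(delay)


# Usage with a fake opener that fails once with ENOENT, then succeeds.
calls = []


def flaky_open():
    calls.append(1)
    if len(calls) == 1:
        raise IOError(errno.ENOENT, "No such file or directory", "_MAIN_7.tiz")
    return "searcher"


print(with_retry(flaky_open))  # prints "searcher" after one retry
```

Only ENOENT is retried, so genuine problems such as permission errors still surface immediately.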

Comments (4)

  1. cdent

    I get errors like this as well. Something seems to be holding on to a list of the files in the index directory while something else changes which files actually exist there. In my setup, an external process listens on a message queue and does the indexing, while WSGI applications (mod_wsgi with processes=2, threads=10) run searches against the same index.
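    A toy model of that race, in plain Python rather than Whoosh. The `TOC` file, the function names, and the `_MAIN_<n>.tiz` naming are illustrative assumptions, not Whoosh's actual file format. A reader snapshots the file list, a writer then commits a new generation and deletes the superseded segment, and the reader's later open fails with the same "No such file or directory" error as in the traceback:

```python
import os
import tempfile


def write_generation(index_dir, gen):
    """Toy commit: write a segment file, then a TOC naming it."""
    seg_name = "_MAIN_%d.tiz" % gen
    with open(os.path.join(index_dir, seg_name), "w") as f:
        f.write("segment %d" % gen)
    with open(os.path.join(index_dir, "TOC"), "w") as f:
        f.write(seg_name)
    return seg_name


def read_toc(index_dir):
    """Reader step 1: read which segment file is current."""
    with open(os.path.join(index_dir, "TOC")) as f:
        return f.read()


def commit_new_generation(index_dir, old_seg, gen):
    """Writer: publish a new (e.g. merged) segment, then delete the old one."""
    new_seg = write_generation(index_dir, gen)
    os.remove(os.path.join(index_dir, old_seg))
    return new_seg


def open_segment(index_dir, seg_name):
    """Reader step 2: open the segment file named by the TOC."""
    with open(os.path.join(index_dir, seg_name)) as f:
        return f.read()


index_dir = tempfile.mkdtemp()
write_generation(index_dir, 6)

stale_seg = read_toc(index_dir)                    # reader snapshots the file list
commit_new_generation(index_dir, stale_seg, 7)     # indexer commits in between
try:
    open_segment(index_dir, stale_seg)             # reader uses the stale list
    outcome = "opened"
except IOError as exc:
    outcome = "missing: %s" % os.path.basename(exc.filename)
print(outcome)  # prints "missing: _MAIN_6.tiz"
```

    Nothing here is thread-specific: the two reader steps just need a commit to land between them, which is why more concurrent searches and more indexing make the error more likely.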

    I can only trigger the bug when:

    • I'm actively indexing a lot of documents (for example, when I send the titles of all the documents down the message queue).
    • I'm running many searches in quick succession (if I loop the same query through curl 100 times, I get the error about twice).

    I've been meaning to write a minimal test case for this, but reproducing the conditions that seem to trigger the problem has been harder than I hoped.
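    One possible shape for such a test case, sketched with a toy on-disk "index" instead of Whoosh itself (the file layout, function names, and parameters are all hypothetical; it is written for Python 3 and assumes POSIX rename semantics for `os.replace`). One writer thread repeatedly commits a new segment and deletes the old one, mimicking a merge, while several reader threads read the TOC and then open the segment it names. Failures depend on timing, so the count varies between runs and may be zero:

```python
import os
import shutil
import tempfile
import threading


def publish(index_dir, gen):
    """Toy commit: write a new segment file, then atomically point the TOC at it."""
    seg = "_MAIN_%d.tiz" % gen
    with open(os.path.join(index_dir, seg), "w") as f:
        f.write("segment %d" % gen)
    tmp = os.path.join(index_dir, "TOC.tmp")
    with open(tmp, "w") as f:
        f.write(seg)
    # os.replace swaps the TOC in one step, so readers never see a partial TOC.
    os.replace(tmp, os.path.join(index_dir, "TOC"))
    return seg


def index_loop(index_dir, old_seg, commits):
    """Writer: commit new generations, deleting each superseded segment file."""
    for gen in range(1, commits + 1):
        new_seg = publish(index_dir, gen)
        os.remove(os.path.join(index_dir, old_seg))
        old_seg = new_seg


def search_loop(index_dir, queries, errors):
    """Reader: read the TOC, then open the segment it names (two separate steps)."""
    for _ in range(queries):
        try:
            with open(os.path.join(index_dir, "TOC")) as f:
                seg = f.read()
            with open(os.path.join(index_dir, seg)) as f:
                f.read()
        except FileNotFoundError as exc:
            # list.append is atomic in CPython, so no extra lock is needed here.
            errors.append(os.path.basename(exc.filename))


def run_stress(commits=200, readers=4, queries=200):
    """Run one writer against several readers; return (failures, total searches)."""
    index_dir = tempfile.mkdtemp()
    first_seg = publish(index_dir, 0)
    errors = []
    threads = [threading.Thread(target=search_loop, args=(index_dir, queries, errors))
               for _ in range(readers)]
    threads.append(threading.Thread(target=index_loop,
                                    args=(index_dir, first_seg, commits)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shutil.rmtree(index_dir)
    return len(errors), readers * queries


failures, total = run_stress()
print("%d of %d searches hit a missing segment file" % (failures, total))
```

    Swapping the toy reader and writer for real Whoosh searcher and writer calls would turn this into a test against the library, with the caveat that the real setup above uses separate processes rather than threads.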

    The code is assembled from pieces hosted on GitHub:

    I'd really like to help fix this, but I'm not sure what to provide. I can put a fair few hours into the effort if needed; I just need a push in the right direction.

