UnicodeDecodeError on search

Issue #441 new
Michal Čihař created an issue

On Python 3, I've seen what looks like index corruption. It seems that under some conditions Whoosh writes content that it is then unable to parse:

File "/usr/local/lib/python3.5/site-packages/whoosh/searching.py" in search
  786.         self.search_with_collector(q, c)

File "/usr/local/lib/python3.5/site-packages/whoosh/searching.py" in search_with_collector
  819.         collector.run()

File "/usr/local/lib/python3.5/site-packages/whoosh/collectors.py" in run
  144.                 self.collect_matches()

File "/usr/local/lib/python3.5/site-packages/whoosh/collectors.py" in collect_matches
  214.         for sub_docnum in self.matches():

File "/usr/local/lib/python3.5/site-packages/whoosh/collectors.py" in matches
  415.             yield matcher.id()

File "/usr/local/lib/python3.5/site-packages/whoosh/codec/whoosh3.py" in id
  980.             self._read_ids()

File "/usr/local/lib/python3.5/site-packages/whoosh/codec/whoosh3.py" in _read_ids
  1082.             self._read_data()

File "/usr/local/lib/python3.5/site-packages/whoosh/codec/whoosh3.py" in _read_data
  1077.         self._data = loads(b)

Exception Type: UnicodeDecodeError at /translate/sites/xxxx/es/
Exception Value: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)

Originally reported on Weblate, but I think this is a bug in Whoosh. The original report is here: https://github.com/nijel/weblate/issues/1075
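
For anyone trying to reproduce the message outside Whoosh, here is a minimal sketch. It assumes the `loads` call in `_read_data` is `pickle.loads`, which I have not checked against the Whoosh source. Under that assumption, one way to get this exact kind of error on Python 3 is to unpickle data that contains a Python 2 byte string with a non-ASCII byte. The payload below is hand-written for illustration and is not taken from a real index. Whether this is what actually happened to the index in this report is not established.

```python
import pickle

# Hand-written protocol-2 pickle of the kind Python 2 emits for a byte `str`
# (hypothetical payload, not taken from a real Whoosh index):
#   \x80\x02  PROTO 2
#   U\x05     SHORT_BINSTRING, length 5
#   abcd\x80  the 5 raw bytes, with a non-ASCII byte at offset 4
#   q\x00     BINPUT 0 (memoize)
#   .         STOP
PY2_STYLE_PICKLE = b"\x80\x02U\x05abcd\x80q\x00."


def try_default_load(data):
    """Load with Python 3 defaults, which decode old byte strings as ASCII."""
    try:
        return pickle.loads(data)
    except UnicodeDecodeError as exc:
        return exc


error = try_default_load(PY2_STYLE_PICKLE)
print(type(error).__name__, error)

# Telling pickle how to treat Python 2 byte strings avoids the ASCII decode.
as_bytes = pickle.loads(PY2_STYLE_PICKLE, encoding="bytes")
as_latin1 = pickle.loads(PY2_STYLE_PICKLE, encoding="latin-1")
print(as_bytes, repr(as_latin1))
```

If the index was at some point written by Python 2 and is now read by Python 3, this would be consistent with the traceback, but that is only a guess.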

