ascii codec error when performing query using python 3.2

Issue #395 resolved
Anonymous created an issue

While running a query from Python 3.2, this error occurs at the line

results = searcher.search(query)

If this is an error on my part, I apologize.

The traceback is below.

Thank you

Exception happened during processing of request from ('127.0.0.1', 36235)
Traceback (most recent call last):
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/socketserver.py", line 581, in process_request_thread
    self.finish_request(request, client_address)
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/socketserver.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "server.py", line 19, in __init__
    RequestHandler.__init__(self,request,client_address, server)
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/socketserver.py", line 637, in __init__
    self.handle()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/http/server.py", line 396, in handle
    self.handle_one_request()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/http/server.py", line 384, in handle_one_request
    method()
  File "server.py", line 39, in do_GET
    rtn = indexer.search(str(query_components['q'][0]))
  File "/big/thtownsend/dev/install/rms/resource/helpfiles/indexer.py", line 21, in search
    results = searcher.search(query)
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/searching.py", line 787, in search
    self.search_with_collector(q, c)
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/searching.py", line 820, in search_with_collector
    collector.run()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/collectors.py", line 144, in run
    self.collect_matches()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/collectors.py", line 214, in collect_matches
    for sub_docnum in self.matches():
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/collectors.py", line 415, in matches
    yield matcher.id()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/codec/whoosh3.py", line 910, in id
    self._read_ids()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/codec/whoosh3.py", line 1012, in _read_ids
    self._read_data()
  File "/big/thtownsend/dev/install/rms/linux-amd64-gcc_4_4-debug/lib/python3.2/site-packages/whoosh/codec/whoosh3.py", line 1007, in _read_data
    self._data = loads(b)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 4: ordinal not in range(128)

Comments (5)

  1. Thomas Townsend

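    This is the entire code for running the query:

    from whoosh.fields import Schema, TEXT, ID
    from whoosh.index import open_dir
    from whoosh.qparser import QueryParser

    # Schema the index was built with; open_dir() reads the stored schema itself.
    schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT)
    ix = open_dir('/big/thtownsend/dev/install/rms/resource/helpfiles/_static')

    def search(q):
        with ix.searcher() as searcher:
            print(q)
            query = QueryParser("content", ix.schema).parse(str(q), debug=True)
            results = searcher.search(query)
            # rtn holds the hit count followed by one newline-separated string.
            rtn = []
            rtn.append(len(results))
            tmp = ''
            for r in results:
                # r is a Hit, not a str, so read a stored field from it.
                # Using "path" here is an assumption; "title" is also stored.
                tmp += r['path'] + '\n'
            rtn.append(tmp)
            # The caller in server.py assigns the result, so return it.
            return rtn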
    
  2. superkelvint

    I had this same error on Python 3.4 and made it go away by rebuilding my index. I think my original index was built on 2.7, and I ran into this error when reading it on 3.4. I wonder if you're in the same boat? If so, try rebuilding your index on 3.x.

  3. Matt Chaput repo owner

    To reindex, you need to delete the existing index and somehow (depending on your code) add all the existing documents to the index again. Sorry about the trouble!

  4. Garrett Smith

    I think it's a noble goal for Whoosh to support interoperability between Python 2 and Python 3, i.e. it should be able to read and write indexes in Python 3 that were written from Python 2, and vice versa. I don't think re-indexing because someone changes Python versions (within the set of supported versions) is a good long-term plan.

    I suspect the problem described by the OP is this one:

    https://bugs.python.org/issue22005

    In summary, Python 3's loads is choking on a datetime object saved by Python 2's dumps. This issue has been known since 2015. I can reproduce it trivially by saving a datetime in Python 2.7 and trying to load it in Python 3.

    Specifying encoding="bytes" (as mentioned in the bug report) does solve the problem with datetime, but it causes strings to be loaded as bytes, so it's not as simple as a one-line fix.
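
    For the record, here is a minimal stdlib-only sketch of that behaviour. The two byte strings are hand-assembled protocol 2 pickles that mimic what Python 2 would write for datetime.datetime(2015, 1, 1) and for the 8-bit str 'abc'; they are illustrative assumptions, not data taken from a Whoosh index.

```python
import datetime
import pickle

# Hand-assembled protocol 2 pickle mimicking Python 2 output for
# datetime.datetime(2015, 1, 1). The 10-byte state is stored as a
# Python 2 8-bit str (SHORT_BINSTRING) and contains the non-ASCII byte 0xdf.
PY2_DATETIME_PICKLE = (
    b"\x80\x02"                                       # PROTO 2
    b"cdatetime\ndatetime\n"                          # GLOBAL datetime.datetime
    b"U\x0a\x07\xdf\x01\x01\x00\x00\x00\x00\x00\x00"  # 10-byte state string
    b"\x85R."                                         # TUPLE1, REDUCE, STOP
)

# Hand-assembled protocol 2 pickle mimicking the Python 2 str 'abc'.
PY2_STR_PICKLE = b"\x80\x02U\x03abc."

# Default loads() decodes Python 2 str objects as ASCII, which fails on 0xdf.
default_error = None
try:
    pickle.loads(PY2_DATETIME_PICKLE)
except UnicodeDecodeError as exc:
    default_error = exc

# encoding="bytes" keeps the state as raw bytes, so the datetime loads...
loaded_datetime = pickle.loads(PY2_DATETIME_PICKLE, encoding="bytes")

# ...but the same setting turns every Python 2 str into bytes as well.
text_default = pickle.loads(PY2_STR_PICKLE)
text_as_bytes = pickle.loads(PY2_STR_PICKLE, encoding="bytes")

print(type(default_error).__name__)
print(loaded_datetime)
print(repr(text_default), repr(text_as_bytes))
```

    The last line is the catch: any code that expects text back from the index would have to decode those bytes itself.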

    I'm adding this here for the record, in case someone else runs into this problem. I think it'd be helpful to have a section in the docs that talks about Python 2/3 interoperability, but that's a nice-to-have. This is apparently an edge case for people :)

    For my part, I can't ask users to reindex across versions, so I'll switch to using NUMERIC(int64) vals for datetime. There may be other issues I run into as well, which I'll update here for the record as well!
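
    A rough sketch of that workaround, assuming naive datetimes stored as whole microseconds since 1970-01-01. The helper names are hypothetical and only the conversion is shown, not the Whoosh field setup; a plain integer avoids pickling a datetime object altogether.

```python
import datetime

# Assumption: datetimes are naive and measured from this fixed epoch.
EPOCH = datetime.datetime(1970, 1, 1)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def datetime_to_int64(dt):
    """Return dt as integer microseconds since EPOCH (hypothetical helper)."""
    # Dividing one timedelta by another with // yields an exact int.
    return (dt - EPOCH) // ONE_MICROSECOND


def int64_to_datetime(value):
    """Inverse of datetime_to_int64 (hypothetical helper)."""
    return EPOCH + value * ONE_MICROSECOND


stamp = datetime_to_int64(datetime.datetime(2015, 1, 1))
restored = int64_to_datetime(stamp)
print(stamp, restored)
```

    Microsecond resolution keeps the round trip lossless, and every value a datetime can represent fits comfortably in a signed 64-bit integer.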
