Issue #361 resolved

UnicodeDecodeError when trying to print a parsed query with a numerical field

Thomas O'Donnell
created an issue

Converting a parsed query that contains a NUMERIC field to unicode (for example, to print it) raises a UnicodeDecodeError.

Run the following test script:

from whoosh import fields
from whoosh import qparser

schema = fields.Schema(num=fields.NUMERIC)
parser = qparser.QueryParser(u"num", schema=schema)
query = parser.parse(u"num:1", debug=True)

print unicode(query)

It produces the following output, ending in a traceback:

Taggers: [<OpTagger '(?<=\\s)ANDNOT(?=\\s)' (anot)>, <OpTagger '(?<=\\s)ANDMAYBE(?=\\s)' (amaybe)>, <whoosh.qparser.plugins.EveryPlugin object at 0x221b590>, <whoosh.qparser.plugins.SingleQuotePlugin object at 0x220d890>, <whoosh.qparser.plugins.FieldnameTagger object at 0x221b5f0>, <whoosh.qparser.plugins.WildcardPlugin object at 0x220dd50>, <whoosh.qparser.plugins.PhraseTagger object at 0x221b630>, <FnTagger <_sre.SRE_Pattern object at 0xb683e1a0> (openB)>, <FnTagger <_sre.SRE_Pattern object at 0xb683e200> (closeB)>, <OpTagger '(^|(?<=(\\s|[()])))NOT(?=\\s)' (not)>, <OpTagger '(?<=\\s)AND(?=\\s)' (and)>, <OpTagger '(?<=\\s)OR(?=\\s)' (or)>, <OpTagger '(^|(?<=\\s))REQUIRE(?=\\s)' (req)>, <whoosh.qparser.plugins.BoostPlugin object at 0x2211b50>, <whoosh.qparser.plugins.RangeTagger object at 0x221b690>, <whoosh.qparser.plugins.WhitespacePlugin object at 0x221b570>, <whoosh.qparser.plugins.WhitespacePlugin object at 0x220d8b0>]
Tagger: <whoosh.qparser.plugins.FieldnameTagger object at 0x221b5f0> at 0: <u'num':>
Tagged group: <AndGroup <u'num':>, <None:u'1'>>
Pre-filtered group: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method GroupPlugin.do_groups of <whoosh.qparser.plugins.GroupPlugin object at 0x2211250>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method BoostPlugin.clean_boost of <whoosh.qparser.plugins.BoostPlugin object at 0x2211b50>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method WildcardPlugin.do_wildcards of <whoosh.qparser.plugins.WildcardPlugin object at 0x220dd50>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method FieldsPlugin.do_fieldnames of <whoosh.qparser.plugins.FieldsPlugin object at 0x220dd70>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method WhitespacePlugin.remove_whitespace of <whoosh.qparser.plugins.WhitespacePlugin object at 0x221b570>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method WhitespacePlugin.remove_whitespace of <whoosh.qparser.plugins.WhitespacePlugin object at 0x220d8b0>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method OperatorsPlugin.do_operators of <whoosh.qparser.plugins.OperatorsPlugin object at 0x2211430>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method BoostPlugin.do_boost of <whoosh.qparser.plugins.BoostPlugin object at 0x2211b50>>
..Result: <AndGroup <u'num':u'1'>>
Syntax tree: <AndGroup <u'num':u'1'>>
Pre-normalized query: And([Term(u'num', '\x00\x80\x00\x00\x01')])
Normalized query: Term(u'num', '\x00\x80\x00\x00\x01')
Traceback (most recent call last):
  File "test_unicode.py", line 8, in <module>
    print unicode(query)
  File "/home/andy/.virtualenvs/searchr/local/lib/python2.7/site-packages/whoosh/query/terms.py", line 70, in __unicode__
    t = u("%s:%s") % (self.fieldname, self.text)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 1: ordinal not in range(128)
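
The debug output shows that the term text for the NUMERIC field is a byte string ('\x00\x80\x00\x00\x01') rather than unicode. On Python 2, interpolating a byte string into a unicode format string decodes it implicitly as ASCII, which appears to be what fails in `__unicode__`. The sketch below reproduces that decode step explicitly, without whoosh, and shows one possible way to render such a term safely. `try_ascii_decode` and `format_term` are hypothetical helpers written for illustration; they are not part of whoosh, and this is not necessarily how the issue was resolved.

```python
import binascii

# Byte string copied from the "Normalized query" line of the debug output.
encoded = b"\x00\x80\x00\x00\x01"


def try_ascii_decode(raw):
    """Decode raw as ASCII; return the error message if decoding fails."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)


def format_term(fieldname, text):
    """Hypothetical helper: build "field:text" without implicit decoding.

    Byte strings are rendered as hex digits (an assumption made for this
    sketch); unicode text is used unchanged.
    """
    if isinstance(text, bytes):
        text = binascii.hexlify(text).decode("ascii")
    return u"%s:%s" % (fieldname, text)


# The explicit decode fails the same way the implicit one does.
message = try_ascii_decode(encoded)
print(message)

# The helper produces a printable representation instead of raising.
print(format_term(u"num", encoded))
print(format_term(u"num", u"1"))
```

The explicit decode reports the same message as the traceback above, which suggests the byte 0x80 in the encoded number is the culprit rather than anything in the query parser itself.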
