Parsed query with a numerical field seems to get messed up

Issue #363 new
Thomas O'Donnell
created an issue

Following the fix to #361, it seems that numerical queries are getting messed up.

from whoosh import fields
from whoosh import qparser

schema = fields.Schema(num=fields.NUMERIC)
parser = qparser.QueryParser(u"num", schema=schema)
raw_query = u"num:1"
query = parser.parse(raw_query, debug=True)

print "Raw:", raw_query
print "Parsed:", unicode(query)

When run, this produces the following output:

Taggers: [<OpTagger '(?<=\\s)ANDNOT(?=\\s)' (anot)>, <OpTagger '(?<=\\s)ANDMAYBE(?=\\s)' (amaybe)>, <whoosh.qparser.plugins.EveryPlugin object at 0xb68ba6f0>, <whoosh.qparser.plugins.SingleQuotePlugin object at 0xb68aa9f0>, <whoosh.qparser.plugins.FieldnameTagger object at 0xb68ba750>, <whoosh.qparser.plugins.WildcardPlugin object at 0xb68aaeb0>, <whoosh.qparser.plugins.PhraseTagger object at 0xb68ba790>, <FnTagger <_sre.SRE_Pattern object at 0xb68dfce0> (openB)>, <FnTagger <_sre.SRE_Pattern object at 0xb68dfd40> (closeB)>, <OpTagger '(^|(?<=(\\s|[()])))NOT(?=\\s)' (not)>, <OpTagger '(?<=\\s)AND(?=\\s)' (and)>, <OpTagger '(?<=\\s)OR(?=\\s)' (or)>, <OpTagger '(^|(?<=\\s))REQUIRE(?=\\s)' (req)>, <whoosh.qparser.plugins.BoostPlugin object at 0xb68b0cb0>, <whoosh.qparser.plugins.RangeTagger object at 0xb68ba7f0>, <whoosh.qparser.plugins.WhitespacePlugin object at 0xb68ba6d0>, <whoosh.qparser.plugins.WhitespacePlugin object at 0xb68aaa10>]
Tagger: <whoosh.qparser.plugins.FieldnameTagger object at 0xb68ba750> at 0: <u'num':>
Tagged group: <AndGroup <u'num':>, <None:u'1'>>
Pre-filtered group: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method GroupPlugin.do_groups of <whoosh.qparser.plugins.GroupPlugin object at 0xb68b03b0>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method BoostPlugin.clean_boost of <whoosh.qparser.plugins.BoostPlugin object at 0xb68b0cb0>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method WildcardPlugin.do_wildcards of <whoosh.qparser.plugins.WildcardPlugin object at 0xb68aaeb0>>
..Result: <AndGroup <u'num':>, <None:u'1'>>
..Applying: <bound method FieldsPlugin.do_fieldnames of <whoosh.qparser.plugins.FieldsPlugin object at 0xb68aaed0>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method WhitespacePlugin.remove_whitespace of <whoosh.qparser.plugins.WhitespacePlugin object at 0xb68ba6d0>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method WhitespacePlugin.remove_whitespace of <whoosh.qparser.plugins.WhitespacePlugin object at 0xb68aaa10>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method OperatorsPlugin.do_operators of <whoosh.qparser.plugins.OperatorsPlugin object at 0xb68b0590>>
..Result: <AndGroup <u'num':u'1'>>
..Applying: <bound method BoostPlugin.do_boost of <whoosh.qparser.plugins.BoostPlugin object at 0xb68b0cb0>>
..Result: <AndGroup <u'num':u'1'>>
Syntax tree: <AndGroup <u'num':u'1'>>
Pre-normalized query: And([Term(u'num', '\x00\x80\x00\x00\x01')])
Normalized query: Term(u'num', '\x00\x80\x00\x00\x01')
Raw: num:1
Parsed: num:'\x00\x80\x00\x00\x01'

It looks like the problem occurs in whoosh/qparser/ when we are calling

q = nodes.query(self).

If you look at the output above, this seems to be where the query changes:

Syntax tree: <AndGroup <u'num':u'1'>>
Pre-normalized query: And([Term(u'num', '\x00\x80\x00\x00\x01')])
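
For reference, the byte string in the pre-normalized query looks like a sortable encoding of the integer 1: one leading zero byte, followed by the value offset by 2**31 and packed as a big-endian unsigned 32-bit integer. The sketch below reproduces those bytes with the standard struct module only. The layout is an assumption inferred from the output above, the helper names are hypothetical, and this is not Whoosh's own code.

```python
import struct

OFFSET = 1 << 31  # assumed offset that maps signed 32-bit ints to unsigned


def encode_sortable_int32(n):
    # Hypothetical helper: a zero prefix byte, then (n + 2**31) packed as a
    # big-endian unsigned 32-bit integer. Layout inferred from the debug output.
    return b"\x00" + struct.pack(">I", n + OFFSET)


def decode_sortable_int32(bs):
    # Hypothetical inverse: skip the prefix byte, unpack, remove the offset.
    (value,) = struct.unpack(">I", bs[1:5])
    return value - OFFSET


encoded = encode_sortable_int32(1)
print(repr(encoded))
print(decode_sortable_int32(encoded))
```

If that assumption holds, the parsed term still carries the number 1; it is only displayed in its encoded form.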

Comments (4)

  1. Thomas O'Donnell reporter

    After further investigation, it looks like the number is converted to bytes in the NUMERIC.parse_query() function and is then never converted back when the term is converted to unicode.

  2. Matt Chaput repo owner

    I don't see what the problem is. Just that the term is printed as bytes instead of a number? That might be nice but I don't know if it's worth worrying about.
