Issue #235 resolved

Need easier method to list tokens in a self-parsing field

Priit Laes
created an issue

Searching NUMERIC fields seems to be broken: a search against such a field always returns 0 results, and peeking at searcher.lexicon shows garbled data: {{{

!python

print list(searcher.lexicon("stock"))
[u'\x00O1oVQ', u'\x00O1oVR', u'\x00O1oVS', u'\x00O1oVT',...
}}}

However, the documents themselves seem to be properly indexed, because the fields are displayed correctly when iterating over the documents: {{{

!python

for i in searcher.documents():
    print i
...
{'stock': 8}
{'stock': 9}

}}}

Comments (6)

  1. Matt Chaput repo owner

    The numbers are stored in the index at multiple precisions in a coded form. When you search for a number, it is encoded in the same way so it will match.

    Getting the list of actual numbers indexed by the field takes a bit of an obscure trick:

    field = searcher.schema["stock"]
    for encoded_text, numeric_value in field.sortable_values(searcher.reader(), "stock"):
      print numeric_value
    

    There should probably be a more straightforward way to do this, but unfortunately I think having lexicon() do it would cause problems.

  2. Priit Laes reporter

    A bit too obscure ;)

    Thanks, with your hint I figured out how to actually convert and query the number field:

    field = searcher.schema["stock"]
    searcher.search(query.Term("stock", field.to_text(8)))
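
The exchange above turns on one idea: the index holds encoded terms rather than the numbers themselves, so a query has to be encoded the same way and a listing has to be decoded. A minimal standalone sketch of that idea follows. The helpers `encode_int` and `decode_int` are hypothetical and are not Whoosh's actual encoding; the sketch also ignores the multiple precisions mentioned in the first comment.

```python
def encode_int(value, bits=32):
    """Encode a signed integer as fixed-width hex text whose string order
    matches numeric order. Hypothetical scheme, NOT Whoosh's real format."""
    offset = 1 << (bits - 1)  # shift the signed range so it starts at zero
    if not -offset <= value < offset:
        raise ValueError("value out of range for %d bits" % bits)
    return "%0*x" % (bits // 4, value + offset)


def decode_int(text, bits=32):
    """Invert encode_int: turn an encoded term back into the integer."""
    return int(text, 16) - (1 << (bits - 1))


# A toy "lexicon": the index only ever sees the encoded terms.
stock_levels = [8, 9, -3, 120]
lexicon = sorted(encode_int(n) for n in stock_levels)

# Searching for a number means encoding the query value the same way.
found = encode_int(8) in lexicon

# Listing the real numbers means decoding each term in the lexicon.
decoded = [decode_int(term) for term in lexicon]

print(lexicon)
print(decoded)
```

Because the encoded terms sort in numeric order, the raw lexicon is useful to the index even though it looks opaque to a reader; that is the trade-off the two snippets in the comments work around.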
    