Issue #250 resolved

Term text as bytes

Matt Chaput (repo owner) created an issue

Currently, term text is treated as (and asserted to be) Unicode everywhere, and is only converted to bytes when it is written to disk at the lowest level. This was intended to enforce proper "text hygiene" in the confusing Python 2.x world.

Rewrite the code to convert terms to bytes earlier (above the posting pool level), so that at least "self-parsing" fields (e.g. NUMERIC) can store their terms compactly as bytes instead of having to convert them to text, as they do now.
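To illustrate why byte terms help a self-parsing field, here is a minimal sketch that assumes a signed 64-bit integer field. The helper names (`int_to_sortable_bytes`, `sortable_bytes_to_int`, `int_to_sortable_text`) are hypothetical and are not part of Whoosh's API. The sketch is only meant to show that a sortable binary encoding takes 8 bytes, while an equivalent fixed-width text encoding (hex, in this sketch) takes 16.

```python
import struct

# Hypothetical helpers for illustration only; not Whoosh's actual API.
# Assumption: the NUMERIC-style field holds signed 64-bit integers.
_OFFSET = 1 << 63


def int_to_sortable_bytes(n: int) -> bytes:
    """Encode a signed 64-bit int as 8 big-endian bytes that sort like the ints.

    Adding 2**63 maps [-2**63, 2**63 - 1] onto [0, 2**64 - 1], so comparing
    the fixed-width unsigned big-endian bytes preserves numeric order.
    """
    if not -_OFFSET <= n < _OFFSET:
        raise ValueError("value out of signed 64-bit range")
    return struct.pack(">Q", n + _OFFSET)


def sortable_bytes_to_int(b: bytes) -> int:
    """Invert int_to_sortable_bytes."""
    return struct.unpack(">Q", b)[0] - _OFFSET


def int_to_sortable_text(n: int) -> str:
    """Text-only alternative: the same offset value as 16 fixed-width hex digits."""
    if not -_OFFSET <= n < _OFFSET:
        raise ValueError("value out of signed 64-bit range")
    return format(n + _OFFSET, "016x")


if __name__ == "__main__":
    values = [-5, 0, 3, 1000, -(2**63), 2**63 - 1]
    encoded = [int_to_sortable_bytes(v) for v in values]
    # Byte order matches numeric order, so terms can be sorted as raw bytes.
    assert sorted(encoded) == [int_to_sortable_bytes(v) for v in sorted(values)]
    print(len(int_to_sortable_bytes(1000)))                 # 8
    print(len(int_to_sortable_text(1000).encode("utf-8")))  # 16
```

Both encodings preserve sort order. The difference is that the text form has to go through a string representation, which doubles its size on disk, while a bytes-aware pipeline could store the 8-byte form directly.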