whoosh / tests / test_spelling.py

Matt Chaput
Fixed str/byte issue on Python 3.x.
Matt Chaput
Remove assert for spell words not in index since they can be separate spelling words. Fixes issue #296. Use Field.from_bytes() to decode separate spelling words instead of utf8decode.
Matt Chaput
Converted tests to use py.test instead of nose.
Thomas Waldmann
white-space-only source cosmetics, details see below
Changes were made by running a script that did the cleanups automatically:
- no trailing blanks
- exactly one linefeed at file end, see PEP8
- DOS line endings on .bat and .cmd files, unix line endings everywhere else
Matt Chaput
Added test for spell-checking unicode strings.
Matt Chaput
Fixed tests on Python 3.2. Added py2.5 and py3.2 to tox.ini.
Matt Chaput
Moved code from fileindex, filewriting, and filereading into index, writing, and reading. The original idea of filedb was that it could be one of many possible backends. However, it's unlikely that Whoosh will ever support a non-file-based backend. From now on, the filedb package will just be for disk I/O related code.
Matt Chaput
Reorganized modules. Changed util module to a package. Moved several modules from support package to util package to return support to a place for 3rd party code. Moved and renamed several modules to the top level package.
Matt Chaput
dawg.within() didn't pass an address to find_path(); if there was no default root, it would always abort. Fixes issue #261. Fixed unbalanced start/finish_field calls to GraphWriter.
Branches
2.4x
Matt Chaput
Moved several functions included for backwards compatibility to compat from util. Fixed bug in StructFile.get_array(). Fixed references to filedb.fileindex.Segment. Added "gint" variable-length integer compression functions to util.
Matt Chaput
Broke query and matching modules into sub-modules because the files were huge. Went over remaining test files for minor PEP8 issues.
Matt Chaput
Renamed "standard" codec to "W2" for better future compatibility.
Matt Chaput
Fixed unicode handling in FSA/FST more. Fixed lots of bytes/unicode and other Python 3 compatibility issues. Unit tests pass on Python 3.
Matt Chaput
Fixed FST code to work with unicode strings and Py3 bytes objects.
Matt Chaput
Reimplemented word graph code to be faster, use less memory. Added (untested) FST code. New GraphWriter/GraphReader keeps track of multiple roots instead of using the field name as the first key. Removed ability to keep a word graph in memory only. Moved low-level DAWG tests to new unit test module. Fixed bugs in iter_items/iter_postings.
Matt Chaput
Big refactoring to make filedb use a pluggable codec for writing and reading to disk. This is still unstable. Multiprocessing isn't done and it might not work with old indices.
Matt Chaput
Added object identity comparison to BuildNode.__eq__().
Matt Chaput
Fixed unit test to work with spelling correction bugfix.
Matt Chaput
Fixed sorting error. Fixed bug where original word could appear in suggestions. Updated docstring to explicitly state the original word will not be in the suggestions. Fixes #174.
Thomas Waldmann
transformed all *.py files to lf lineends, remove trailing blanks, normalize EOF
Matt Chaput
Added benchmark/dcvgr10.txt.gz to manifest so it will be included in source distribution. Fixes issue #161.
Matt Chaput
Removed debug prints.
Matt Chaput
Added ability to generate "spelling" words separately from indexing for certain fields. This allows fields to be indexed with stemming but store unstemmed words in the word graph.
Matt Chaput
Moved analyzer up to field (it was always a bad decision to put it on the format, finally fixed it). Added ability to run "unmorphed" version of analyzer chain. Hardcoded unicode numbers, uppercase, and lowercase instead of computing them whenever analysis is imported. General cleanup.
Matt Chaput
Removed error and warning flags.
Matt Chaput
Merging spelling branch into mainline.
Matt Chaput
Finished first iteration of new spelling system (finally!!!).
Branches
dawg
Matt Chaput
More work on supporting spelling correction.
Branches
dawg
Matt Chaput
Cleanups and additions to query inspection. Added debugging back to query parser. Fixed lack of startchar/endchar on some syntax nodes.
Branches
dawg
Matt Chaput
Added code to set syntax attribute on query objects.
Branches
dawg