Query correction fails when using TEXT(analyzer=StemmingAnalyzer)

Issue #296 resolved

In the default branch (2.5.0), query correction fails when a field uses StemmingAnalyzer together with spelling=True. In spelling.py, line 118, the statement assert f, "Suggestion %s:%r not in index" % (fieldname, sug) fails, which causes the corrector's suggest() call to fail. This looks related to issue #218: it seems that StemmingAnalyzer creates a 'legitimate' reason for a suggested term to be missing from the index?

Is there another way to get spelling suggestions for a stemmed field?

The code segment below reproduces the problem.

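import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.analysis import StemmingAnalyzer

# Stemmed field with spelling=True, the combination that triggers the failure
_stem_ana = StemmingAnalyzer()
schema = Schema(content=TEXT(analyzer=_stem_ana, spelling=True), organism=ID())

# create_in() needs the index directory to exist
if not os.path.exists("./data"):
    os.makedirs("./data")
ix = create_in("./data", schema, "test_whoosh_db")

writer = ix.writer()
writer.add_document(organism=u"hs", content=u"cells")
writer.add_document(organism=u"hs", content=u"cell")
writer.commit()  # make the documents visible to the searcher

mistyped_words = ["cell"]
with ix.searcher() as s:
    corrector = s.corrector("content")
    for mistyped_word in mistyped_words:
        # On 2.5.0 this call hits the assertion in spelling.py
        print(corrector.suggest(mistyped_word))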
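
The suspected cause described above can be illustrated without Whoosh. This is only a toy sketch, not Whoosh's actual implementation: toy_stem is a hypothetical stand-in for a real stemmer, and the two sets are assumptions about what a stemmed index and a separate list of unstemmed spelling words might contain. It shows how a suggestion can legitimately be absent from the indexed terms.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips one trailing 's'."""
    return word[:-1] if word.endswith("s") else word


# The two document values from the report
documents = [u"cells", u"cell"]

# Assumption: the stemmed field indexes only the stemmed forms
indexed_terms = {toy_stem(w) for w in documents}

# Assumption: spelling=True keeps the original, unstemmed words separately
spelling_words = set(documents)

# Words that could be suggested but do not exist as indexed terms
missing = sorted(spelling_words - indexed_terms)
print(missing)
```

If the real corrector behaves like this sketch, a suggestion such as "cells" would come from the unstemmed word list while the index only holds "cell", so a check that every suggestion exists in the index would fail.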
