Issue #296 resolved

Query correction fails when TEXT(analyzer=StemmingAnalyzer)

argonaut created an issue

In the default branch (2.5.0), query correction fails when a field uses StemmingAnalyzer together with spelling=True. In spelling.py, line 118, the statement assert f, "Suggestion %s:%r not in index" % (fieldname, sug) fires, so the corrector's suggest() call fails. This is related to issue #218; it seems that StemmingAnalyzer creates a 'legitimate' reason for a term to be missing from the index?

Is there another way to do this?

The code segment below reproduces the problem.

import os

from whoosh.analysis import StemmingAnalyzer
from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in

# TEXT field that is stemmed and also has spelling support enabled
_stem_ana = StemmingAnalyzer()
schema = Schema(content=TEXT(analyzer=_stem_ana, spelling=True), organism=ID())

# create_in() needs the target directory to exist
if not os.path.exists("./data"):
    os.makedirs("./data")
ix = create_in("./data", schema, "test_whoosh_db")

writer = ix.writer()
writer.add_document(organism=u"hs", content=u"cells")
writer.add_document(organism=u"hs", content=u"cell")
writer.commit()

mistyped_words = ["cell"]
with ix.searcher() as s:
    corrector = s.corrector("content")
    for mistyped_word in mistyped_words:
        # On 2.5.0 the assert in spelling.py fires inside suggest()
        print(corrector.suggest(mistyped_word))
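
To illustrate the suspected cause without depending on Whoosh, here is a minimal pure-Python sketch. It is not Whoosh's code: toy_stem, indexed_terms, spelling_words and missing_suggestions are hypothetical names, and it assumes that the spelling word list keeps the original (unstemmed) words while the index only holds stemmed terms.

```python
def toy_stem(word):
    # Hypothetical stand-in for a stemming filter: strips one trailing "s".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


documents = [u"cells", u"cell"]

# Terms as a stemming analyzer would write them to the index.
indexed_terms = set(toy_stem(w) for w in documents)

# Assumption: the spelling word list keeps the original, unstemmed words.
spelling_words = set(documents)


def missing_suggestions(spelling_words, indexed_terms):
    """Return candidate suggestions that have no matching term in the index."""
    return sorted(w for w in spelling_words if w not in indexed_terms)


print(missing_suggestions(spelling_words, indexed_terms))
```

Under that assumption, "cells" is a valid suggestion candidate that never appears as an indexed term, which is the situation the assert in spelling.py would reject.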

Comments (4)
