ngram analyzer ignores "minsize" during queries

Issue #471 new
Anonymous created an issue

The ngram analyzer uses only the "maxsize" character ngrams during queries and ignores "minsize". This prevents one of the more common uses of ngrams: spelling robustness.

Example:
Here only 6-grams are used, even though the analyzer should detect matches with 3-grams, 4-grams, and 5-grams.

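from whoosh.analysis import NgramTokenizer
from whoosh.fields import Schema, TEXT
from whoosh.qparser import QueryParser

text_analyzer = NgramTokenizer(3, 6)  # minsize=3, maxsize=6
schema = Schema(content=TEXT(stored=True, analyzer=text_analyzer, multitoken_query="or"))
query = QueryParser("content", schema).parse("apples")
print(query)  # print the parsed query object, not the literal string "query"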

This prints content:apples, which will not match a document containing "appled" even though the two words share many ngrams of sizes 3 to 5. The following shared ngrams are missing from the query:

'app', 'appl', 'apple', 'ppl', 'pple', 'ple'
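
The overlap between the two words can be checked without Whoosh. The sketch below uses a hypothetical helper, char_ngrams, which is not part of Whoosh and only approximates what a 3-to-6 character ngram tokenizer would be expected to emit. It computes the ngrams of "apples" and "appled" and prints the ones they have in common.

```python
def char_ngrams(text, minsize, maxsize):
    """Return the set of character ngrams of text with minsize <= n <= maxsize.

    Hypothetical helper for illustration only; not Whoosh's implementation.
    """
    grams = set()
    for size in range(minsize, maxsize + 1):
        # Slide a window of the current size across the text.
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


query_grams = char_ngrams("apples", 3, 6)
doc_grams = char_ngrams("appled", 3, 6)

# Ngrams present in both the query term and the document term.
shared = query_grams & doc_grams

# Sort by length, then alphabetically, for readable output.
print(sorted(shared, key=lambda g: (len(g), g)))
# ['app', 'ple', 'ppl', 'appl', 'pple', 'apple']
```

The only 6-gram of the query term, "apples", does not occur in "appled", so a query built from 6-grams alone cannot match, while an "or" query over all sizes would share six terms with the document.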
