Matt Chaput / whoosh / Issues
Issue #397 resolved

Exact phrase search

Anonymous created an issue
I have a TEXT field in my schema that uses a stemming analyzer. Searches with a MultifieldParser work well as long as my query contains no phrases.

I was expecting a phrase search to be exact. For instance, if I search for "a record*", I expect to get only documents whose field contains exactly "a record*". However, the query goes through the same analysis, so it returns all documents that contain "record" or "records". The same happens if I use single quotes.

Is there a way to force an exact search on my field that is case-insensitive but otherwise matches only the exact text, without transforming my phrase?

Note that I've considered adding a secondary field of type ID called FIELD_exact to my schema, and then converting the phrase in the query to FIELD_exact:*"PHRASE"*. But doing that with a MultifieldParser is not trivial, because the query can contain logical operators and other constructs that are hard to analyze.

Comments (1)

  Matt Chaput (repo owner)

    Hi, very sorry to take so long to respond to this. Putting text inside single quotes will search for that exact text, but the problem in this case is that the tokenizer does not treat the asterisk as part of the word in the first place, so it indexes only "record", not "record*". You can fix this by changing the regular expression the analyzer uses to find words, for example:

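    from whoosh import analysis, fields

    # Treat every run of non-whitespace characters (including "*") as one token
    ana = analysis.StemmingAnalyzer(expression=r"\S+")
    schema = fields.Schema(text=fields.TEXT(analyzer=ana))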
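To see why the asterisk is lost, here is a minimal stdlib-only sketch of the tokenization step alone. It assumes Whoosh's default word pattern is roughly `\w+(\.?\w+)*` and compares it with the `\S+` pattern from the snippet; lowercasing is included, but stop-word removal and stemming are left out, so this is an illustration rather than Whoosh's actual analyzer.

```python
import re

# Assumption: approximates Whoosh's default word pattern.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*", re.UNICODE)
# The pattern suggested in the reply: any run of non-whitespace characters.
NON_SPACE_PATTERN = re.compile(r"\S+", re.UNICODE)


def tokenize(text, pattern):
    """Return the lowercased tokens that `pattern` extracts from `text`."""
    return [m.group(0).lower() for m in pattern.finditer(text)]


print(tokenize("a record*", DEFAULT_PATTERN))    # ['a', 'record']
print(tokenize("a record*", NON_SPACE_PATTERN))  # ['a', 'record*']
```

With the word-character pattern the trailing `*` never becomes part of a token, which is why only "record" reaches the index.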

    However, the index then contains "record*" rather than "record", so searches for "record" will not match it. If you want both, you might need a custom analyzer that indexes both versions.
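One way to index both versions is to emit two terms per token: the verbatim (lowercased) token and a stemmed, punctuation-free form. The sketch below is plain Python, not the Whoosh analyzer API; `toy_stem` is a hypothetical stand-in for a real stemmer, and `index_terms` and the small inverted index are illustrative names only.

```python
import re

TOKEN_PATTERN = re.compile(r"\S+", re.UNICODE)


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: drops a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def index_terms(text):
    """Return each token both verbatim and in a stemmed, word-only form."""
    terms = set()
    for token in TOKEN_PATTERN.findall(text.lower()):
        terms.add(token)                   # exact form, e.g. "record*"
        word = re.sub(r"\W+", "", token)   # strip punctuation such as "*"
        if word:
            terms.add(toy_stem(word))      # stemmed form, e.g. "record"
    return terms


# Build a tiny inverted index: term -> set of document ids.
docs = {1: "a record*", 2: "many records"}
index = {}
for doc_id, text in docs.items():
    for term in index_terms(text):
        index.setdefault(term, set()).add(doc_id)

print(sorted(index["record"]))   # [1, 2]
print(sorted(index["record*"]))  # [1]
```

A search for "record" now finds both documents, while a search for the verbatim "record*" finds only the first. In a real Whoosh analyzer, how the extra terms are positioned matters for phrase queries, so treat this as an outline of the idea only.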
