Highlight displays stemmed terms (example: "articl" instead of "articles")

Issue #427 closed
Olivier Kautz
created an issue

Hello,

First of all, thanks for this great tool.

I use Whoosh for the French language and I get strange behaviour in the highlighted results. My analyzer is configured like this:

french_analyser = LanguageAnalyzer("fr")
custom_french_analyzer = french_analyser | CharsetFilter(accent_map) | NgramFilter(minsize=2, maxsize=9)

I index my content in a text field like this:

text=TEXT(stored=True, analyzer=custom_french_analyzer)

I then run a query for the term «article»:

qp = QueryParser("text", ix.schema)
q = qp.parse('article')

And I get this kind of result in the highlighted extract:

prévue à l’<b class="match term0">articl</b>
ccorder aux <b class="match term0">articl</b>

The matched term appears to be the output of the stemmer: here «articl» instead of «article» or the plural «articles».

Is it possible to get the original string from the sentence in the highlighted extracts?

Thank you for your help

Comments (3)

  1. Matt Chaput repo owner

    Hi, very sorry it took so long to reply. The problem isn't stemming, it's that you're indexing and searching for N-grams. Since the query is only matching small groups of characters, that's what will be highlighted.

    One solution might be to index the same content in two fields, one with "whole words" and one with n-grams, and highlight using the "whole-word" field.
