Issue #231 resolved

SingleQuotePlugin lets phrases with spaces be split

argonaut
created an issue

A QueryParser with the default configuration will parse the query

'gene chip' (single quotes included, on a TEXT field with stemming analyzer)

as (u"'gene cgip'", And([Term('content', u'gene'), Term('content', u'cgip')]))

that is, it lets the single-quoted term with spaces be split further.

If the SingleQuotePlugin is replaced with a customized PhrasePlugin that uses single quotes to delimit a phrase, as in

    self._qparser.remove_plugin_class(SingleQuotePlugin)
    self._qparser.add_plugin(PhrasePlugin("'(?P<text>.*?)'"))

then it parses to (u"'gene cgip'", Phrase('content', [u'gene', u'cgip'], slop=1, boost=1.000000))

Unless I am missing something, I would recommend amending the default configuration.

Comments (2)

  1. Matt Chaput repo owner

    Oops, sorry, I misunderstood. I don't know if it's actually broken. If I do this it works as intended:

    from whoosh import qparser

    # Assumes schema is an existing whoosh.fields.Schema
    qp = qparser.QueryParser("f", schema)
    q = qp.parse("'foo bar'")
    # q == (text:foo OR text:bar)
    

    This is the expected behavior. The purpose of single quotes is to protect pieces of text that contain spaces from the query parser's method of breaking the query up by splitting on whitespace. Instead of the query splitting the text between single quotes on whitespace, it gets passed to the field's analyzer as one piece. This is useful e.g. if you have a field with an analyzer that ignores spaces (e.g. an ID field), and you want to search for a term containing spaces.

    For a field with a StemmingAnalyzer, the text between the single quotes ("foo bar") gets passed to the analyzer as one piece, and then the analyzer splits it on whitespace anyway (since that's the default behavior of StemmingAnalyzer).
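
    The distinction can be sketched with a toy model in plain Python. This is not Whoosh's implementation: the names split_query, whitespace_analyzer, id_analyzer, and terms are hypothetical stand-ins, used only to show that single quotes protect text from the parser's whitespace splitting, while the field's analyzer still decides how that text is tokenized.

```python
import re

# Toy model only (not Whoosh code): single-quoted text is kept as one chunk,
# everything else is split on whitespace.
CHUNK = re.compile(r"'([^']*)'|(\S+)")


def split_query(query):
    """Return the chunks a parser would hand to the field's analyzer."""
    chunks = []
    for match in CHUNK.finditer(query):
        quoted, bare = match.group(1), match.group(2)
        chunks.append(quoted if quoted is not None else bare)
    return chunks


def whitespace_analyzer(text):
    """Stand-in for a text analyzer that tokenizes on whitespace."""
    return text.lower().split()


def id_analyzer(text):
    """Stand-in for an ID-like analyzer that keeps the value as one token."""
    return [text]


def terms(query, analyzer):
    """Run every chunk of the query through the given analyzer."""
    result = []
    for chunk in split_query(query):
        result.extend(analyzer(chunk))
    return result


# The quotes keep 'gene chip' together until it reaches the analyzer...
print(split_query("'gene chip'"))                  # ['gene chip']
# ...but a whitespace-splitting analyzer breaks it up anyway.
print(terms("'gene chip'", whitespace_analyzer))   # ['gene', 'chip']
# An ID-like analyzer keeps the quoted text as a single term.
print(terms("'gene chip'", id_analyzer))           # ['gene chip']
# Without quotes, the parser itself splits before the analyzer runs.
print(terms("gene chip", id_analyzer))             # ['gene', 'chip']
```

    In this sketch the quoting only changes the result for the ID-like analyzer, which mirrors why the single-quoted query on a stemmed TEXT field still ends up as two separate terms.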

    Hopefully that makes sense; if it doesn't, I can give some examples :) I should probably try to make the docs clearer on this.
