postgresql - to_tsquery docs and implementation detail

Issue #3160 new
jvanasco created an issue

I recently realized some implementation details of to_tsquery that are incompatible with the docs I drafted and how sqlalchemy integrates it.

Not sure how to handle this.

I based the docs/examples on existing tests and text. So we have this as the first bit of "Full Text Search" ( http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#full-text-search )

select([sometable.c.text.match("search string")])
SELECT text @@ to_tsquery('search string') FROM table

well, if we put this into psql...

badsql=> select to_tsquery('search string') ;
ERROR:  syntax error in tsquery: "search string"

that's because tsquery has a rigid enforcement of input. text must either be tokenized and joined with acceptable operators :

select to_tsquery('cat & rat');

be quoted :

select to_tsquery('''search string''');

or use the alternate function

select plainto_tsquery('search string');

So, these are acceptable:

select to_tsquery('search & string');
select to_tsquery('''search string''');
select plainto_tsquery('search string');

but this is not:

select to_tsquery('search string');

I'm not sure the best way to handle this nuance in the docs.

As far as the implementation detail goes...

this creates issues with Column.match, because that generates (invalid) sql like this:

table.column @@ to_tsquery('search string')

not valid sql like:

table.column @@ to_tsquery('''search string''')
table.column @@ plainto_tsquery('search string')

This is an annoying implementation detail , but the content/type of string will affect the function or format t search on.

i thought about just regexing this into submission, but the multitude of possible operators and edge cases suggests that the text would need to be parsed for tokenization instead -- otherwise throwing in a bit of text that has an operator will function as a real operator and probably trigger an invalid syntax.

i think the easiest scenario would be to replace match's to_tsquery with plainto_tsquery

Comments (22)

  1. jvanasco reporter

    Likely a 1.0

    Wanted to bring this up for discussion; reach out to others who have been working on full text support.

    Still wrapping my head around this, and what the proper flow should be. Perhaps no code change, but a lot of docs... But I'm not sure

    There are times where "to_tsquery" would be appropriate, and others where it's not. I think the bulk of usage scenarios in the orm would fall under a "plainto" match though.

    Personally, I've replaced all my 'col1.match(col2)' with "col1.op("@@")(func.plainto_tsquery(col2))"

  2. jvanasco reporter

    I have to sit this release out. Personal commitments through mid sept. I'll be more active for the 1.1 release. And will be able to tackle this and the other docs I promised then.

    Really sorry about this. Something came up. I'll go into details later

  3. jvanasco reporter

    I had a moment to reflect back on this and try to figure it out.

    I've been looking at this from the perspective of Postgres, not SqlAlchemy.

    @zzzeek - is there any instance where someone would put "not plaintext" in a .match() clause with other backends? I can't imagine any reason why that would happen, which suggests the fix would be to use 'plainto_tsquery'

    just to clarify this...

    to_tsquery (without quoted strings) implies/necessitates that the sample text for the match is in a specific postgres format. plainto_tsquery (or to_tsquery with quoted strings) implies that the sample text is plaintext.

    if match consistently expects plaintext for the comparison, then the backwards incompatible change for operators needs to be made so that the same code would work independent of backends.

    if match is inconsistent, then this could just be a docs change to point people in the right direction.

    i think 9/10 users will initially want the plainto feature, but that's not a reason to make a breaking change -- but i think ensuring consistency across platforms is.

  4. Mike Bayer repo owner

    @jvanasco - OK...well I never use match or tsquery stuff, so, I'd need to get back into this and figure out what might be typical. Considering how many screwups I'm making in the 1.0 release, I definitely need weeks to clear that off before thinking about things like this :). Maybe get some other feedback from people who use this feature.

  5. jvanasco reporter

    I was looking for your package a few weeks ago!

    Do you have any input on what the default core support should be?

  6. David Wortham

    +1 for support for Postgres "plainto_tsquery" (whether done with ".match" or another method).

    I'm currently investigating the sqlalchemy-searchable module to see if it will suit my needs.

  7. jvanasco reporter

    sqlalchemy already supports this via the func proxy. This ticket is just for the best documentation and default methods of match operator.

    to do a tsquery:

    _search_vector = sqlalchemy.func.plainto_tsquery(sqlalchemy.text("'english'"), search_string)
    query = query.filter(
        sqlalchemy.func.to_tsvector(sqlalchemy.text("'english'"), model.core.Table.field).op('@@')(_search_vector),
    )
    
  8. Mike Bayer repo owner

    I dont really have plans to look at this unless someone can really ELI5 for me, this is low on my TODO.

  9. JOSEPH FUTRELLE

    ELI5: psql to_tsquery fails with syntax error on typical freeform text queries (e.g., something a user typed into a form), and plainto_tsquery is provided for that case. Match in sqla uses to_tsquery and so there's no obvious way in sqla to perform a full-text search on a freeform text query.

    To add to jvanasco's workaround above, here's an ORM-based workaround, that should probably at least be documented:

    search_string = 'some freeform search string'
    tq = sqlalchemy.func.plainto_tsquery('english', search_string)
    q = session.query(MyClass).filter(MyClass.some_column.op('@@')(tq))
    

    jvanasco, should I be passing 'english' to the 'text' function like you are? This works for me without doing that.

  10. Mike Bayer repo owner

    this would all start out as a section in the postgresql docs as "workarounds for ts_query / match / etc" and would be oriented towards "the current implementation of match has some shortcomings, here's workarounds to use PG's features more directly"

  11. jvanasco reporter

    @joefutrelle On my 9.1/2/3 installs, It depends on how the index is created. If you created the index with 'english', (which was in some pg docs) then the planner will only use the index if you pass in the same language. If you created the index without a language, it seems to work if the default is the same or if you pass in the name of the default language.

    The most reliable performance is creating an explicit index and passing in the language, however it's not much work at all to ensure you have the right defaults in place.

  12. JOSEPH FUTRELLE

    @jvanasco thanks, I mean to ask is there some reason I ought to pass the language string through sqlalchemy.text?

  13. Log in to comment