Issues

Issue #3160 new

postgresql - to_tsquery docs and implementation detail

jvanasco
created an issue

I recently realized some implementation details of to_tsquery that are incompatible with the docs I drafted and how sqlalchemy integrates it.

Not sure how to handle this.

I based the docs/examples on existing tests and text. So we have this as the first bit of "Full Text Search" ( http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html#full-text-search )

select([sometable.c.text.match("search string")])
SELECT text @@ to_tsquery('search string') FROM table

well, if we put this into psql...

badsql=> select to_tsquery('search string') ;
ERROR:  syntax error in tsquery: "search string"

that's because tsquery has a rigid enforcement of input. text must either be tokenized and joined with acceptable operators :

select to_tsquery('cat & rat');

be quoted :

select to_tsquery('''search string''');

or use the alternate function

select plainto_tsquery('search string');

So, these are acceptable:

select to_tsquery('search & string');
select to_tsquery('''search string''');
select plainto_tsquery('search string');

but this is not:

select to_tsquery('search string');

I'm not sure the best way to handle this nuance in the docs.

As far as the implementation detail goes...

this creates issues with Column.match, because that generates (invalid) sql like this:

table.column @@ to_tsquery('search string')

not valid sql like:

table.column @@ to_tsquery('''search string''')
table.column @@ plainto_tsquery('search string')

This is an annoying implementation detail , but the content/type of string will affect the function or format t search on.

i thought about just regexing this into submission, but the multitude of possible operators and edge cases suggests that the text would need to be parsed for tokenization instead -- otherwise throwing in a bit of text that has an operator will function as a real operator and probably trigger an invalid syntax.

i think the easiest scenario would be to replace match's to_tsquery with plainto_tsquery

Comments (9)

  1. jvanasco reporter

    Likely a 1.0

    Wanted to bring this up for discussion; reach out to others who have been working on full text support.

    Still wrapping my head around this, and what the proper flow should be. Perhaps no code change, but a lot of docs... But I'm not sure

    There are times where "to_tsquery" would be appropriate, and others where it's not. I think the bulk of usage scenarios in the orm would fall under a "plainto" match though.

    Personally, I've replaced all my 'col1.match(col2)' with "col1.op("@@")(func.plainto_tsquery(col2))"

  2. jvanasco reporter

    I have to sit this release out. Personal commitments through mid sept. I'll be more active for the 1.1 release. And will be able to tackle this and the other docs I promised then.

    Really sorry about this. Something came up. I'll go into details later

  3. Log in to comment