query.join() is silently ignored within query.update(), query.delete()

Issue #3349 resolved
Other User created an issue

I noticed that _no_select_modifiers and _no_criterion_assertion in orm/query.py differ only in checking self._statement and self._from_obj. It seems that _no_criterion_assertion should be equivalent to self._statement is None and _no_select_modifiers; it is surprising that _from_obj is omitted, as this can result in a very dangerous situation when doing bulk updates, since _from_obj is ignored entirely.

For example, someone may mistakenly use a join instead of a filter:

(session.query(File)
 .join(File.folder)
 .filter(Folder.owner_id == 1)
 .update({'content': 'USER 1'}, synchronize_session=False))

results in UPDATE file SET content='USER 1' FROM folder WHERE folder.owner_id = 1 which will touch every file instead of just ones belong to user 1.

The suggested fix is to add: ('_from_obj', 'join()', ()) in _no_select_modifiers, which will error, as demonstrated in attached file.

Comments (13)

  1. Mike Bayer repo owner

    i doubt anything would be done at the level of Query. JOIN is valid for some forms of UPDATE such as on MySQL. The fact that update() misinterprets the join without raising a compiler error is more the issue. this would be a core issue.

  2. Other User reporter

    Hm, the only call to _no_select_modifiers is in the __init__ of BulkUpdate and BulkDelete, whose _do_exec method ignores _from_obj. Why is this not the appropriate place to do the check?

    Also, according to https://dev.mysql.com/doc/refman/5.0/en/update.html LIMIT and ORDER BY are also valid for UPDATE's in MySQL, so I don't see why a JOIN should be handled differently.

    What severity does something need to be an error? It seems too dangerous to be a warning. I ran into issues in production code where we accidentally updated every single row in a table because of this.

  3. Mike Bayer repo owner

    Hm, the only call to _no_select_modifiers is in the init of BulkUpdate and BulkDelete, whose _do_exec method ignores _from_obj. Why is this not the appropriate place to do the check?

    because the FROM clause isn't what's considered there as a "select modifier". Truth be told, it seems that method isn't used anywhere else and I'd rather just whack it, and localize the checks to the BulkUD classes.

    Also, according to https://dev.mysql.com/doc/refman/5.0/en/update.html LIMIT and ORDER BY are also valid for UPDATE's in MySQL, so I don't see why a JOIN should be handled differently.

    If this were release 0.1 of the query.update() and query.delete() methods, then yes. But it's not, they've been out for a few years now. So the chance that applications are silently relying on this without issues at the moment is high. That's why we use warnings when we find things like this that have been out in the wild for a long time. We will often push those warnings to exceptions in subsequent major releases but in this case 1.0 is already at beta5 so I'd still be concerned it's a surprise for some.

    What severity does something need to be an error?

    this should have been an error, but it's too late at the moment to just spring it out.

    It seems too dangerous to be a warning

    query.update() and delete() are super-widely used and nobody has reported this issue all throughout the 0.8 or 0.9 series, and there are many many ways to write code that emits SQL that is not what's expected, even though it succeeds without raising. So it's a bug for sure but when I spring new errors on people, it really very often produces lots more headaches for everyone. For backwards compatibility on behaviors that may be silently succeeding for people, we use a warning.

    I ran into issues in production code where we accidentally updated every single row in a table because of this.

    I'm sorry for that, but the truth is there's lots of ways you can write an UPDATE statement that produces an unexpected syntax, not to mention encounter unexpected behaviors with the unit of work, the database, and everything else, and you should be unit and functional testing your code to ensure it works as expected. The echo=True option is extremely prominent in SQLAlchemy as well in order to encourage the developer to be very aware of how the application is acting at the level of SQL interaction. At least when putting code into production, it is reasonable to expect that the developers have observed the warnings that their code produces beforehand and I think if this warning were there, you would have seen it.

  4. Other User reporter

    because the FROM clause isn't what's considered there as a "select modifier". Truth be told, it seems that method isn't used anywhere else and I'd rather just whack it, and localize the checks to the BulkUD classes.

    I see. Would an adequate fix then be to rename and move _no_select_modifiers with something similar to my correction (with a warning) into orm/persistence.py?

    this should have been an error, but it's too late at the moment to just spring it out.

    Ah, that makes sense.

    you should be unit and functional testing your code to ensure it works as expected

    I completely agree... sadly we had forgotten to test that the update touched nothing more than the expected models. Fortunately, the data was entirely recoverable. As a sanity check, is there a nice way to configure SQLAlchemy to log a warning -- maybe even raise an error -- if an update/delete touches more than say, 10000 rows?

  5. Mike Bayer repo owner

    yeah you can for example make an after_cursor_execute event, and then test cursor.rowcount. emit the warning from within the event.

  6. Mike Bayer repo owner
    • :class:.Query doesn't support joins, subselects, or special FROM clauses when using the :meth:.Query.update or :meth:.Query.delete methods; instead of silently ignoring these fields if methods like :meth:.Query.join or :meth:.Query.select_from has been called, a warning is emitted. In 1.0.0b5 this will raise an error. Partial cherry pick from 5302bcebb8e18fdad7448ffb60d2a2017eab26c8. fixes #3349
    • don't realy need _no_select_modifiers anymore

    → <<cset 01a1c3256aee>>

  7. Mike Bayer repo owner
    • :class:.Query doesn't support joins, subselects, or special FROM clauses when using the :meth:.Query.update or :meth:.Query.delete methods; instead of silently ignoring these fields if methods like :meth:.Query.join or :meth:.Query.select_from has been called, an error is raised. In 0.9.10 this only emits a warning. fixes #3349
    • don't needlessly call _compile_context() and build up a whole statement that we never need. Construct QueryContext as it's part of the event contract, but don't actually call upon mapper attributes; use more direct systems of determining the update or delete table.
    • don't realy need _no_select_modifiers anymore

    → <<cset fe1922764151>>

  8. Gilles Dartiguelongue

    Well it did expose hidden failure so I guess this is better :-). It is our fault for not following SQLAlchemy releases close enough hence we haven't seen the warning before. I just wanted to hint that maybe the message could more explicitely say that this is not supported in SQL and/or not yet available in drivers that could support it like postgres with WITH.

  9. Mike Bayer repo owner

    oh. well if you want to suggest text / send a PR. SQLAlchemy is pretty heavy on assuming SQL knowledge of the user.

  10. Log in to comment