add an "opt in" flag to query.update() regarding when updating on polymorphic FROM?

Issue #3252 resolved
Abhinav Gupta created an issue

Consider the following setup,

class User(Base):
    __tablename__ = 'users'

    user_id = Column(Integer, primary_key=True)
    name = Column(String)
    role = Column('role', Enum('user', 'super'))

    __mapper_args__ = {'polymorphic_on': role, 'polymorphic_identity': 'user'}


class SuperUser(User):
    __tablename__ = 'superusers'
    __mapper_args__ = {'polymorphic_identity': "super"}

    super_id = Column(Integer, primary_key=True)
    users_user_id = Column(Integer, ForeignKey('users.user_id'))

    email = Column('email', String)

If you try to perform an update on a query using the base class's primary key like so,

session.query(SuperUser).filter_by(user_id=some_id).update(
    {'email': 'foo@example.com'}
)

The system will generate the following UPDATE command.

UPDATE superusers SET email=? FROM users WHERE users.user_id = ?

This is trying to join superusers with users without any join criteria. That's obviously wrong.

To make it work, you have to either use the foreign key column name or fetch the original object and update on that.

session.query(SuperUser).filter_by(users_user_id=some_id).update(
    {'email': 'foo@example.com'}
)
session.commit()

# or

su = session.query(SuperUser).get(some_id)
su.email = 'foo@example.com'
session.add(su)
session.commit()

This generates the correct command:

UPDATE superusers SET email=? WHERE superusers.super_id = ?

I have tested it with SQLAlchemy 0.8.2 against Postgres and SQLite, and SQLAlchemy 0.9.8 against SQLite.

In case of SQLite, the command immediately fails because it doesn't support JOINs in UPDATEs. Here's the full trace:

Traceback (most recent call last):
  File "bug.py", line 53, in <module>
    {'email': 'foo@example.com'}
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2796, in update
    update_op.exec_()
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 897, in exec_
    self._do_exec()
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 995, in _do_exec
    update_stmt, params=self.query._params)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 991, in execute
    bind, close_with_result=True).execute(clause, params or {})
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 729, in execute
    return meth(self, multiparams, params)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 322, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 826, in _execute_clauseelement
    compiled_sql, distilled_params
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 958, in _execute_context
    context)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
    exc_info
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 951, in _execute_context
    context)
  File "$HOME/dev/py/sqlalchemy-bug-repro/env/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 436, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (OperationalError) near "FROM": syntax error u'UPDATE superusers SET email=? FROM users WHERE users.user_id = ?' ('foo@example.com', 2)

In case of Postgres, it ends up joining the tables and possibly updating everything (I haven't had a chance to confirm the last part).

Sample code to reproduce this attached to the report.

Comments (14)

  1. Mike Bayer repo owner

    this behavior is expected and is documented, it is one of the key caveats listed in the documentation for query.update(), at http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html?highlight=query.update#sqlalchemy.orm.query.Query.update.

    The Query.update() method provides an entryway to more direct SQL, e.g. the UPDATE statement, and in that regard it is an optimization compared to the regular unit of work use (that is, session.add() + commit()).. However, in standard SQL, an UPDATE statement only involves one table; the various multi-table forms everyone takes for granted now are vendor specific extensions and not part of SQL; therefore at a basic level, applying an UPDATE to a row that spans multiple tables doesn't make much sense. So you have to specify this how you want it to be.

  2. Abhinav Gupta reporter

    On backends that do support joins in updates, the generated statement can do some serious damage because no join criteria is specified. Would it be reasonable to throw an exception instead of executing the statement?

    I just confirmed against Postgres that the generated statement UPDATE superusers SET email=? FROM users WHERE users.user_id = ? will indeed change the email address for all rows in the superusers table.

    $ SELECT * FROM users;
    
     user_id |    name     | role
    ---------+-------------+-------
           1 | Bruce Wayne | user
           2 | Clark Kent  | super
    (2 rows)
    
    $ SELECT * FROM superusers;
    
     super_id | users_user_id |       email
    ----------+---------------+--------------------
            1 |             2 | superman@yahoo.com
            2 |             1 | bruce@wayne.com
    (2 rows)
    
    $ SELECT name, role, email FROM users JOIN superusers ON superusers.users_user_id = users.user_id;
    
        name     | role  |       email
    -------------+-------+--------------------
     Clark Kent  | super | superman@yahoo.com
     Bruce Wayne | user  | bruce@wayne.com
    (2 rows)
    
    $ UPDATE superusers SET email = 'totally.not.superman@gmail.com' FROM users WHERE users.user_id = 2;
    
    $ SELECT name, role, email FROM users JOIN superusers ON superusers.users_user_id = users.user_id;
        name     | role  |             email
    -------------+-------+--------------------------------
     Clark Kent  | super | totally.not.superman@gmail.com
     Bruce Wayne | user  | totally.not.superman@gmail.com
    (2 rows)
    
  3. Mike Bayer repo owner

    it's not really feasible. You could just as easily be emitting an UPDATE that has a WHERE clause that joins the tables correctly (in fact this is more common). It would be complicated and error prone to search within the whole statement and try to guess if semantically it does a particular thing SQL-wise, all just so that a warning may be emitted. Consider that this really means SQLAlchemy has to anticipate the behavior of a SQL statement on the Python side without sending it to the database. That means we're beginning to build pieces of the database itself on our end.

    the alternative is that we simply throw an exception when any polymorphic class is passed to update(), but that is a heavy handed approach to an operation that merely has caveats. This is how it was before, and then of course some folks actually wanted to be able to use query.update() in the way that it is documented now.

    The question is, how exactly did you come across knowing about query.update() without seeing the caveats listed in the documentation? This particular issue is even written in boldface. SQLAlchemy of course likes to warn for detectable issues where the user might be going wrong but there are limits to that. SQL queries should always be tested.

  4. Abhinav Gupta reporter

    Fair enough. I wasn't sure how technically feasible this would be. I agree that disallowing polymorphic classes completely is probably a bad idea.

    I came across the method in a response to someone's StackOverflow question. Its usage seemed straightforward enough that I did not think to check for caveats.

    I'll switch to using the workaround. Thanks.

  5. Mike Bayer repo owner

    I'd rather not box update() in to this limitation by making it part of the API. I've improved the documentation and added #3271 for the potential feature of trying to scan the UPDATE tables to determine if the criteria can be added.

  6. Log in to comment