StaleDataError in sqlalchemy 1.0.6

Issue #3476 resolved
Patrick Hayes created an issue

When calling bulk_update_mappings in version 1.0.6, occasionally StaleDataErrors emerge.

Previously I was using the fork at https://github.com/pfhayes/sqlalchemy (from this GitHub PR: https://github.com/zzzeek/sqlalchemy/pull/181) and no StaleDataErrors were ever observed.

I tried updating to 1.0.6 and StaleDataErrors started to occur frequently. They happen intermittently when calling code like this:

Base = declarative_base()

class User(Base):
  __tablename__ = 'users'
  id = Column(BigInteger, primary_key=True)
  score = Column(DOUBLE_PRECISION)

session.bulk_update_mappings(User, [
  {'id': X, 'score': Y},
  ...
])

The StaleDataError typically looks like

StaleDataError: UPDATE statement on table 'users' expected to update 18 row(s); 17 were matched.

It seems to always expect n+1 when n are matched.

I haven't been able to get a consistent repro. Any suggestions on this?

Comments (13)

  1. Patrick Hayes reporter

    Here is a stack trace:

      File "db/service.py", line 121, in update_all
        ret = self._session.bulk_update_mappings(mapper, mappings)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2318, in bulk_update_mappings
        self._bulk_save_mappings(mapper, mappings, True, False, False, False)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2340, in _bulk_save_mappings
        transaction.rollback(_capture_exception=True)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
        compat.reraise(exc_type, exc_value, exc_tb)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 2332, in _bulk_save_mappings
        isstates, update_changed_only)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 121, in _bulk_update
        bookkeeping=False)
      File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 710, in _emit_update_statements
        (table.description, len(records), rows))
    StaleDataError: UPDATE statement on table 'users' expected to update 18 row(s); 17 were matched.
    
  2. Mike Bayer repo owner

    OK, well this is from your PR, so simple questions:

    1. Does it work at the exact rev where I merged your PR (09485d733131b667813f44), and does it start failing at the rev where I changed it (bcbfcca2360a58ec1b5)? Or does it fail on the PR merge as well?

    2. what does SQL echoing say? Is incorrect SQL being emitted that doesn't actually have the correct parameters in it?

  3. Patrick Hayes reporter
    1. 09485d733131b667813f44 is identical to github.com/pfhayes/master, so that one works. I'm checking bcbfcca2360a58ec1b5 now - it may take some time to get some results but I will report back.

    2. Here is the SQL emitted on a case that says "expected to update 18 rows; 17 were matched". I added some whitespace for clarity. As far as I can tell, the SQL is correct and attempts to update 18 rows.

    UPDATE users SET score=%(score)s WHERE users.id = %(users_id)s
    (
      {'score': 0.0781063638173, 'users_id': 2794L},
      {'score': 0.070034869018, 'users_id': 2795L},
      {'score': 0.0700348733759, 'users_id': 2796L},
      {'score': 0.070031715614, 'users_id': 2797L},
      {'score': 0.0700053052325, 'users_id': 2798L},
      {'score': 0.0699735661323, 'users_id': 2799L},
      {'score': 0.0699016361744, 'users_id': 2800L},
      {'score': 0.0699437756869, 'users_id': 2801L},
      {'score': 0.0694990963771, 'users_id': 2802L},
      {'score': 0.0689085607616, 'users_id': 2803L},
      {'score': 0.0657422471612, 'users_id': 2793L},
      {'score': 0.0579451905242, 'users_id': 2792L},
      {'score': 0.0652963306316, 'users_id': 2790L},
      {'score': 0.0512848994003, 'users_id': 2789L},
      {'score': 0.00719930820623, 'users_id': 2788L},
      {'score': 0.0696689986084, 'users_id': 2786L},
      {'score': 0.059347493426, 'users_id': 2785L},
      {'score': 0.0629442601587, 'users_id': 2784L}
    )
    
  4. Mike Bayer repo owner

    Well i see 18 values there and the primary key values are all unique. So if we are getting 17 back, then this is either that one of those primary keys is not actually in your database, or your database driver is screwing up. I am assuming this is MySQL (as evidenced by the use of longs for integers) and that you have not in any way tampered with the CLIENT_FLAGS.FOUND_ROWS setting that SQLAlchemy's MySQL dialects establish; if this has been modified, then MySQL may not return the correct number if any of those "score" values ultimately matches the value that is already present in the database.

    Certainly, there should be no difference whatsoever between your patch and my modifications to it later, because those only affect the SQL which is correct here.

    So next issues are: 1. are all those primary keys above actually present and 2. any manipulation of the FOUND_ROWS setting either on the driver or the database connection?

  5. Patrick Hayes reporter

    Thanks for the quick response.

    This is postgres, and I haven't modified FOUND_ROWS (or anything similar).

    It's possible that some of those primary keys are not present, though it's surprising to me that this would throw a StaleDataError in that case. The argument to bulk_update_mappings is just a manually composed dictionary - so it sounds like the caller needs to ensure that every entry in the dict corresponds to a valid row?

    If this is expected behaviour, I will just catch the StaleDataError, but perhaps the documentation should be updated (or maybe it was somewhere that I didn't see).
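If that route is taken, the workaround might look like the following self-contained sketch (hypothetical setup: an in-memory SQLite engine stands in for Postgres, and Float for DOUBLE_PRECISION; note that on drivers without reliable executemany rowcounts the check is skipped entirely, so the error may not reproduce there):

```python
from sqlalchemy import create_engine, Column, BigInteger, Float
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy.orm.exc import StaleDataError

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(BigInteger, primary_key=True)
    score = Column(Float)

engine = create_engine('sqlite://')  # stand-in for the real Postgres engine
Base.metadata.create_all(engine)
session = Session(engine)
session.add(User(id=1, score=0.0))
session.commit()

try:
    session.bulk_update_mappings(User, [{'id': 1, 'score': 0.5}])
    session.commit()
except StaleDataError:
    # Raised when the UPDATE matches fewer rows than mappings supplied,
    # e.g. because one of the ids is not present in the table.
    session.rollback()
```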

  6. Mike Bayer repo owner

    Well if you really just want to emit UPDATE statements, just use a core update() construct. It will perform much faster than bulk_update_mappings(). The point of bulk_update_mappings() is to do as much of the same things that adding ORM classes to the session via add() would do (including features like version counters, which also do this check), just without the overhead of manipulating object attributes.

    It's a potential feature add to allow picking and choosing among these behaviors with bulk_update_mappings(), but it seems the rationale for the feature is perhaps already being misunderstood.

  7. Patrick Hayes reporter

    I see - perhaps I misunderstood the use case for bulk_update_mappings.

    I was looking for an API that would let me still leverage the ORM for classes/fields/etc while generating bulk update statements. I tried using bulk_update_mappings in the following form:

    bulk_update_mappings(User, [
      {User.id: X, User.score: Y,},
    ])
    

    but that did not work, and I was forced to write out the keys explicitly (as 'users_id', 'score').

    Very high performance is not a huge concern here - so it sounds like the best thing to do here is just to use the normal .update()?

    However, it states on http://docs.sqlalchemy.org/en/rel_1_0/core/dml.html#sqlalchemy.sql.expression.Update.values that the .update().values() call does not accept multiple values.

    "In the case of an Update construct, only the single dictionary/tuple form is accepted, else an exception is raised."

  8. Mike Bayer repo owner

    However, it states on http://docs.sqlalchemy.org/en/rel_1_0/core/dml.html#sqlalchemy.sql.expression.Update.values that the .update().values() call does not accept multiple values.

    when values() talks about "multiple values", it refers to the SQL of the INSERT statement, which on many backends supports the form "INSERT INTO table (a, b, ...) VALUES (x1, y1, ...), (x2, y2, ...), ...", i.e. multiple value sets in one string.
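For contrast, here is a hedged sketch of the two forms side by side (hypothetical users table, in-memory SQLite standing in for the real backend):

```python
from sqlalchemy import MetaData, Table, Column, Integer, String, create_engine

metadata = MetaData()
users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String(50)))

engine = create_engine('sqlite://')
metadata.create_all(engine)
conn = engine.connect()

# Multi-row VALUES: one INSERT statement carrying several value sets.
conn.execute(users.insert().values([{'id': 1, 'name': 'a'},
                                    {'id': 2, 'name': 'b'}]))

# DBAPI executemany: a single-row statement invoked once per parameter set.
conn.execute(users.insert(), [{'id': 3, 'name': 'c'},
                              {'id': 4, 'name': 'd'}])
```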

    The documentation here, most specifically the green "Note" box in the document you linked, contrasts this form with the DBAPI executemany method (https://www.python.org/dev/peps/pep-0249/#id18), which is also provided via SQLAlchemy's execution API. I'll change the phrase "single-row INSERT" to "single-row INSERT or single-criteria UPDATE statement"; hopefully that will be less misleading.

    This is the "traditional" way of invoking INSERT, UPDATE, or DELETE with multiple value sets: the SQL statement refers to only one set of parameters, but is invoked many times by the DBAPI using the most performant method available. SQLAlchemy has always supported this form; it is introduced at http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#executing-multiple-statements and illustrated in terms of an UPDATE statement at http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#inserts-updates-and-deletes. To avoid having to use explicit bindparam() objects for all the values in the SET clause, use the naming-convention approach described at http://docs.sqlalchemy.org/en/rel_1_0/core/dml.html#sqlalchemy.sql.expression.update.

    When we use the ORM Session, the unit of work tries in as many cases as possible to use this form internally, which is challenging when passed a series of objects or tuples that may be heterogeneous. But if you're using a core update() construct, it's easy: just send a list of parameter sets to execute() instead of a single one.
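A sketch of that naming-convention approach with a core update() (hypothetical table and in-memory SQLite engine; the WHERE bindparam is named 'b_id' so it does not collide with any column key, letting the SET clause be generated from the remaining parameter keys):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, Float,
                        bindparam, create_engine)

metadata = MetaData()
users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('score', Float))

engine = create_engine('sqlite://')
metadata.create_all(engine)
conn = engine.connect()
conn.execute(users.insert(), [{'id': 1, 'score': 0.0},
                              {'id': 2, 'score': 0.0}])

# No .values(): the SET clause is derived from the parameter keys that
# match column keys ('score'); only the WHERE criterion needs an explicit
# bindparam, whose name must differ from every column key, hence 'b_id'.
stmt = users.update().where(users.c.id == bindparam('b_id'))
conn.execute(stmt, [
    {'b_id': 1, 'score': 0.5},
    {'b_id': 2, 'score': 0.7},
])
```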

  9. Patrick Hayes reporter

    Okay. Thanks for the explanation, and updating the docs.

    This is how I ended up accomplishing this:

    conn = session.connection()
    table = User.__table__
    stmt = (table.update()
      .where(table.c.id == bindparam('users_id'))
      .values(score=bindparam('users_score')))
    conn.execute(stmt, [
      {'users_id': X, 'users_score': Y},
      ...
    ])
    