detect and raise for version_id is NULL both before UPDATE and after UPDATE/INSERT

Issue #3673 resolved
Diggory Blake created an issue

Steps to reproduce:

  • Create a model with a nullable column

  • Set version_id_col = <name> and version_id_generator = False

  • Create an instance of the model with version_id=<non-null value>

  • Set version_id = None and commit the change

  • Set version_id = <non-null value> and try to commit the change

This sequence of operations causes the following error:

StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'process_instance_exclusive_lock' [SQL: u'UPDATE process_instance SET exclusive_lock=%s WHERE process_instance.id = %s AND process_instance.exclusive_lock = %s'] [parameters: [{'exclusive_lock': datetime.datetime(2016, 3, 8, 16, 5, 46), 'process_instance_id': UUID('5f886c4f-e22d-11e5-9ded-b4ae2bd63e01')}]]

As you can see, the previous NULL value does not make it into the formatting arguments, which causes it to fail. Using null() instead of None doesn't seem to make any difference.

Comments (22)

  1. Diggory Blake reporter

    Here's a full test case:

    """
    version_id nullable bug
    """
    
    from sqlalchemy import *
    from sqlalchemy import types
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base,  declared_attr
    from sqlalchemy import event
    
    from datetime import datetime
    
    Base = declarative_base()
    
    class Foo(Base):
        __tablename__ = 'foo'
        id = Column(types.Integer, primary_key=True)
        bar = Column(types.DateTime())
    
        __mapper_args__ = {
            'version_id_col': bar,
            'version_id_generator': False
        }
    
    
    e = create_engine('sqlite:///:memory:', echo='debug')
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    
    session = Session(e, autocommit=True, autoflush=False, expire_on_commit=False)
    
    # Create a Foo with a NULL version_id column
    x = Foo()
    session.add(x)
    session.flush()
    
    # Try to update x
    x.bar = datetime.now()
    session.flush()
    
  2. Mike Bayer repo owner

    thanks! interestingly, the test case above which is very nice does not seem to follow the steps you initially describe? e.g. it does not create an instance with "non null value" nor do i see any "version_id=None" in there either.

  3. Diggory Blake reporter

    Yes, I found the extra steps were not necessary in practice to reproduce the problem, so I omitted them.

  4. Mike Bayer repo owner

    great. Please confirm this patch resolves the issue. a NULL value in version_id does not make any sense so it should be prohibited:

    diff --git a/lib/sqlalchemy/orm/persistence.py b/lib/sqlalchemy/orm/persistence.py
    index b9a1f9d..0c079af 100644
    --- a/lib/sqlalchemy/orm/persistence.py
    +++ b/lib/sqlalchemy/orm/persistence.py
    @@ -986,6 +986,11 @@ def _finalize_insert_update_commands(base_mapper, uowtransaction, states):
             else:
                 mapper.dispatch.after_update(mapper, connection, state)
    
    +        if mapper.version_id_col is not None:
    +            if state_dict[mapper._version_id_prop.key] is None:
    +                raise orm_exc.FlushError(
    +                    "Instance does not contain a non-NULL version value")
    +
    
     def _postfetch(mapper, uowtransaction, table,
                    state, dict_, result, params, value_params):
    
  5. Diggory Blake reporter

    I disagree that NULL values make no sense: I use optimistic concurrency control to access certain models, but certain rows have no contention because they are not shared. For these I was using NULL values, with the hope that UPDATE statements would add the clause WHERE version_id_field IS NULL.

    It could be argued that this is a feature request rather than a bug; however, more problematic is that this patch seems to only superficially address the problem: regardless of whether version_id fields are allowed to be NULL, it should not be possible for there to be a mismatch in the SQL formatting code between the number of arguments expected and those provided. There is a more serious problem at the root of this, and I'm currently encountering a second bug which is likely caused by a similar issue, and won't be fixed by this patch:

    In order to avoid using NULL, I modified my code to just use a fixed date in the past when I don't care about OCC. However this resulted in a new SQL generation error:

    sqlalchemy.exc.ProgrammingError: (_mysql_exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE institution.id = 'RDKa\xe5P\x11\xe5\x84M\xb4\xae+\xd6>\x01' AND institution.exclusive_lock ' at line 1") [SQL: u'UPDATE institution SET WHERE institution.id = %s AND institution.exclusive_lock = %s'] [parameters: ('RDKa\xe5P\x11\xe5\x84M\xb4\xae+\xd6>\x01', datetime.datetime(1970, 1, 1, 0, 0))]

    (Note the erroneous empty SET clause)

    Unfortunately I haven't been able to reproduce this in a small example yet, but I'm still working on that, and I'll post that as soon as I find out how.

  6. Mike Bayer repo owner

    regardless of whether version_id fields are allowed to be NULL, it should not be possible for there to be a mismatch in the SQL formatting code between the number of arguments expected and those provided.

    a NULL version identifier on a persistent object was never a supported use case so if supported, it would be an enhancement (however I disagree thats how your use case should be handled). There's no need for the UPDATE formatting code to deal with a potential NULL value (a complicated change that will add latency to the flush process as well) because NULL in a version id column is not supported. the bug here is that this unsupported case is not handled gracefully.

    The semantics of "versioning" means, "the object A represented by row X is of version Q". In SQL, "NULL" means literally, "unknown". So in this case if you need your timestamp to store NULL then it's not appropriate as a versioning column, given the current assumption that a mapping with a version_id column expects all rows to be subject to versioning. I would suggest the feature you're looking for, drawing from your exact phrase "certain rows have no contention because they are not shared", is that versioning rules be applied on a per-object basis; that is, some objects express that they don't need to be versioned and should be excluded from the process.

    I'd also ask if perhaps these two different classes of objects actually be mapped separately. You can create multiple mappers to the same table and with declarative you can use subclassing to achieve this effect without any repetition in the mapping configuration.

  7. Diggory Blake reporter

    I've been able to reproduce the second issue: it may or may not be related to the first, if not I can open a new issue if you'd prefer:

    """
    version_id polymorphic bug
    """
    
    from sqlalchemy import *
    from sqlalchemy import types
    from sqlalchemy.orm import *
    from sqlalchemy.ext.declarative import declarative_base,  declared_attr
    from sqlalchemy import event
    
    from datetime import datetime
    
    Base = declarative_base()
    
    class Foo(Base):
        __tablename__ = 'foo'
        id = Column(types.Integer, primary_key=True)
        type = Column(types.Unicode(64), nullable=False)
        bar = Column(types.DateTime(), nullable=False)
        other = Column(types.Integer)
    
        __mapper_args__ = {
            'polymorphic_on': u'type',
            'polymorphic_identity': u'Foo',
            'version_id_col': bar,
            'version_id_generator': False
        }
    
    class FooDerived(Foo):
        __tablename__ = 'foo_derived'
        id = Column(types.Integer, ForeignKey(u'foo.id'), primary_key=True)
        other2 = Column(types.Integer)
    
        __mapper_args__ = {
            'polymorphic_identity': u'FooDerived'
        }
    
    
    e = create_engine('sqlite:///:memory:', echo='debug')
    Base.metadata.drop_all(e)
    Base.metadata.create_all(e)
    
    session = Session(e, autocommit=True, autoflush=False, expire_on_commit=False)
    
    x = FooDerived()
    x.bar = datetime(1970, 1, 1)
    session.add(x)
    session.flush()
    
    x.other2 = 1
    session.flush()
    

    The error:

    OperationalError: (sqlite3.OperationalError) near "WHERE": syntax error [SQL: u'UPDATE foo SET  WHERE foo.id = ? AND foo.bar = ?'] [parameters: (1, '1970-01-01 00:00:00.000000')]
    
  8. Mike Bayer repo owner

    Yes that is a variant on the same thing, the system expects there to be a new version identifier for every UPDATE emitted. since you didn't set anything, it's surprised. so yes, another check would be needed for that one too.

  9. Diggory Blake reporter

    If version IDs are to be constrained in the ways you've described (not NULL, and must change on every update) then in addition to better error messages: 1) These constraints should be clearly documented 2) There needs to be a more flexible alternative to add custom WHERE clauses whenever rows are updated: perhaps a callback could be specified on the model which returns a custom query clause.

    Aside from that, do you know why this second error only shows up when inheritance is used?

  10. Mike Bayer repo owner

    the second error is because it knows it needs to emit UPDATE for the base table because there's a version_id_col. The whole "version_generator=False" thing was only added in recent versions to suit some odd use cases, it's not the primary use versioning was developed for. so as we open up the API to allow more styles of use, we also open it up to more styles of mis-use, which then have to have all new error checking schemes.

  11. Mike Bayer repo owner

    the reason I don't like the idea of "allow arbitrary objects to have NULL" for a version id is mostly because it is an extremely odd use case that would add great complexity to an already complex system. The _emit_update_statements() method is already pretty involved, but it at least knows that for all the rows it processes, there only need be a single UPDATE statement generated, and this helps with performance since the statement is built just once up front. A conditional that would have to peek at every object and check for version_id is NULL, then keep around a copy of another statement that doesnt have the WHERE clause, and also disallow that one record from being batched into executemany(), would be all kinds of error-prone and hard-to-test conditionals all throughout the method; when people go to see that code, we'd tell them, "one person wanted this very odd exception to how it works so we put all this extra complexity" for it, that's something I try to avoid. We've had lots of features like that in the past; the user who wanted the magic feature eventually just does something some other way, the use case is gone, and now the library is stuck with a huge heap of complexity that's very difficult to extract. I try to find ways to make odd use cases possible from the outside without hardcoding them in the internals; this one is tough but making two mappers seems the most expedient approach.

    Adding the limitation for now is not necessarily the "permanent" solution, it's just to clarify things for now.

  12. Mike Bayer repo owner

    second issue w/ the inheritance is a bug, docs state: "We can update our User object without incrementing the version counter as well; the value of the counter will remain unchanged, and the UPDATE statement will still check against the previous value"

  13. Mike Bayer repo owner

    detect and raise for version_id is NULL

    The versioning feature does not support NULL for the version counter. An exception is now raised if the version id is programmatic and was set to NULL for an UPDATE. Pull request courtesy Diana Clarke.

    Fixes: #3673 Change-Id: I8b0da56234a7c7f5e7fde35536e09a6216a5e48a

    → <<cset 990d4c799f2a>>

  14. Log in to comment