detect and raise for version_id is NULL both before UPDATE and after UPDATE/INSERT

Issue #3673 resolved

Diggory Blake created an issue 2016-03-08

Steps to reproduce:

Create a model with a nullable column
Set version_id_col = <name> and version_id_generator = False
Create an instance of the model with version_id=<non-null value>
Set version_id = None and commit the change
Set version_id = <non-null value> and try to commit the change

This sequence of operations causes the following error:

StatementError: (sqlalchemy.exc.InvalidRequestError) A value is required for bind parameter 'process_instance_exclusive_lock' [SQL: u'UPDATE process_instance SET exclusive_lock=%s WHERE process_instance.id = %s AND process_instance.exclusive_lock = %s'] [parameters: [{'exclusive_lock': datetime.datetime(2016, 3, 8, 16, 5, 46), 'process_instance_id': UUID('5f886c4f-e22d-11e5-9ded-b4ae2bd63e01')}]]

As you can see, the previous NULL value does not make it into the formatting arguments, which causes it to fail. Using null() instead of None doesn't seem to make any difference.

Comments (22)

Diggory Blake reporter
- edited description
- 2016-03-08T16:17:06+00:00
Diggory Blake reporter
- edited description
- 2016-03-08T16:17:34+00:00
Mike Bayer repo owner
Create a model with a nullable column

yes, please show me that

As you can see,

I dont see anything without a proper test case :) use the guidelines at http://stackoverflow.com/help/mcve thanks!
- 2016-03-08T16:28:48+00:00

Diggory Blake reporter

Here's a full test case:

"""
version_id nullable bug
"""

from sqlalchemy import *
from sqlalchemy import types
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base,  declared_attr
from sqlalchemy import event

from datetime import datetime

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(types.Integer, primary_key=True)
    bar = Column(types.DateTime())

    __mapper_args__ = {
        'version_id_col': bar,
        'version_id_generator': False
    }


e = create_engine('sqlite:///:memory:', echo='debug')
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

session = Session(e, autocommit=True, autoflush=False, expire_on_commit=False)

# Create a Foo with a NULL version_id column
x = Foo()
session.add(x)
session.flush()

# Try to update x
x.bar = datetime.now()
session.flush()

2016-03-08T16:43:30+00:00

Mike Bayer repo owner
thanks! interestingly, the test case above which is very nice does not seem to follow the steps you initially describe? e.g. it does not create an instance with "non null value" nor do i see any "version_id=None" in there either.
- 2016-03-08T16:50:13+00:00
Diggory Blake reporter
Yes, I found the extra steps were not necessary in practice to reproduce the problem, so I omitted them.
- 2016-03-08T16:55:24+00:00

Mike Bayer repo owner

great. Please confirm this patch resolves the issue. a NULL value in version_id does not make any sense so it should be prohibited:

diff --git a/lib/sqlalchemy/orm/persistence.py b/lib/sqlalchemy/orm/persistence.py
index b9a1f9d..0c079af 100644
--- a/lib/sqlalchemy/orm/persistence.py
+++ b/lib/sqlalchemy/orm/persistence.py
@@ -986,6 +986,11 @@ def _finalize_insert_update_commands(base_mapper, uowtransaction, states):
         else:
             mapper.dispatch.after_update(mapper, connection, state)

+        if mapper.version_id_col is not None:
+            if state_dict[mapper._version_id_prop.key] is None:
+                raise orm_exc.FlushError(
+                    "Instance does not contain a non-NULL version value")
+

 def _postfetch(mapper, uowtransaction, table,
                state, dict_, result, params, value_params):

2016-03-08T17:08:40+00:00

Diggory Blake reporter
I disagree that NULL values make no sense: I use optimistic concurrency control to access certain models, but certain rows have no contention because they are not shared. For these I was using NULL values, with the hope that UPDATE statements would add the clause WHERE version_id_field IS NULL.

It could be argued that this is a feature request rather than a bug; however, more problematic is that this patch seems to only superficially address the problem: regardless of whether version_id fields are allowed to be NULL, it should not be possible for there to be a mismatch in the SQL formatting code between the number of arguments expected and those provided. There is a more serious problem at the root of this, and I'm currently encountering a second bug which is likely caused by a similar issue, and won't be fixed by this patch:

In order to avoid using NULL, I modified my code to just use a fixed date in the past when I don't care about OCC. However this resulted in a new SQL generation error:

sqlalchemy.exc.ProgrammingError: (_mysql_exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE institution.id = 'RDKa\xe5P\x11\xe5\x84M\xb4\xae+\xd6>\x01' AND institution.exclusive_lock ' at line 1") [SQL: u'UPDATE institution SET WHERE institution.id = %s AND institution.exclusive_lock = %s'] [parameters: ('RDKa\xe5P\x11\xe5\x84M\xb4\xae+\xd6>\x01', datetime.datetime(1970, 1, 1, 0, 0))]

(Note the erroneous empty SET clause)

Unfortunately I haven't been able to reproduce this in a small example yet, but I'm still working on that, and I'll post that as soon as I find out how.
- 2016-03-08T17:31:22+00:00
Mike Bayer repo owner
regardless of whether version_id fields are allowed to be NULL, it should not be possible for there to be a mismatch in the SQL formatting code between the number of arguments expected and those provided.

a NULL version identifier on a persistent object was never a supported use case so if supported, it would be an enhancement (however I disagree thats how your use case should be handled). There's no need for the UPDATE formatting code to deal with a potential NULL value (a complicated change that will add latency to the flush process as well) because NULL in a version id column is not supported. the bug here is that this unsupported case is not handled gracefully.

The semantics of "versioning" means, "the object A represented by row X is of version Q". In SQL, "NULL" means literally, "unknown". So in this case if you need your timestamp to store NULL then it's not appropriate as a versioning column, given the current assumption that a mapping with a version_id column expects all rows to be subject to versioning. I would suggest the feature you're looking for, drawing from your exact phrase "certain rows have no contention because they are not shared", is that versioning rules be applied on a per-object basis; that is, some objects express that they don't need to be versioned and should be excluded from the process.

I'd also ask if perhaps these two different classes of objects actually be mapped separately. You can create multiple mappers to the same table and with declarative you can use subclassing to achieve this effect without any repetition in the mapping configuration.
- 2016-03-08T17:43:08+00:00

Diggory Blake reporter

I've been able to reproduce the second issue: it may or may not be related to the first, if not I can open a new issue if you'd prefer:

"""
version_id polymorphic bug
"""

from sqlalchemy import *
from sqlalchemy import types
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base,  declared_attr
from sqlalchemy import event

from datetime import datetime

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(types.Integer, primary_key=True)
    type = Column(types.Unicode(64), nullable=False)
    bar = Column(types.DateTime(), nullable=False)
    other = Column(types.Integer)

    __mapper_args__ = {
        'polymorphic_on': u'type',
        'polymorphic_identity': u'Foo',
        'version_id_col': bar,
        'version_id_generator': False
    }

class FooDerived(Foo):
    __tablename__ = 'foo_derived'
    id = Column(types.Integer, ForeignKey(u'foo.id'), primary_key=True)
    other2 = Column(types.Integer)

    __mapper_args__ = {
        'polymorphic_identity': u'FooDerived'
    }


e = create_engine('sqlite:///:memory:', echo='debug')
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

session = Session(e, autocommit=True, autoflush=False, expire_on_commit=False)

x = FooDerived()
x.bar = datetime(1970, 1, 1)
session.add(x)
session.flush()

x.other2 = 1
session.flush()

The error:

OperationalError: (sqlite3.OperationalError) near "WHERE": syntax error [SQL: u'UPDATE foo SET  WHERE foo.id = ? AND foo.bar = ?'] [parameters: (1, '1970-01-01 00:00:00.000000')]

2016-03-08T17:46:20+00:00

Mike Bayer repo owner
Yes that is a variant on the same thing, the system expects there to be a new version identifier for every UPDATE emitted. since you didn't set anything, it's surprised. so yes, another check would be needed for that one too.
- 2016-03-08T17:52:18+00:00
Diggory Blake reporter
If version IDs are to be constrained in the ways you've described (not NULL, and must change on every update) then in addition to better error messages: 1) These constraints should be clearly documented 2) There needs to be a more flexible alternative to add custom WHERE clauses whenever rows are updated: perhaps a callback could be specified on the model which returns a custom query clause.

Aside from that, do you know why this second error only shows up when inheritance is used?
- 2016-03-08T17:59:47+00:00
Mike Bayer repo owner
the second error is because it knows it needs to emit UPDATE for the base table because there's a version_id_col. The whole "version_generator=False" thing was only added in recent versions to suit some odd use cases, it's not the primary use versioning was developed for. so as we open up the API to allow more styles of use, we also open it up to more styles of mis-use, which then have to have all new error checking schemes.
- 2016-03-08T18:52:31+00:00
Mike Bayer repo owner
- changed title to detect and raise for version_id is NULL both before UPDATE and after UPDATE/INSERT
- changed milestone to 1.2
- changed component to orm
- 2016-03-08T19:28:01+00:00
diana clarke
- assigned issue to
  
  diana clarke
- 2017-03-14T13:23:03+00:00
Mike Bayer repo owner
the reason I don't like the idea of "allow arbitrary objects to have NULL" for a version id is mostly because it is an extremely odd use case that would add great complexity to an already complex system. The _emit_update_statements() method is already pretty involved, but it at least knows that for all the rows it processes, there only need be a single UPDATE statement generated, and this helps with performance since the statement is built just once up front. A conditional that would have to peek at every object and check for version_id is NULL, then keep around a copy of another statement that doesnt have the WHERE clause, and also disallow that one record from being batched into executemany(), would be all kinds of error-prone and hard-to-test conditionals all throughout the method; when people go to see that code, we'd tell them, "one person wanted this very odd exception to how it works so we put all this extra complexity" for it, that's something I try to avoid. We've had lots of features like that in the past; the user who wanted the magic feature eventually just does something some other way, the use case is gone, and now the library is stuck with a huge heap of complexity that's very difficult to extract. I try to find ways to make odd use cases possible from the outside without hardcoding them in the internals; this one is tough but making two mappers seems the most expedient approach.

Adding the limitation for now is not necessarily the "permanent" solution, it's just to clarify things for now.
- 2017-03-14T14:33:03+00:00
Mike Bayer repo owner
second issue w/ the inheritance is a bug, docs state: "We can update our User object without incrementing the version counter as well; the value of the counter will remain unchanged, and the UPDATE statement will still check against the previous value"
- 2017-05-22T18:29:42+00:00
Mike Bayer repo owner
https://bitbucket.org/zzzeek/sqlalchemy/issues/3996/unincremented-version-id-counter-on
- 2017-05-22T18:31:28+00:00
Mike Bayer repo owner
- changed status to resolved
- add a note to versioning that version id col cannot be null, is not supported.
Change-Id: I3724fea3f2d508210e35827eb1ea17f5e334da19 Fixes: ~~#3673~~ (cherry picked from commit fcaf17766fdd22e67407e432f7666d63439d7a39)

→ <<cset 8cce3beef729>>
- 2017-05-22T18:33:56+00:00
Mike Bayer repo owner
- add a note to versioning that version id col cannot be null, is not supported.
Change-Id: I3724fea3f2d508210e35827eb1ea17f5e334da19 Fixes: ~~#3673~~ (cherry picked from commit fcaf17766fdd22e67407e432f7666d63439d7a39) (cherry picked from commit 8cce3beef729e51eaed2d01a6ba0b0677abc7b36)

→ <<cset da1bc9878b71>>
- 2017-05-22T18:33:56+00:00
Mike Bayer repo owner
- changed status to open
will put dclarke's gerrit in as well
- 2017-05-22T18:51:56+00:00
Mike Bayer repo owner
- changed status to resolved
detect and raise for version_id is NULL

The versioning feature does not support NULL for the version counter. An exception is now raised if the version id is programmatic and was set to NULL for an UPDATE. Pull request courtesy Diana Clarke.

Fixes: ~~#3673~~ Change-Id: I8b0da56234a7c7f5e7fde35536e09a6216a5e48a

→ <<cset 990d4c799f2a>>
- 2017-05-22T21:30:12+00:00
Log in to comment

Assignee: diana clarke

Type: bug

Priority: major

Status: resolved

Component: orm

Milestone: 1.2

Votes: 0

Watchers: 3