StaleDataError on UPDATE/DELETE of mapped obj with nullable composite primary key

Issue #2127 resolved
Former user created an issue

When creating a table with composite primary keys and one of them is nullable (possibly a bad idea), orm representation of the object half-works.

Creating an object in the table is fine. Selecting the object is fine. Updating and Deleting breaks with a sqlalchemy.orm.exc.StaleDataError exception on commit.

Comments (23)

  1. Mike Bayer repo owner

    I think you'll find that this example won't work in any database other than SQLite. The NULL wouldn't be accepted. I'd rather go with a descriptive error message in SQLA when the object is flushed or fetched. But this is tricky - we do support a primary key with some NULL values in it, but that is specifically for a composite mapping across several tables, where one or more of the tables might not be present in the row (i.e. an outerjoin).

    Doesn't feel like an 0.7.0 thing to me in either case, though if we want to commit something for this early that is fine.

  2. Former user Account Deleted

    You're right. PostgreSQL, MySQL, and Oracle all force primary keys to be NOT NULL.

    Perhaps SQLAlchemy should do a check and raise an error when creating a Column that is both primary_key=True and explicitly nullable=True? Or simply ignore override nullable=True to False when primary_key=True (and document this).

    This would help prevent users from getting into a half-working state that makes less sense.

    • shazow
  3. Mike Bayer repo owner

    Replying to guest:

    Perhaps SQLAlchemy should do a check and raise an error when creating a Column that is both primary_key=True and explicitly nullable=True?

    well if you do that, and it succeeds, that means the backend you're using is OK with it (and is probably SQLite). I wouldn't want to prevent that usage. It's just the ORM doesn't necessarily know what to do with it. It also isn't safe to assume an application is even using CREATE TABLE, so a check there wouldn't really catch this usage in any case.

    Or simply ignore override nullable=True to False when primary_key=True (and document this).

    we pretty much try not to ever make any guesses or decisions like this.

    I have a feeling the issue here is that we use bound parameters to set up the primary key value and when you pass in a NULL you need to use the IS comparator, and it's too late at that point. Might be a quickie.

  4. Former user Account Deleted

    Replying to zzzeek:

    well if you do that, and it succeeds, that means the backend you're using is OK with it (and is probably SQLite). I wouldn't want to prevent that usage. It's just the ORM doesn't necessarily know what to do with it. It also isn't safe to assume an application is even using CREATE TABLE, so a check there wouldn't really catch this usage in any case.

    Fair enough, as long as it works with UPDATE/DELETE too.

  5. Mike Bayer repo owner
    • changed milestone to 0.7.0

    OK !

    Not a quickie. Sort of. It is not hard to detect the NULL setting in the mapper's flush phase. But it steps on some crinkly logic that involves some really hard edge cases regarding primary keys changing their values. So here's a patch that does it for UPDATE plus modifies some of that crinkliness. The crinkly changes now need some tests, for attributes behavior and such. It's pretty fundamental so this pushes it back to 0.7 (we like to have all the core behavior changes, even if very subtle like this one, up at the point of .0 if not a beta). DELETE needs to be added in too. I'm fine finishing this as it's a pretty deep issue.

    diff -r 3b22a2ac97bdb59b8b4afd7c9b0c525e03a8230a lib/sqlalchemy/orm/attributes.py
    --- a/lib/sqlalchemy/orm/attributes.py  Sat Apr 09 14:56:23 2011 -0400
    +++ b/lib/sqlalchemy/orm/attributes.py  Sat Apr 09 18:20:04 2011 -0400
    @@ -28,6 +28,12 @@
     NO_VALUE = util.symbol('NO_VALUE')
     NEVER_SET = util.symbol('NEVER_SET')
    
    +PASSIVE_RETURN_NEVER_SET = util.symbol('PASSIVE_RETURN_NEVER_SET'
    +"""Symbol indicating that a 'default' value, i.e. None or blank
    +collection, should not be assigned to an attribute when a get()
    +is performed and no value was present.  NEVER_SET is returned
    +instead.
    +""")
    
     PASSIVE_NO_INITIALIZE = util.symbol('PASSIVE_NO_INITIALIZE',
     """Symbol indicating that loader callables should
    @@ -429,8 +435,11 @@
                     elif value is not ATTR_EMPTY:
                         return self.set_committed_value(state, dict_, value)
    
    -            # Return a new, empty value
    -            return self.initialize(state, dict_)
    +            if passive is PASSIVE_RETURN_NEVER_SET:
    +                return NEVER_SET
    +            else:
    +                # Return a new, empty value
    +                return self.initialize(state, dict_)
    
         def append(self, state, dict_, value, initiator, passive=PASSIVE_OFF):
             self.set(state, dict_, value, initiator, passive=passive)
    @@ -471,7 +480,7 @@
    
             # TODO: catch key errors, convert to attributeerror?
             if self.dispatch._active_history:
    -            old = self.get(state, dict_)
    +            old = self.get(state, dict_, PASSIVE_RETURN_NEVER_SET)
             else:
                 old = dict_.get(self.key, NO_VALUE)
    
    @@ -489,7 +498,7 @@
                 return
    
             if self.dispatch._active_history:
    -            old = self.get(state, dict_)
    +            old = self.get(state, dict_, PASSIVE_RETURN_NEVER_SET)
             else:
                 old = dict_.get(self.key, NO_VALUE)
    
    @@ -1064,7 +1073,7 @@
                 # dont let ClauseElement expressions here trip things up
                 return cls((), [current](current), ())
             else:
    -            if original is not None:
    +            if original not in (NEVER_SET, NO_VALUE):
                     deleted = [original](original)
                 else:
                     deleted = ()
    diff -r 3b22a2ac97bdb59b8b4afd7c9b0c525e03a8230a lib/sqlalchemy/orm/mapper.py
    --- a/lib/sqlalchemy/orm/mapper.py  Sat Apr 09 14:56:23 2011 -0400
    +++ b/lib/sqlalchemy/orm/mapper.py  Sat Apr 09 18:20:04 2011 -0400
    @@ -1955,7 +1955,7 @@
                                         params[col.key](col.key) = value
    
                                     if col in pks:
    -                                    if history.deleted:
    +                                    if history.deleted and not row_switch:
                                             # if passive_updates and sync detected
                                             # this was a  pk->pk sync, use the new
                                             # value to locate the row, since the
    @@ -1980,11 +1980,21 @@
                                             del params[col.key](col.key)
                                             value = history.added[0](0)
                                             params[col._label](col._label) = value
    +                                    if value is None and hasdata:
    +                                        raise sa_exc.InvalidRequestError(
    +                                                "Can't update table "
    +                                                "using NULL for primary key "
    +                                                "value")
                                     else:
                                         hasdata = True
                                 elif col in pks:
                                     value = state.manager[prop.key](prop.key).\
                                                 impl.get(state, state_dict)
    +                                if value is None:
    +                                    raise sa_exc.InvalidRequestError(
    +                                                "Can't update table "
    +                                                "using NULL for primary "
    +                                                "key value")
                                     params[col._label](col._label) = value
                         if hasdata:
                             update.append((state, state_dict, params, mapper, 
    diff -r 3b22a2ac97bdb59b8b4afd7c9b0c525e03a8230a lib/sqlalchemy/orm/state.py
    --- a/lib/sqlalchemy/orm/state.py   Sat Apr 09 14:56:23 2011 -0400
    +++ b/lib/sqlalchemy/orm/state.py   Sat Apr 09 18:20:04 2011 -0400
    @@ -521,7 +521,6 @@
    
             # store strong ref'ed version of the object; will revert
             # to weakref when changes are persisted
    -
             obj = self.manager.new_instance(state=self)
             self.obj = weakref.ref(obj, self._cleanup)
             self._strong_obj = obj
    diff -r 3b22a2ac97bdb59b8b4afd7c9b0c525e03a8230a test/orm/test_attributes.py
    --- a/test/orm/test_attributes.py   Sat Apr 09 14:56:23 2011 -0400
    +++ b/test/orm/test_attributes.py   Sat Apr 09 18:20:04 2011 -0400
    @@ -456,10 +456,10 @@
             del f.y
    
             eq_(results, [           ('set', f, 5, None),
    +            ('set', f, 5, attributes.NEVER_SET),
                 ('set', f, 17, 5),
                 ('remove', f, 17),
    -            ('set', f, [1,2,3](
    -), None),
    +            ('set', f, [1,2,3](1,2,3), attributes.NEVER_SET),
                 ('set', f, [4,5,6](4,5,6), [1,2,3](1,2,3)),
                 ('remove', f, [4,5,6](4,5,6))
             ])
    
  6. Mike Bayer repo owner

    Not done yet. Don't want to change attributes too dramatically on this go. note #2128 which proposes to overhaul the history API.

  7. Former user Account Deleted

    The changes here are causing me grief due to the raise sa_exc.FlushError("Can't update table using NULL for primary key value").

    The use case is a strange old table on a legacy database that we are mapping which has no primary key, but which has a unique key. We have been successfully using the unique key (which allows for a column to be null) and setting the mapper's allow_partial_pks=True.

    Will you please continue to support this use case by raising the three FlushErrors in mapper.py 1da499a838cb3cc50c98f7b32de6d35a09f00fe6 only if allow_partial_pks==False?

    Do you see any problems with that?

  8. Mike Bayer repo owner

    Replying to kentbower:

    The changes here are causing me grief due to the raise sa_exc.FlushError("Can't update table using NULL for primary key value").

    The use case is a strange old table on a legacy database that we are mapping which has no primary key, but which has a unique key. We have been successfully using the unique key (which allows for a column to be null) and setting the mapper's allow_partial_pks=True.

    Will you please continue to support this use case by raising the three FlushErrors in mapper.py 1da499a838cb3cc50c98f7b32de6d35a09f00fe6 only if allow_partial_pks==False?

    Do you see any problems with that?

    can you attach a test case, I need to see how the UPDATE/DELETE manages to match on the NULL when it does not for the sqlite test attached to this ticket. I don't actually understand right now why the original test fails, if SQLite allows the NULL to be written into the PK.

  9. Mike Bayer repo owner

    yeah the UPDATE/INSERT generated by the mapper doesn't use "IS NULL". I've no idea how this could be working for you.

  10. Former user Account Deleted

    Replying to zzzeek:

    yeah the UPDATE/INSERT generated by the mapper doesn't use "IS NULL". I've no idea how this could be working for you. Exactly, but I'm almost certain it used to (in 0.6.4). Is that accurate?

  11. Mike Bayer repo owner

    the attached test emits "UPDATE .. SET ... WHERE x=? AND y=?" then puts None into the bind param, in 0.6. It fails.

  12. Former user Account Deleted

    I just found the same thing... I must have been remembering something else... I was certain I remembered "IS NULL" being emitted back when I was making this work with "allow_partial_pks=True".

    So, what that means is we never update these rows (which makes sense). It also means that only 1 of the 3 FlushErrors has "broken" my case:

    This one is being reached even when there is no update statement to emit, is that accurate?

                                if history.added:
    ...
                                elif col in pks:
                                    value = state.manager[prop.key](prop.key).\
                                                impl.get(state, state_dict)
                                    if value is None and not mapper.allow_partial_pks:
                                        raise sa_exc.FlushError(
                                                    "Can't update table "
                                                    "using NULL for primary "
                                                    "key value")
                                    params[col._label](col._label) = value
    
  13. Mike Bayer repo owner

    an update definitely occurs there, though you're right not if no other values have changed. It seems like you'd want the check to just happen later, when there's definitely "data" to be updated. But this is all a little silly, why don't you just stick an event listener on this weird class that cancels out any history on the object ? You aren't getting to that block unless some attribute has been set. Or just don't set any state on it, even better.

  14. Former user Account Deleted

    yeah, I've gotta figure out what the orm thinks has changed, because it isn't emitting an UPDATE (or at least none that is failing due to StaleDataError)... Then I can work around it some other way maybe like you suggest.

  15. Former user Account Deleted

    When I hit:

    if hasdata:
    

    For these cases, it is false. So what do you mean by:

    an update definitely occurs there ''...snip...'' You aren't getting to that block unless some attribute has been set.

    I'm certainly getting into the block, but no attribute has been changed (these are merge()d objects, so in a way, many attributes have been "set"... is that what you mean?)

  16. Former user Account Deleted

    Replying to zzzeek:

    But this is all a little silly, why don't you just stick an event listener on this weird class that cancels out any history on the object ?

    Forgive me for struggling here, but from 'before_flush', instances is None and from 'before_update' there are no changes to undo (there are no columns with history). Still, I'm hitting that exception. I can't make this viewonly=True because the table is ordercomments and we do write our own, but when we write them, we populate the entire "primary key". (When the legacy system writes them, it sometimes leaves one column null.)

    Think we could wait until we know hasdata is True before raising that exception or any other ideas?

  17. Former user Account Deleted

    Replying to zzzeek:

    the attached test emits "UPDATE .. SET ... WHERE x=? AND y=?" then puts None into the bind param, in 0.6. It fails.

    FYI: I figured out why I thought it emitted "IS NULL": it is merge() that is happy to render the SQL of the primary key as "IS NULL" when allow_partial_pks=True. That is, if sqlalchemy were to support the use case of a unique key being used on the database side and sqlalchemy mapping this as the primary key with allow_partial_pks=True, merge() already supports this use case, it is only flush that doesn't.

    I can't imagine the sqlalchemy python identity_map has a problem with part of the identity tuple being None, so that works consistently with a unique key on the database side.

    Not officially asking for sqlalchemy to support that use case, but what are your feelings about it?

  18. Mike Bayer repo owner

    can you create a demonstration script + another ticket for this ? that would help me to see why something even comes up here. if we really needed to work around it, I'd add some more flags so that the FlushError in question only raises once we know an UPDATE will take place.

  19. Former user Account Deleted

    Sure, when you ask for another ticket, there are two potential ones: * Don't raise this FlushError unless needed * Fully support use case of mapping a database unique key as a sqlalchemy primary key with allow_partial_pks=True Which ticket are you talking about opening?

  20. Mike Bayer repo owner

    I was talking about the FlushError. With allow_partial_pks I guess the use case is mapping to a set of cols that are not actually PK cols but are on one table huh ? That would be a separate ticket but I don't have much interest in it ...(I'd commit a full implementation if it was super clean/non-intrusive/full tests, of course)

  21. Log in to comment