SA-version independent pickling of mapped instances

Issue #860 resolved
Former user created an issue

Attribute access for mapped types in SQLAlchemy 0.4 is done using a private state ({{{_state}}}) and using descriptors.

However this means that if a class defines a custom {{{getstate}}} that does not include this internal state variable, the copied instance becames useless.

I have attached an example.

Comments (12)

  1. Mike Bayer repo owner

    _state doesnt use a descriptor right now, its a plain attribute. we save many method calls by doing so.

    however, _state isn't the only attribute that needs to be worried about. there is also potentially _instance_key , _entity_name and _sa_session_id, although the latter probably shouldn't be copied in most cases.

    Theres no non-user-involved solution to this other than assembling expensive descriptors for all of these attributes which would be called hundreds of times, but even that wouldn't work: theres no way to "auto-generate" the _instance_key or _entity_name attributes for a particular instance - if they're not there, we have no idea if the instance came from the database or not (well we assume it didn't, which might not be the case). Auto-generating _state for an established instance would blow away all of the committed attribute state, which would still lead to surprises at flush time that are equally problematic.

    Therefore I'd propose that this issue is strictly one of documentation. If you want to implement __getstate__ and __setstate__ on a mapped instance, you have to be aware of these special attributes as well.

  2. Former user Account Deleted

    I have talked about descriptors because this is the output I get:

    $python orm_deepcopy.py
    obj: Test {id: None, x: test}
    objcopy: Traceback (most recent call last):
      File "orm_deepcopy.py", line 36, in ?
        print 'objcopy:', objcopy
      File "orm_deepcopy.py", line 26, in __repr__
        return "Test {id: %s, x: %s}" % (self.id, self.x)
      File "/home/manlio/projects/svn-external/sqlalchemy/trunk/lib/sqlalchemy/orm/attributes.py", line 40, in __get__
        return self.impl.get(obj._state)
    AttributeError: 'Test' object has no attribute '_state'
    

    and here is a patch that works but assumes that all the ''user'' variables are inside the {{{dict}}} dictionary:

    --- orm/attributes.py   (revisione 3748)
    +++ orm/attributes.py   (copia locale)
    @@ -37,7 +37,11 @@
         def __get__(self, obj, owner):
             if obj is None:
                 return self
    -        return self.impl.get(obj._state)
    +        state = getattr(obj, '_state', None)
    +        if state is not None:
    +            return self.impl.get(obj._state)
    +        else:
    +            return obj.__dict__[self.impl.key](self.impl.key)
    
         def get_history(self, obj, **kwargs):
             return self.impl.get_history(obj._state, **kwargs)
    

    By the way: the reason I want to define a custom getstate is exactly to remove all these internal variables!

    Thanks Manlio Perillo

  3. Former user Account Deleted

    Sorry, I have to do a clarification.

    I do not want to use the ''copied'' object in the SQLAlchemy orm, but just as an ordinary instance.

  4. jek

    May be preferable long-term to export a pair of pickle helper functions for users. One takes an instance and spits out some stuff encapsulating the internals interface, the other takes an instance and some stuff and sets up the SA-internal fields on the instance.

    class MyClass(object):
        def __getstate__(self):
            return { 'foo': 'bar', 'sastuff': orm.getstate(self) }
        def __setstate__(self, data):
            orm.setstate(self, data['sastuff']('sastuff'))
            self.foo = data['foo']('foo')
    
  5. Former user Account Deleted

    In one of my programs, I serialize a mapped instance in a database, using Pickle.

    When I unpickle the instance later, all I need is the object state, so I do not want to serialize extra data that I do not need (and this extra data could cause problems if I unpickle the data using a new version of SQLAlchemy - this usually should not happen, I think).

  6. Mike Bayer repo owner

    Well, the simple fact is that once you say mapper(MyClass, sometable), the class is now instrumented. The patch you have is incomplete; while a get() will work, sets and deletes will still fail.

    But the larger issue is that I don't think SQLAlchemy should be weighed down with the burden of supporting normal class behavior when the class has been explicitly instrumented by the user and has been explicitly broken by the user to be incompatible with that instrumentation. These burdens include the performance hits inherent in having the many dozens of methods which access _state needing to check for its presence, as well as the testing/future development burden such that all future changes to the attribute package need to support this use case. Not to mention that the behavior of the class would be inconsistent, things like backrefs would cease to function for example.

    It's not hard to just copy your mapped instance's dict to a new, non-mapped class if you need an instance that is not mapped. Or if this is a global change, clear_mappers() will remove all existing mappings and instrumentations.

  7. Former user Account Deleted

    Ok, I will just serialize the mapped instance {{{dict}}}, after having removed SQLAlchemy internal state.

    However, if I'm not wrong, how mapped instances are handled does means that pickling an instance with one SQLALchemy version and then unpicking it with a different version is not really supported.

    If I'm right, then this should be documented.

    Thanks Manlio Perillo

  8. Mike Bayer repo owner

    right now (pre 0.4.2), we have a lot more support of pickle for instances; InstanceState is now smarter about pickling, and all the lazy, expired, and deferred loader callables are now picklable as well - so all of that gets saved and restored with a minimal amount of data overhead. we can and should make it such that future versions of InstanceState learn to detect previous versions, perhaps by pickling a magic number along with the other state information. new releases of SA will provide an extra module of "converters" for previous states that get picked up within InstanceState.__getstate__(). then you don't need helpers or anything.

  9. Mike Bayer repo owner

    as best as I can tell, ensuring that an ORM object pickled in 0.4 can be unpickled in 0.5 is what we're going for here; I think if we can add a "magic number" to InstanceState.__getstate__() that can identify when things have changed. If a mismatch is detected, at the very least we can raise an informative error, and at best we can run the object through a converter. This won't work for really old instances that don't have an InstanceState though.

    I'm still having a hard time caring much about building a bunch of "converters", though. Pickling mapped instances for long spans of time (such as, in databases or long-lived files) is not an advised practice. If you are upgrading an application that uses caching, a cache clear shouldn't be that big a deal.

  10. Mike Bayer repo owner

    I've added a ClassManager.teardown_instance() method to complement ClassManager.setup_instance() in 653191d913b980964d45b4ef92ed6a0b5d95af91. This way, you can pickle the object's state without anything SA related going in. The downside is that in a pickle scenario it modifies the original instance. While we could make some kind of helper that modifies a given __dict__ inside of __getstate__, this wouldn't work with custom class managers that don't work that way, so I'm not seeing an "official" route there.

  11. Log in to comment