Issue #2464 resolved

getset_descriptor cannot access `__objclass__`

Mike McKerns
created an issue

It looks like a getset_descriptor has an __objclass__ (as seen with dir), but it's not available as an attribute. __objclass__ is very useful in maintaining a pointer to the descriptor's class. For example, given an __objclass__ attribute, one could pickle a descriptor... and thus make a broader class of objects available to multiprocessing, parallel, and distributed computing. (I'm the dill author, and this is a huge blocker for people using pypy + dill)

>>>> class _d(object):
....   def _method(self):
....     pass
....     
>>>> d = _d.__dict__['__dict__']
>>>> d
<getset_descriptor object at 0x0000000104d4b3f0>
>>>> dir(d)
['__class__', '__delattr__', '__delete__', '__doc__', '__format__', '__get__', '__getattribute__', '__hash__', '__init__', '__name__', '__new__', '__objclass__', '__reduce__', '__reduce_ex__', '__repr__', '__set__', '__setattr__', '__str__', '__subclasshook__']
>>>> d.__objclass__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: generic property has no __objclass__

Can the __objclass__ attribute be made available as a pointer to the descriptor's class? It seems like a bug that __objclass__ is missing.

Comments (16)

  1. Armin Rigo

    As the message says, the problem is that a few property objects are generic. For example you'll find out that in PyPy, X.__dict__['__dict__'] returns the exact same object for different classes.

    We can fix it anyway by making these property objects non-generic, i.e. creating manually a new property object instead of reusing the same one.
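A small check of the "generic property" behaviour described above; this is a sketch and the result depends on the interpreter. On CPython, and on PyPy builds that include the fix, each class is expected to get its own descriptor. On the PyPy build from the report, the two lookups returned the same object and no owner was available. The helper names are illustrative only.

```python
class A(object):
    pass


class B(object):
    pass


def dict_descriptor(cls):
    """Return the raw __dict__ descriptor stored in the class namespace."""
    return cls.__dict__['__dict__']


def owner_or_none(descr):
    """Return descr.__objclass__, or None when the attribute is unavailable."""
    return getattr(descr, '__objclass__', None)


# True here would mean one generic descriptor is shared between A and B.
shared = dict_descriptor(A) is dict_descriptor(B)
print("same descriptor object for A and B: %s" % shared)
print("owner reported for A's descriptor: %r" % owner_or_none(dict_descriptor(A)))
```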

  2. Armin Rigo

    Fixed: the __dict__ and __weakref__ descriptors, added dynamically to some user-defined subclasses, are now created for each subclass and they have __objclass__. Is it enough? There are many other descriptors for built-in types that don't have __objclass__. This could be fixed but it's more work.

  3. Mike McKerns reporter

    Wow, thanks for the fast response, Armin.

    dill only uses __objclass__ for serialization of a very few types: MemberDescriptorType, GetSetDescriptorType, MethodDescriptorType, and WrapperDescriptorType... of which, I think pypy only has the first two. If there's an __objclass__ which points to the descriptor's class for the first two, then that's all dill would require.

  4. Armin Rigo

    No, I only added a valid __objclass__ for the __dict__ and __weakref__ descriptors that are added dynamically to some user-defined subclasses. There are still many other (non-dynamically-added) GetSetDescriptors where reading __objclass__ would give the same error message.

  5. Mike McKerns reporter

    I tested against the nightly build and it now works in the majority of cases (as seen in my two tests below):

    >>>> class _d(object):
    ....   def _method(self):
    ....     pass
    ....     
    >>>> v = _d.__dict__['__dict__']
    >>>> v
    <getset_descriptor object at 0x0000000104b67b50>
    >>>> v.__objclass__
    <class '__main__._d'>
    >>>> 
    >>>> import dill
    >>>> _v = dill.loads(dill.dumps(v))
    >>>> _v
    <getset_descriptor object at 0x000000010396a7f0>
    >>>> 
    >>>> class _n(object):
    ....   __slots__ = ['descriptor']
    ....   
    >>>> _n.descriptor
    <member_descriptor object at 0x000000010519d4e0>
    >>>> 
    >>>> m = _n.descriptor
    >>>> m.__objclass__
    <class '__main__._n'>
    >>>> 
    >>>> _m = dill.loads(dill.dumps(m))
    >>>> _m
    <member_descriptor object at 0x000000010519ec80>
    >>>> 
    

    However (as you say above), it doesn't work for all cases:

    >>>> type.__dict__['__dict__'].__objclass__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: generic property has no __objclass__
    >>>> # expected: <type 'type'>
    >>>>
    

    That means some descriptors will serialize and some won't. That seems inconsistent to me. I also don't know whether it is just a corner case for people to try to serialize a descriptor from a builtin directly and it won't occur except very rarely... or if it's a case that will come up very frequently, as pickling uses both recursive serialization and often global dict serialization (which means: if there's a builtin's descriptor exposed anywhere globally, or if there's a builtin object that needs to serialize its own __dict__, it will fail).

    My two test cases work, so totally +1 to you for adding the patch so quickly... but maybe this issue shouldn't be closed until the following questions are addressed:

    1. Is it a problem to add an __objclass__ attribute where it is missing?
    2. Would the above bullet item constitute a new ticket?

  6. Mike McKerns reporter

    I just checked it against dill.tests.test_objects, which checks the serialization of most objects in the stdlib (I have a lot of them, but it's not a totally complete list), and it still fails on the same number of objects for pypy using the latest build with this patch versus a build prior to this patch. So that means, I think, the impact might just be confined to user-defined classes.

    Trying a typical case for a user-defined class, there's still a critical failure (which may be its own issue that wasn't exposed before, but it seems like a bug):

    >>>> class _d(object):
    ....   def _method(self):
    ....     pass
    ....     
    >>>> d = _d.__dict__
    >>>> d.items()
    [('__module__', '__main__'), ('_method', <function _method at 0x0000000103bf8020>), ('__dict__', <getset_descriptor object at 0x0000000103bcad40>), ('__weakref__', <getset_descriptor object at 0x0000000103bcad90>), ('__doc__', None)]
    >>>> import dill
    >>>> [(i,dill.pickles(j)) for (i,j) in d.items()]
    [('__module__', True), ('_method', True), ('__dict__', True), ('__weakref__', False), ('__doc__', True)]
    

    The False indicates a failure in serializing that item from _d.__dict__, even though the descriptor does have an __objclass__:

    >>>> _d.__dict__['__weakref__']
    <getset_descriptor object at 0x0000000103bcad90>
    >>>> _d.__dict__['__weakref__'].__objclass__
    <class '__main__._d'>
    

    Turn on trace to see what happens during serialization:

    >>>> dill.detect.trace(True)
    >>>> v = _d.__dict__['__weakref__']
    >>>> dill.dumps(v)
    Wr: <getset_descriptor object at 0x0000000103bcad90>
    F2: <function _getattr at 0x0000000103dcf5b0>
    # F2
    T2: <class '__main__._d'>
    F2: <function _create_type at 0x0000000103dcf010>
    # F2
    T1: <type 'type'>
    F2: <function _load_type at 0x0000000103dcef98>
    # F2
    # T1
    T1: <type 'object'>
    # T1
    D2: <dict.__repr__ of {'__module__': '__main__', '_method': <function _method at 0x0000000103bf8020>, '__doc__': None}>
    F1: <function _method at 0x0000000103bf8020>
    F2: <function _create_function at 0x0000000103dcf088>
    # F2
    Co: <code object _method, file '<stdin>', line 2>
    T1: <type 'code'>
    # T1
    # Co
    D1: <dict__': <getset_descriptor object at 0x0000000103bcad40>, '__weakref__': <getset_descriptor object at 0x0000000103bcad90>, '__doc__': None}), 'dill': <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.6.dev0-py2.7.egg/dill/__init__.pyc'>, 'v': <getset_descriptor object at 0x0000000103bcad90>}>
    # D1
    D2: <dict.__repr__ of {}>
    # D2
    # F1
    # D2
    # T2
    # Wr
    '\x80\x02cdill.dill\n_getattr\nq\x00cdill.dill\n_create_type\nq\x01(cdill.dill\n_load_type\nq\x02U\x08TypeTypeq\x03\x85q\x04Rq\x05U\x02_dq\x06h\x02U\nObjectTypeq\x07\x85q\x08Rq\t\x85q\n}q\x0b(U\n__module__q\x0cU\x08__main__q\rU\x07_methodq\x0ecdill.dill\n_create_function\nq\x0f(h\x02U\x08CodeTypeq\x10\x85q\x11Rq\x12(K\x01K\x01K\x01MC\x03U\x04d\x00\x00Sq\x13N\x85q\x14)U\x04selfq\x15\x85q\x16U\x07<stdin>q\x17U\x07_methodq\x18K\x02U\x02\x00\x01q\x19))tq\x1aRq\x1bc__builtin__\n__main__\nh\x18NN}q\x1ctq\x1dRq\x1eU\x07__doc__q\x1fNutq Rq!U\x0b__weakref__q"U0<getset_descriptor object at 0x0000000103bcad90>q#\x87q$Rq%.'
    >>>> v = _d.__dict__
    >>>> dill.dumps(v)
    T1: <type 'dictproxy'>
    F2: <function _load_type at 0x0000000103dcef98>
    # F2
    # T1
    '\x80\x02cdill.dill\n_load_type\nq\x00U\rDictProxyTypeq\x01\x85q\x02Rq\x03)\x81q\x04.'
    

    ok, the dump "looks fine" at first pass... so let's try a load

    >>>> v = _d.__dict__['__weakref__']
    >>>> dill.pickles(v)            # this is essentially `loads(dumps(object))`
    Wr: <getset_descriptor object at 0x0000000103bcad90>
    F2: <function _getattr at 0x0000000103dcf5b0>
    # F2
    T2: <class '__main__._d'>
    F2: <function _create_type at 0x0000000103dcf010>
    # F2
    T1: <type 'type'>
    F2: <function _load_type at 0x0000000103dcef98>
    # F2
    # T1
    T1: <type 'object'>
    # T1
    D2: <dict.__repr__ of {'__module__': '__main__', '_method': <function _method at 0x0000000103bf8020>, '__doc__': None}>
    F1: <function _method at 0x0000000103bf8020>
    F2: <function _create_function at 0x0000000103dcf088>
    # F2
    Co: <code object _method, file '<stdin>', line 2>
    T1: <type 'code'>
    # T1
    # Co
    D1: <dict__': <getset_descriptor object at 0x0000000103bcad40>, '__weakref__': <getset_descriptor object at 0x0000000103bcad90>, '__doc__': None}), 'dill': <module 'dill' from '/Users/mmckerns/lib/python2.7/site-packages/dill-0.2.6.dev0-py2.7.egg/dill/__init__.pyc'>, 'v': <getset_descriptor object at 0x0000000103bcad90>}>
    # D1
    D2: <dict.__repr__ of {}>
    # D2
    # F1
    # D2
    # T2
    # Wr
    False
    

    it produces something... so let's turn off trace, and let's see what we get

    >>>> dill.detect.trace(False)
    >>>> v = _d.__dict__['__weakref__']
    >>>> _v = dill.loads(dill.dumps(v))
    >>>> _v
    >>>> type(_v)
    <type 'NoneType'>
    >>>> v
    <getset_descriptor object at 0x0000000103bcad90>
    >>>> 
    

    Ultimately, it produces a weakref to None instead of a weakref to the descriptor. As to why that happens, it will take a little more digging.

    Do you want this as a separate issue?

  7. Mike McKerns reporter

    BTW: If you want to read the dill "trace", it's listing the items it tries to serialize recursively... so Wr: is the original weakref item... then it pickles one of the function types (F2). You next see # F2, which means the current object is finished (e.g. the function was pickled). It moves to T2, which is a class... and ultimately into some internal items that produce some different cases for dicts (D2 and D1). For the sake of brevity, I'm not mentioning some of the steps, like it serializing a code object (Co)... but ultimately, it finishes serializing the class (# T2), then finishes the original object (# Wr). This chain doesn't guarantee it produces a weakref, it just means it successfully dumped and loaded... however, if the serialization is working correctly, it should produce a new instance that is a copy of the original weakref -- and that is not the case.
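To make the nesting described above easier to see, here is a small sketch that indents trace lines by depth. It assumes only the format visible in this thread (a line like `T1: ...` opens an object and `# T1` closes it); `indent_trace` is a hypothetical helper, not part of dill.

```python
def indent_trace(lines):
    """Indent dill-style trace lines by nesting depth.

    A line such as 'T2: <class ...>' opens an object; '# T2' closes it.
    """
    depth = 0
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('# '):
            depth = max(depth - 1, 0)   # closing marker: step back out first
            out.append('  ' * depth + line)
        else:
            out.append('  ' * depth + line)
            depth += 1                  # opening marker: nest what follows
    return out


# The short trace printed when dumping _d.__dict__ earlier in this thread.
sample = [
    "T1: <type 'dictproxy'>",
    "F2: <function _load_type at 0x0000000103dcef98>",
    "# F2",
    "# T1",
]
for row in indent_trace(sample):
    print(row)
```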

  8. Mike McKerns reporter

    Just a side note, it could be that the solution is to modify something in dill for the path pypy takes through the above trace... but I'm not sure what is needed until I dig a bit further into what happens to produce the None.

  9. Armin Rigo

    It seems you're recursively trying to serialize everything reachable from an object. At that point it is expected that you'll hit internal details that are different on PyPy (and also different in one version of CPython and the next). It's very hard to get exactly the same zoo of internal types as CPython. Yes, I think your best bet is to try to have special cases in your library to attempt to work on PyPy and still give the result that you expect.

  10. Mike McKerns reporter

    Unfortunately that is often how pickling has to be done, but I'll have a look and see what exactly the cause is... and then will post what I find. If it's something I can work around, then I'll do it. If not, and it's due to a missing attribute, or something that needs to be implemented outside of dill, then I'll let you know.

    Note that you don't need to get the same zoo of internal types as CPython, you just need to get a valid path to save the state of an object so it can be reproduced. So, if I can't work around it, regardless of the path through the "zoo" of objects needed, then I'll have to ask for more help.

    Also, let me know if it is a problem to add an __objclass__ attribute where it is currently missing. Without it, dill can't serialize descriptors for those objects.

    Thanks again for the quick feedback and the quick turnaround on the patch.

  11. Armin Rigo

    We can probably add the missing __objclass__ attribute at many places. In some cases it is not possible right now "by design", because the same descriptor is used in several classes, like __dict__ and __weakref__ were before I patched it. So yes, a similar fix could be done for these remaining cases, involving duplicating the descriptor.

    Doing them all systematically is work and hard to test. At this point I would certainly be happy to accept pull requests for the fixes as needed.

    But I still don't see the point: the missing ones should all be getsetdescriptor of built-in types. I don't see what serializing the content of built-in types can give you. When unserializing, you need anyway to find the same built-in type, and then its saved content is useless: it can't even be written into the type, as built-in types are immutable.

  12. Mike McKerns reporter

    The problem is that with serialization, no matter how you do it, at some point, with many objects, you have to pickle globals or the object's __dict__, and that forces you to have to pickle all sorts of weird objects you wouldn't otherwise care about.

    Pickling the descriptor for a built-in is a very easy case... I agree, it's almost trivial. However, it's not possible without having a reference to the object's class, and then it's a blocker to serialization for any object that requires that type to describe its state. That's the problem in a nutshell, and why even for built-in types the __objclass__ attribute is important.

    I've had several requests for pypy + dill to be able to handle all descriptors... so maybe I or one of the others can get you a PR.
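The point about namespaces can be illustrated with a rough standard-library analogue of the `dill.pickles` scans used in this thread. This sketch uses plain `pickle`, not dill, and only tests whether dumping succeeds; `can_pickle` and `unpicklable_names` are hypothetical helpers, and which names get reported varies by interpreter and version.

```python
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can dump obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def unpicklable_names(namespace):
    """List the keys of a namespace whose values pickle cannot dump."""
    return sorted(name for name, value in namespace.items()
                  if not can_pickle(value))


class _d(object):
    def _method(self):
        pass


# Scanning a class namespace also visits the descriptors stored in it,
# even though the caller only wanted to save the class.
print(unpicklable_names(dict(_d.__dict__)))
```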

  13. Mike McKerns reporter

    I was able to work around the other issues I mentioned above, and the fixes are now in dill... and as far as I can see (from my test suite), pypy + dill compatibility looks pretty solid. The only issue remaining is the lack of an __objclass__ attribute on descriptors on builtins.

    >>>> class _d(object):
    ....   def _method(self):
    ....     pass
    ....     
    >>>> d = _d.__dict__
    >>>> import dill
    >>>> [i for (i,j) in d.items() if dill.detect.errors(j)]
    []
    >>>>
    

    You can see for user-defined objects, it works pretty well, and it's really just the descriptor issue on builtins that's going to be a blocker. I believe nothing can be done about that in dill, and the change would have to be in pypy (to support the following):

    >>>> d = type.__dict__
    >>>> [i for (i,j) in d.items() if dill.detect.errors(j)]
    ['__name__', '__bases__', '__base__', '__mro__', '__dict__', '__flags__', '__module__', '__abstractmethods__', '__weakref__', '__doc__']
    >>>> set(type(j) for (i,j) in d.items() if dill.detect.errors(j))
    set([<type 'getset_descriptor'>])
    >>>> dill.detect.errors(d['__name__'])
    AttributeError('generic property has no __objclass__',)
    >>>> 
    
  14. Mike McKerns reporter

    I've tested a number of different objects with the latest build, and I only found one that fails with the reported error. I think, with one slight addition, that should cover it.

    >>>> import dill
    >>>> d = type.__dict__
    >>>> [(i,dill.detect.errors(j)) for i,j in d.items() if dill.detect.errors(j)]
    [('__doc__', AttributeError('generic property has no __objclass__',))]
    >>>> 
    

    The descriptor for __doc__ seems to be missing __objclass__. Again, thanks much for making these changes.
