py.execnet.Message serialises via repr

Issue #30 duplicate
Former user created an issue

I'm trying to understand why repr/eval was used for marshalling instead of pickle or YAML. As a result, objects such as datetime.datetime instances cannot be sent across a channel, and the attempt fails silently.

The relevant code:

{{{
#!python
def writeto(self, io):
    # XXX marshal.dumps doesn't work for exchanging data across Python
    # version :-(((  There is no sane solution, short of a custom
    # pure Python marshaller
    data = self.data
    if isinstance(data, str):
        dataformat = 1
    else:
        data = repr(self.data)  # argh
        dataformat = 2
    header = struct.pack(HDR_FORMAT, self.msgtype, dataformat,
                                     self.channelid, len(data))
    io.write(header + data)

def readfrom(cls, io):
    header = io.read(HDR_SIZE)
    (msgtype, dataformat,
     senderid, stringlen) = struct.unpack(HDR_FORMAT, header)
    data = io.read(stringlen)
    if dataformat == 1:
        pass
    elif dataformat == 2:
        data = eval(data, {})   # reversed argh
    else:
        raise ValueError("bad data format")
    msg = cls._types[msgtype](senderid, data)
    return msg
readfrom = classmethod(readfrom)

}}}
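Here is a minimal Python 3 sketch of why such objects get lost. The helper `repr_roundtrip` is made up for illustration and only mimics dataformat 2. `repr()` of a datetime names the `datetime` module, but `eval()` on the receiving side runs with an empty globals dict, so that name cannot be resolved.

```python
import datetime


def repr_roundtrip(obj):
    """Mimic dataformat 2: repr() on the sender, eval() with empty globals on the receiver."""
    return eval(repr(obj), {})


# Plain builtin literals survive the round trip unchanged.
assert repr_roundtrip([1, "two", {"three": 3.0}]) == [1, "two", {"three": 3.0}]

# repr(stamp) is "datetime.datetime(2009, 7, 1, 12, 0)", but the name
# "datetime" is not defined in eval()'s empty globals, so decoding fails.
stamp = datetime.datetime(2009, 7, 1, 12, 0)
try:
    repr_roundtrip(stamp)
except NameError as exc:
    print("receiver cannot rebuild the object:", exc)
```

Only objects whose repr is a literal built from builtin types survive this scheme.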

Using pickle/cPickle as data format 3 would seem reasonable and backward compatible. Yet it's hard to believe this wasn't done in the first place unless there was a good reason.

If there is a good reason, what is it?

If the reason is no longer valid, can this be patched (also, what's the release schedule)?

Thanks! Chris

Comments (9)

  1. cegner

    Found comments about this at http://www.mail-archive.com/py-dev@codespeak.net/msg00450.html

    Proposed patch:

    {{{
    #!python
    import cPickle as Pickle

    def writeto(self, io):
        # XXX marshal.dumps doesn't work for exchanging data across Python
        # version :-(((  There is no sane solution, short of a custom
        # pure Python marshaller
        data = self.data
        if isinstance(data, str):
            dataformat = 1
        else:
            try:
                data = Pickle.dumps(self.data)
                dataformat = 3
            except (Pickle.PicklingError, TypeError):
                # cPickle raises TypeError rather than PicklingError for
                # some types (e.g. file objects), so catch both
                data = repr(self.data)  # argh
                dataformat = 2
        header = struct.pack(HDR_FORMAT, self.msgtype, dataformat,
                             self.channelid, len(data))
        io.write(header + data)

    def readfrom(cls, io):
        header = io.read(HDR_SIZE)
        (msgtype, dataformat,
         senderid, stringlen) = struct.unpack(HDR_FORMAT, header)
        data = io.read(stringlen)
        if dataformat == 1:
            pass
        elif dataformat == 2:
            data = eval(data, {})   # reversed argh
        elif dataformat == 3:
            data = Pickle.loads(data)
        else:
            raise ValueError("bad data format")
        msg = cls._types[msgtype](senderid, data)
        return msg
    readfrom = classmethod(readfrom)
    }}}
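For comparison, here is a Python 3 sketch of the same three-format scheme. `Message`, `HDR_FORMAT` and the `FORMAT_*` names are hypothetical stand-ins, since the real header layout is not shown in this thread. Raw data is `bytes` rather than `str`. The sketch also catches `AttributeError`, which some Python 3 versions raise when asked to pickle a local function.

```python
import datetime
import io
import pickle
import struct

# Hypothetical header layout: msgtype, dataformat, channelid, payload length.
HDR_FORMAT = "!hhii"
HDR_SIZE = struct.calcsize(HDR_FORMAT)

FORMAT_BYTES, FORMAT_REPR, FORMAT_PICKLE = 1, 2, 3


class Message:
    """Hypothetical minimal stand-in for py.execnet.Message."""

    def __init__(self, msgtype, channelid, data):
        self.msgtype = msgtype
        self.channelid = channelid
        self.data = data

    def writeto(self, stream):
        data = self.data
        if isinstance(data, bytes):
            dataformat = FORMAT_BYTES
        else:
            try:
                # Protocol 2 is the newest protocol Python 2.3+ can also read.
                data = pickle.dumps(data, 2)
                dataformat = FORMAT_PICKLE
            except (pickle.PicklingError, TypeError, AttributeError):
                data = repr(data).encode("utf-8")
                dataformat = FORMAT_REPR
        header = struct.pack(HDR_FORMAT, self.msgtype, dataformat,
                             self.channelid, len(data))
        stream.write(header + data)

    @classmethod
    def readfrom(cls, stream):
        header = stream.read(HDR_SIZE)
        msgtype, dataformat, channelid, length = struct.unpack(HDR_FORMAT, header)
        data = stream.read(length)
        if dataformat == FORMAT_BYTES:
            pass
        elif dataformat == FORMAT_REPR:
            data = eval(data.decode("utf-8"), {})
        elif dataformat == FORMAT_PICKLE:
            data = pickle.loads(data)
        else:
            raise ValueError("bad data format %r" % (dataformat,))
        return cls(msgtype, channelid, data)


# Usage: a datetime.date now survives the trip that repr/eval could not handle.
buf = io.BytesIO()
Message(1, 7, datetime.date(2009, 7, 1)).writeto(buf)
buf.seek(0)
print(Message.readfrom(buf).data)
```

Pinning protocol 2 keeps old readers in mind. Whether pickled string values mean the same thing on a Python 2 and a Python 3 peer is a separate question that cross-version tests would have to answer.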
    
  2. Holger Krekel repo owner

    First of all, I am happy to accept patches, but they need tests. In this case we need automated tests showing that pickling works on selected data types between all combinations of Python 2.4 to 2.6, in both directions (and soon 2.7 and 3.1 will be added to that mix, possibly also Jython and IronPython).
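One possible shape for such a test, sketched for Python 3 with only the standard library. `INTERPRETERS` is a hypothetical list that here contains just the running interpreter. Filling it with python2.4 through python2.6 executables would turn it into the requested matrix. Each sample travels from parent to child and back, so both directions are exercised.

```python
import datetime
import pickle
import subprocess
import sys

# Hypothetical interpreter matrix. Only the running interpreter is listed so the
# sketch runs anywhere; a real matrix would add python2.4 ... python3.1 paths.
INTERPRETERS = [sys.executable]

# A few "selected data types" to exercise.
SAMPLES = [42, "text", [1, 2, 3], {"k": (1.5, None)}, datetime.date(2009, 7, 1)]

# Child program: unpickle from stdin, then pickle the result back to stdout.
# getattr() picks the binary stream on Python 3 and falls back to the plain
# file object on Python 2, so the same source is meant to run on either.
ECHO = (
    "import pickle, sys\n"
    "inp = getattr(sys.stdin, 'buffer', sys.stdin)\n"
    "out = getattr(sys.stdout, 'buffer', sys.stdout)\n"
    "out.write(pickle.dumps(pickle.load(inp), 2))\n"
)


def roundtrip_via(python, obj):
    """Send obj to a child interpreter (parent -> child) and back (child -> parent)."""
    proc = subprocess.run(
        [python, "-c", ECHO],
        input=pickle.dumps(obj, 2),
        stdout=subprocess.PIPE,
        check=True,
    )
    return pickle.loads(proc.stdout)


def test_pickle_roundtrip_both_directions():
    for python in INTERPRETERS:
        for obj in SAMPLES:
            assert roundtrip_via(python, obj) == obj


if __name__ == "__main__":
    test_pickle_roundtrip_both_directions()
    print("all samples survived the round trip")
```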

    Releases currently happen rather frequently. Depending on the scope of the change, it can go into a 1.0.x release or otherwise into 1.1.

    I see some complexity and open questions with pickling. For example, do we want

    {{{
    #!python
    channel.send(obj)
    channel.send(obj)
    }}}

    to result in the exact same object on the remote side? And what happens if unpickling fails on the remote side because of a missing import?
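The identity question comes down to whether each send is pickled on its own. This is a minimal Python 3 sketch, and nothing in it is execnet API. Separate `dumps`/`loads` calls yield a new copy per send. A long-lived `Pickler`/`Unpickler` pair keeps its memo between calls and turns the second send into a back-reference.

```python
import io
import pickle

payload = {"answer": 42}

# Encoding each send independently: the receiver gets a fresh copy every time.
a = pickle.loads(pickle.dumps(payload))
b = pickle.loads(pickle.dumps(payload))
assert a == b and a is not b

# One long-lived Pickler/Unpickler pair per channel: both keep their memo
# between calls, so the second send is encoded as a back-reference to the
# first and decodes to the very same object.
stream = io.BytesIO()
pickler = pickle.Pickler(stream)
pickler.dump(payload)
pickler.dump(payload)

stream.seek(0)
unpickler = pickle.Unpickler(stream)
first = unpickler.load()
second = unpickler.load()
assert first is second
```

The catch with the shared-memo variant is that if `payload` is mutated between the two sends, the receiver never sees the change. The second message only says "the object you already have". Calling `Pickler.clear_memo()` restores copy-per-send semantics.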

    In any case, I'd like to keep the basic remote_exec + send/receive protocol minimal, but we should make it more extensible. What about being able to install a serialize/deserialize module that is called on *both* sides to turn an object into a string and vice versa? The channel data serialization would call into that module. This way one could easily do optimizations, use YAML, drive pickling in special ways, etc. Having it in a single module also makes it easy to test the interactions and understand the logic. I imagine a gateway.install_serialization(mod) method, and maybe a way to specify an execnet plugin that always does this on gateway instantiation.
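Here is a sketch of the pluggable-serialisation idea under stated assumptions. `LoopbackGateway` and its `install_serialization()` method are hypothetical, not existing execnet API. Both ends live in one process. Any module or object with `dumps()`/`loads()` counts as a serialiser, so the stdlib `json` and `pickle` modules can be installed directly.

```python
import json
import pickle


class LoopbackGateway:
    """Hypothetical gateway end; two instances are wired to each other in-process."""

    def __init__(self):
        self.serializer = pickle  # anything with dumps()/loads() qualifies
        self.peer = None
        self.inbox = []

    def install_serialization(self, mod):
        # Hypothetical hook from the discussion; it must be called on both ends.
        self.serializer = mod

    def send(self, obj):
        data = self.serializer.dumps(obj)
        if isinstance(data, str):
            data = data.encode("utf-8")  # the wire always carries bytes
        self.peer._deliver(data)

    def _deliver(self, data):
        self.inbox.append(self.serializer.loads(data))


def connect(a, b):
    a.peer, b.peer = b, a


local, remote = LoopbackGateway(), LoopbackGateway()
connect(local, remote)
for gw in (local, remote):
    gw.install_serialization(json)
local.send({"k": [1, 2]})
print(remote.inbox)
```

Keeping the serialiser behind a two-function interface is what makes it easy to test in isolation. The price is that both sides must install the same module, or the receiver will misread the bytes.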

  3. cegner

    While a more complete solution is desirable, I don't want to see the perfect become the enemy of the good. Repr and eval are really quite limiting, and just about anything would be better. I would also argue that this is a minimal increase in the protocol's complexity, with enough benefit to be worthwhile until such a redesign can be put in place.

    If you can point me to the existing tests, I can look at what it would take to add tests for pickling (are they not in the egg?).

  4. Holger Krekel repo owner

    I'd like to avoid introducing pickle transparently on a lower layer. What about having remote_exec and newchannel accept a serialize argument that specifies bytes/repr/pickle/whatever for serialization?
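Here is a sketch of the per-channel `serialize` argument, with hypothetical names throughout (`CODECS`, `Channel`, `unpack`). It reuses the dataformat codes from this thread, so the code in the header tells the receiver how to decode. The "repr" entry uses `ast.literal_eval` in place of the `eval()` from the original code. This is a safer substitute that accepts only literals.

```python
import ast
import pickle


def _raw(obj):
    # serialize="bytes" channels pass payloads through untouched.
    if not isinstance(obj, bytes):
        raise TypeError("serialize='bytes' channels only carry bytes")
    return obj


# Hypothetical codec table: name -> (dataformat code, encode, decode).
# Codes follow the thread: 1 = raw bytes, 2 = repr, 3 = pickle.
CODECS = {
    "bytes": (1, _raw, _raw),
    "repr": (2, lambda obj: repr(obj).encode("utf-8"),
             lambda data: ast.literal_eval(data.decode("utf-8"))),
    "pickle": (3, lambda obj: pickle.dumps(obj, 2), pickle.loads),
}
DECODERS = {code: decode for code, _encode, decode in CODECS.values()}


class Channel:
    """Hypothetical channel whose serialisation is chosen at creation time."""

    def __init__(self, serialize="repr"):
        if serialize not in CODECS:
            raise ValueError("unknown serialize option: %r" % (serialize,))
        self.dataformat, self._encode, _decode = CODECS[serialize]

    def pack(self, obj):
        # The code travels in the header, so the receiver can decode
        # without knowing which option the sender's channel was built with.
        return self.dataformat, self._encode(obj)


def unpack(dataformat, data):
    return DECODERS[dataformat](data)


ch = Channel(serialize="pickle")
print(unpack(*ch.pack({"when": (2009, 7)})))
```

Defaulting to "repr" would keep today's behaviour for callers that never pass the argument.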

    The tests are in py/execnet/testing in a checkout, and "py.test py/execnet --verbose" runs them.

  5. Ronny Pfannschmidt

    A simple solution to the pickle problem would be to fail when the two sides run different major Python versions.

    It might also be desirable to add the pickle.HIGHEST_PROTOCOL constant to rinfo.
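Both suggestions fit in a few lines, assuming rinfo could carry the remote major version and its `pickle.HIGHEST_PROTOCOL`. The dict layout returned by `local_rinfo()` is hypothetical and is not execnet's actual rinfo object.

```python
import pickle
import sys


def local_rinfo():
    """Hypothetical subset of the information a gateway's rinfo could carry."""
    return {"major": sys.version_info[0],
            "pickle_highest": pickle.HIGHEST_PROTOCOL}


def choose_pickle_protocol(local, remote):
    # Suggestion 1: refuse to pickle across different major Python versions.
    if local["major"] != remote["major"]:
        raise RuntimeError("refusing to pickle between Python %d and Python %d"
                           % (local["major"], remote["major"]))
    # Suggestion 2: with HIGHEST_PROTOCOL known for both ends, pick the
    # newest protocol that both of them can read.
    return min(local["pickle_highest"], remote["pickle_highest"])


me = local_rinfo()
older_peer = {"major": me["major"], "pickle_highest": 2}  # made-up peer
print(choose_pickle_protocol(me, older_peer))  # prints 2
```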
