Issue #5 on hold

shmarray to pipe

Matěj Týč
created an issue

Hello, shmarray.py mostly works great.

The problem arises when one shares an array and then reallocates it in one process (for example, when you realize you need a bigger one). The other processes are then stuck with the old array.

I think I need to pass the new array's pointer to all of the dependent processes, but I don't know how. So I tried to pass the array through a pipe, and I got a pickling error you probably know about, since the exact same error is produced in one of the test cases.

So what do you see as a solution to this issue? Does it have a solution? :-)

I have found a discussion on this topic here, but I couldn't make much of it: http://www.internetcomputerforum.com/python-forum/379659-multiprocessing-shared-memory-vs-pickled-copies-2.html

A guy who seems knowledgeable about this topic is complaining about shmarray.py, so maybe his suggestions could be useful?

Comments (4)

  1. Chris Lee-Messer repo owner

    Thank you for pointing this out. I published this bitbucket repo as an effort to package others' code using distutils and to get started on a test suite. I made it public in case it might be useful to others. However, I use it myself in a very limited context.

    The test suite makes it clear that something is wrong with pickling. It is good to know that Sturla is aware; I will look into whether it was broken before I added it to the repo or while it was here.
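    For reference, that kind of failure can be reproduced with nothing but the standard library: a plain multiprocessing.Array refuses to be pickled outside of process creation. This is a minimal sketch of the symptom, not the repo's code; the exact message can vary between Python versions.

```python
import multiprocessing
import pickle

# A synchronized shared ctypes array, as used by shmarray.py.
arr = multiprocessing.Array('d', 4)

err = None
try:
    # Pickling is only allowed while spawning a child process;
    # doing it by hand (e.g. to push the array down a pipe) fails.
    pickle.dumps(arr)
except RuntimeError as exc:
    err = exc

print("pickling failed:", err)
```

    That is why sharing by inheritance works, but sending the array down a pipe does not.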

    The correct implementation would be welcome.

    Here is Sturlamolden's comment:

    On 4 apr, 22:20, John Ladasky <lada...@my-deja.com> wrote:

    That is hilarious :-)

    I see that the bitbucket page has my and Gaël's names on it, but the code is changed and broken beyond repair! I don't want to be associated with that crap!

    Their "shmarray.py" will not work -- ever. It fails in two ways:

    1. multiprocessing.Array cannot be pickled (as you noticed). It is shared by handle inheritance. Thus we (that is Gaël and I) made a shmem buffer object that could be pickled by giving it a name in the file system, instead of sharing it anonymously by inheriting the handle. Obviously those behind the bitbucket page don't understand the difference between named and anonymous shared memory (that is, System V IPC and BSD mmap, respectively.)

    2. By subclassing numpy.ndarray a pickle dump would encode a copy of the buffer. But that is what we want to avoid! We want to share the buffer itself, not make a copy of it! So we changed how numpy pickles arrays pointing to shared memory, instead of subclassing ndarray. I did that by slightly modifying some code written by Robert Kern.

    http://folk.uio.no/sturlamo/python/s...feb13-2009.zip

    Known issues/bugs: 64-bit support is lacking, and os._exit in multiprocessing causes a memory leak on Linux.
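    The named-shared-memory scheme Sturla describes -- pass a name around, re-attach by name, never copy the buffer -- can be sketched with the stdlib module multiprocessing.shared_memory (available since Python 3.8). This is an illustration of the idea, not the code under discussion; the shapes and sizes are made up.

```python
from multiprocessing import shared_memory
import numpy as np

# Create a *named* block of shared memory and view it as an array.
shm = shared_memory.SharedMemory(create=True, size=4 * 8)  # room for 4 float64s
a = np.ndarray((4,), dtype='float64', buffer=shm.buf)
a[:] = 0.0

# Instead of pickling the buffer itself, only the name travels between
# processes (here we "send" it within one process for brevity).
name = shm.name

# Receiver: attach to the same block by name; no copy is made.
other = shared_memory.SharedMemory(name=name)
b = np.ndarray((4,), dtype='float64', buffer=other.buf)
b[0] = 42.0

observed = float(a[0])  # the write through b is visible through a

# Drop the numpy views before closing, then free the named block.
del a, b
other.close()
shm.close()
shm.unlink()
```

    Note that resizing still means allocating a new named block and telling every peer the new name, which is the coordination problem the original report runs into.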

  2. Chris Lee-Messer repo owner

    By the way, shmarray.py is David Baddeley's code, not Sturla's. It uses ctypes and multiprocessing as its basis. If you'd like to use the pipe ability, you may want to try the other implementation in this package, "sharedmem". Sharedmem contains Sturla's code and was designed to be pickled. I will update the README to make this clearer.

    If you look in tests/test_common.py, there is an example of using a pickled shared array with two subprocesses.

    import sharedmem, os, time
    import numpy as np
    import multiprocessing, pickle
    
    def test_two_subprocesses_with_pickle():
    
        shape = (4,)
        a = sharedmem.zeros(shape, dtype='float64')
        print os.getpid(), ":", a
        pa = pickle.dumps(a)  # pickles the shared-memory descriptor, not a copy of the data
    
        lck = multiprocessing.Lock()
    
        def modify_array(pa, lck):
            a = pickle.loads(pa)  # re-attaches to the same shared buffer
            with lck:
                a[0] = 1
                a[1] = 2
                a[2] = 3
    
            print os.getpid(), "modified array"
            
        p = multiprocessing.Process(target=modify_array, args=(pa, lck))
        p.start()
    
        t0 = time.time()
        t1 = t0 + 10  # give up after ten seconds
        t = None      # stays None if we time out
        nn = 0
        while True:
            if a[0]:
                with lck:
                    t = (a == np.array([1, 2, 3, 0], dtype='float64'))
                break
            if time.time() > t1:  # timed out waiting for the child
                break
            nn += 1
            
        print os.getpid(), t, "nn:", nn
        assert t is not None and t.all()
        print "finished (from %s)" % os.getpid()
        
        p.join()
        
        print a