multiprocessing slower than cpython

Issue #1538 open
mike fc created an issue

From bugs.pypy.org

Comments (9)

  1. mike fc reporter
    OS X, 4-core i5, recent nightly version of PyPy.
    
    Time to run the code below on CPython and PyPy:
    
              map    multiprocessing.map
    CPython   28.9          12.6
    PyPy      28.1          23.9
    
    #-------------------------------------------------------
    # Benchmark: builtin map vs. multiprocessing.Pool.map
    #-------------------------------------------------------
    import hashlib
    import datetime
    import multiprocessing
    
    def f(i):
        # CPU-bound task: hash a short string and return its hex digest.
        return hashlib.md5("hello %i" % i).hexdigest()
    
    p = multiprocessing.Pool(4)
    
    # Run the same workload through the builtin map and through Pool.map,
    # printing the wall-clock time of each.
    for mapf in (map, p.map):
        start = datetime.datetime.now()
        mapf(f, xrange(2**23))
        print datetime.datetime.now() - start
    
  2. Armin Rigo
    Attached is a modified version with just "pass" in f(). The overhead of
    using multiprocessing seems huge in PyPy.
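    
    The attachment itself isn't preserved here; a minimal sketch of what
    such a pass-only variant could look like, assuming it simply reuses the
    benchmark from comment 1 (the 2**23 iteration count is a guess carried
    over from there):
    
    import datetime
    import multiprocessing
    
    def f(i):
        pass  # no work at all, so any remaining cost is dispatch/IPC overhead
    
    p = multiprocessing.Pool(4)
    
    for mapf in (map, p.map):
        start = datetime.datetime.now()
        mapf(f, xrange(2**23))
        print datetime.datetime.now() - start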
    
  3. mike fc reporter
    Timings for x.py on OS X, 8-core i7, recent nightly PyPy. Time is in seconds.
    
              map    multiprocessing.map
    CPython   1.01          0.99
    PyPy      0.11          4.80
    
  4. Alecsandru Patrascu

    I'm currently investigating this. I started by creating a small microbenchmark comparing PyPy and CPython (106_02_benchmark.zip). The benchmark varies the number of processes for a fixed task. The graphic (106_02_benchmark.png) shows that PyPy is on average 3.5 times slower than CPython.
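
    The attached microbenchmark isn't included here; a minimal sketch of
    the idea, sweeping the pool size over a fixed hashing task (the item
    count and process counts below are illustrative, not the attachment's
    actual parameters):

    import hashlib
    import time
    import multiprocessing

    def f(i):
        return hashlib.md5("hello %i" % i).hexdigest()

    def bench(nproc, n=2**20):
        # Time one Pool.map over n items with nproc worker processes.
        p = multiprocessing.Pool(nproc)
        start = time.time()
        p.map(f, xrange(n))
        p.close()
        p.join()
        return time.time() - start

    for nproc in (1, 2, 4, 8):
        print "%d processes: %.2fs" % (nproc, bench(nproc))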

  5. Alecsandru Patrascu

    I made some small progress: I was able to reproduce the slowdown with just one process. The plots show that this happens both with and without the JIT.

    After some profiling: the slopes you see in the plots appear because PyPy freezes for short periods of time in a sem_wait() call (the one in RPyThreadAcquireLockTimed(...) in rpython/translator/c/src/thread_pthread.c).

    Do you have any idea how I can see in more depth what happens in the PyPy backend when using multiprocessing? It would help me spot where it hangs.
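
    (Not from the thread, but one Python-level way to make such freezes
    visible, complementing C-level tracing: time the gaps between results
    of a single-worker run and flag the outliers. The threshold and sizes
    below are arbitrary.)

    import time
    import multiprocessing

    def f(i):
        return i  # trivial task, so any pause is pool overhead, not work

    p = multiprocessing.Pool(1)  # a single worker already shows the slowdown
    last = time.time()
    for _ in p.imap(f, xrange(100000), chunksize=1000):
        now = time.time()
        if now - last > 0.05:  # arbitrary cutoff for what counts as a freeze
            print "stall of %.3fs" % (now - last)
        last = now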

  6. Radu Codescu

    From my investigation, one of the problems with multiprocessing is the send() method in _multiprocessing.interp_connection.py: the pickle call consumes about 96% of the time.
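
    (Not from the thread, but a rough way to sanity-check that figure from
    plain Python: time pickling the same arguments and results directly,
    then compare against the full Pool.map round trip. Pool.map actually
    pickles chunks of items rather than single ones, so this only
    approximates the real framing cost.)

    import time
    import hashlib
    import cPickle as pickle
    import multiprocessing

    def f(i):
        return hashlib.md5("hello %i" % i).hexdigest()

    n = 2**20

    # Serialization plus the work itself, with no processes involved.
    start = time.time()
    for i in xrange(n):
        pickle.dumps(i, pickle.HIGHEST_PROTOCOL)
        pickle.dumps(f(i), pickle.HIGHEST_PROTOCOL)
    print "pickle + work only: %.2fs" % (time.time() - start)

    # Full multiprocessing round trip for comparison.
    p = multiprocessing.Pool(4)
    start = time.time()
    p.map(f, xrange(n))
    print "Pool.map:           %.2fs" % (time.time() - start)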

  7. Armin Rigo

    If you are saying that the pickle module consumes 96% of the time, then this pull request aims at reducing the time spent in the remaining 4%. That seems... entirely pointless? Unless I'm missing something.
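
    (To make the arithmetic explicit: if pickling really accounts for 96%
    of the time, then eliminating the remaining 4% entirely would bound the
    overall speedup at 1/0.96, i.e. about 1.04x.)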
