I'm currently investigating this. I've started by creating a small microbenchmark comparing PyPy and CPython (106_02_benchmark.zip). The benchmark varies the number of processes used for a fixed task. From the plot (106_02_benchmark.png), it can be seen that PyPy is on average 3.5 times slower than CPython.
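Roughly, a benchmark of this shape looks like the sketch below (a hypothetical reconstruction; the actual task and sizes in 106_02_benchmark.zip may differ). It runs the same batch of work under an increasing number of worker processes and reports the wall-clock time for each:

```python
import multiprocessing as mp
import time

def work(n):
    # placeholder CPU-bound task; the real benchmark's task may differ
    total = 0
    for i in range(n):
        total += i * i
    return total

def run(num_procs, task_size=100_000, tasks=32):
    """Time a fixed batch of tasks with num_procs worker processes."""
    start = time.perf_counter()
    with mp.Pool(num_procs) as pool:
        pool.map(work, [task_size] * tasks)
    return time.perf_counter() - start

if __name__ == "__main__":
    for procs in (1, 2, 4, 8):
        print("%d processes: %.3fs" % (procs, run(procs)))
```

Running the same script under both interpreters (`pypy bench.py` vs `python bench.py`) and plotting the timings per process count gives curves comparable to the attached graphic.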
I made some small progress, in the sense that I was able to reproduce the slowdown with just 1 process. The plot shows that this happens both with and without the JIT.
After some profiling, it turns out that the slope you see in the plots appears because PyPy freezes for short periods of time in a sem_wait() call (the one in RPyThreadAcquireLockTimed(...) in rpython/translator/c/src/thread_pthread.c).
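Such pauses can also be surfaced from the Python level by timing each individual lock acquisition; anything sitting in sem_wait() shows up as an outlier. This is only an illustrative sketch, not the profiling setup actually used:

```python
import threading
import time

lock = threading.Lock()

def holder():
    # repeatedly grab the lock so the main thread must block in the
    # interpreter's underlying sem_wait()/lock primitive
    for _ in range(100):
        with lock:
            time.sleep(0.001)

t = threading.Thread(target=holder)
t.start()

pauses = []
for _ in range(100):
    t0 = time.perf_counter()
    with lock:          # blocks in RPyThreadAcquireLockTimed on PyPy
        pass
    pauses.append(time.perf_counter() - t0)
t.join()

# a long tail here would indicate the freezes seen in the plots
print("max acquire latency: %.6f s" % max(pauses))
```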
Do you have any idea how I can see in more depth what happens in the PyPy backend when using multiprocessing? It would help me spot the place where it hangs.
From my investigation, one of the problems with multiprocessing is the send() method in _multiprocessing.interp_connection.py. The call into the pickle module consumes about 96% of the time.
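One way to see this split is to time a full send()/recv() round trip over a Pipe against the pickling step alone; the sketch below is a hypothetical measurement, with a made-up payload, not the original profiling run:

```python
import multiprocessing as mp
import pickle
import time

def measure(payload, repeats=1000):
    """Return (total send/recv time, time spent in pickle.dumps alone)."""
    parent, child = mp.Pipe()
    t0 = time.perf_counter()
    for _ in range(repeats):
        parent.send(payload)   # pickles the payload, then writes it
        child.recv()
    send_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    for _ in range(repeats):
        pickle.dumps(payload)  # the serialization step in isolation
    pickle_time = time.perf_counter() - t0
    return send_time, pickle_time

if __name__ == "__main__":
    send_t, pickle_t = measure(list(range(1000)))
    print("send/recv: %.4fs, pickle.dumps alone: %.4fs"
          % (send_t, pickle_t))
```

If pickling indeed dominates as described, the second number approaches the first, and optimizations outside the serialization path can only touch the small remainder.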
If you are saying that the pickle module consumes 96% of the time, then this pull request aims at reducing the time spent in the remaining 4%. That seems... entirely pointless? Unless I'm missing something.