Instrumenting of multiprocessing.Pool

Issue #458 invalid
Anthony Sottile
created an issue

I think this might be the same as #117, but I'm not certain:

import multiprocessing


def f(x):
    print("I'm run in multiprocessing {}".format(x))


multiprocessing.Pool(4).map(f, (1, 2, 3, 4))

Runtime:

$ coverage --version
Coverage.py, version 4.0.3.
Documentation at https://coverage.readthedocs.org
$ coverage erase && coverage run --concurrency=multiprocessing test.py && coverage report --show-missing
I'm run in multiprocessing 1
I'm run in multiprocessing 2
I'm run in multiprocessing 3
I'm run in multiprocessing 4
Name      Stmts   Miss  Cover   Missing
---------------------------------------
test.py       4      1    75%   5

@Buck Evan is interested as well, so I've tagged him.

Comments (13)

  1. Anthony Sottile reporter
    $ lsb_release  -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description:    Ubuntu 14.04.3 LTS
    Release:    14.04
    Codename:   trusty
    

    Also with combine:

    $ coverage erase && coverage run --concurrency=multiprocessing test.py && coverage combine && coverage report --show-missing
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    
  2. Anthony Sottile reporter

    Seems to happen on all the Python versions I have access to:

    $ echo -n '2.6 2.7 3.3 3.4 3.5' | xargs -d' ' --replace bash -ec 'rm -rf venv{} .coverage* && virtualenv venv{} -ppython{} >& /dev/null && . venv{}/bin/activate && pip install coverage >& /dev/null && echo "============" && python --version && echo "============" && coverage erase && coverage run --concurrency=multiprocessing test.py && coverage combine && coverage report --show-missing'
    ============
    Python 2.6.9
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    ============
    Python 2.7.6
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    ============
    Python 3.3.6
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 4
    I'm run in multiprocessing 3
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    ============
    Python 3.4.3
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      0   100%   
    ============
    Python 3.5.1
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Coverage.py warning: No data was collected.
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    
  3. Anthony Sottile reporter

    And for completeness, here's pypy and pypy3:

    $ echo -n 'pypy pypy3' | xargs -d' ' --replace bash -ec 'rm -rf venv{} .coverage* && virtualenv venv{} -p{} >& /dev/null && . venv{}/bin/activate && pip install coverage >& /dev/null && echo "============" && python --version && echo "============" && coverage erase && coverage run --concurrency=multiprocessing test.py && coverage combine && coverage report --show-missing'
    ============
    Python 2.7.9 (295ee98b6928, May 31 2015, 07:29:04)
    [PyPy 2.6.0 with GCC 4.8.2]
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    ============
    Python 3.2.5 (b2091e973da6, Oct 19 2014, 18:29:55)
    [PyPy 2.4.0 with GCC 4.6.3]
    ============
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Name                                                                  Stmts   Miss  Cover   Missing
    ---------------------------------------------------------------------------------------------------
    /opt/pypy/pypy-c-jit-linux-x86-64/build/lib-python/3/_weakrefset.py   NoSource: No source for code: '/opt/pypy/pypy-c-jit-linux-x86-64/build/lib-python/3/_weakrefset.py'.
    test.py                                                                   4      1    75%   5
    
  4. Anthony Sottile reporter

    And python3.4 seems to work some of the time:

    asottile@work:/tmp/foo$ rm .coverage* && ./venv3.4/bin/coverage run --concurrency=multiprocessing test.py && ./venv3.4/bin/coverage combine && ./venv3.4/bin/coverage report --show-missing
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Coverage.py warning: Couldn't read data from '/tmp/foo/.coverage.work.7368.098128': CoverageException: Doesn't seem to be a coverage.py data file
    Coverage.py warning: Couldn't read data from '/tmp/foo/.coverage.work.7367.942050': CoverageException: Doesn't seem to be a coverage.py data file
    Coverage.py warning: Couldn't read data from '/tmp/foo/.coverage.work.7369.667242': CoverageException: Doesn't seem to be a coverage.py data file
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      1    75%   5
    asottile@work:/tmp/foo$ rm .coverage* && ./venv3.4/bin/coverage run --concurrency=multiprocessing test.py && ./venv3.4/bin/coverage combine && ./venv3.4/bin/coverage report --show-missing
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 4
    I'm run in multiprocessing 3
    Coverage.py warning: Couldn't read data from '/tmp/foo/.coverage.work.7379.993911': CoverageException: Doesn't seem to be a coverage.py data file
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      0   100%   
    asottile@work:/tmp/foo$ rm .coverage* && ./venv3.4/bin/coverage run --concurrency=multiprocessing test.py && ./venv3.4/bin/coverage combine && ./venv3.4/bin/coverage report --show-missing
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 4
    I'm run in multiprocessing 3
    Coverage.py warning: No data was collected.
    Coverage.py warning: Couldn't read data from '/tmp/foo/.coverage.work.7389.197842': CoverageException: Doesn't seem to be a coverage.py data file
    Name      Stmts   Miss  Cover   Missing
    ---------------------------------------
    test.py       4      0   100%   
    
  5. Ned Batchelder repo owner

    My existing tests are using multiprocessing.Pool.imap_unordered. Do you know what the difference is?

    The odd thing is that I just merged a fix for multiprocessing on Windows, and the imap_unordered test fails on Python 3.4 and 3.5 on Windows... Clearly I need to understand more about multiprocessing.

  6. Anthony Sottile reporter

    I would hope they're similar but maybe not.

    From what I gather:

    • map: enforces order, exhausts the iterable before returning
    • imap: enforces order, lazily yielding (has additional code for maintaining order and so is slower)
    • imap_unordered: doesn't enforce ordering, lazily yielding

    I would hope they all use the same Process implementation, though. I'd have to dig into the code a bit more to be sure.
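
The three methods listed above can be compared with a short sketch. This is an illustration only, not code from the thread: the helper name `compare_pool_methods` is hypothetical, and the builtin `abs` stands in for a real worker function so the example stays self-contained.

```python
import multiprocessing


def compare_pool_methods(values, processes=2):
    """Hypothetical helper: run abs() through map, imap, and imap_unordered."""
    pool = multiprocessing.Pool(processes)
    try:
        # map blocks until every result is ready and returns a list in input order.
        mapped = pool.map(abs, values)
        # imap returns an iterator that yields results lazily, still in input order.
        lazy_ordered = list(pool.imap(abs, values))
        # imap_unordered yields results in whatever order the workers finish,
        # so the results are sorted here to make them comparable.
        lazy_unordered = sorted(pool.imap_unordered(abs, values))
    finally:
        # Shut the workers down cleanly once all results have been consumed.
        pool.close()
        pool.join()
    return mapped, lazy_ordered, lazy_unordered


if __name__ == "__main__":
    print(compare_pool_methods([-4, -1, -3, -2]))
```

All three calls produce the same set of results. Only the return type (list versus iterator) and the ordering guarantee differ.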

  7. Ned Batchelder repo owner

    OK: the results are accurate (though there are warnings) if you change the code to:

    import multiprocessing
    
    
    def f(x):
        print("I'm run in multiprocessing {}".format(x))
    
    
    pool = multiprocessing.Pool(4)
    pool.map(f, (1, 2, 3, 4))
    
    pool.close()
    pool.join()
    

    The warnings are a little unfortunate:

    $ coverage erase && coverage run --concurrency=multiprocessing bug458.py && coverage combine && coverage report --show-missing
    I'm run in multiprocessing 1
    I'm run in multiprocessing 2
    I'm run in multiprocessing 3
    I'm run in multiprocessing 4
    Coverage.py warning: No data was collected.
    Coverage.py warning: No data was collected.
    Coverage.py warning: No data was collected.
    Name        Stmts   Miss  Cover   Missing
    -----------------------------------------
    bug458.py       7      0   100%
    

    Note that you also need to run "coverage combine" before "coverage report".

  8. Ned Batchelder repo owner

    Well, this is an interesting question. The examples on the page show using Pool.map without close or join. It seems simple enough to include the close and join, though I'm unsure exactly why it needs them...
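
One possible way to make the close-and-join step hard to forget is a small context manager. This is a sketch under stated assumptions, not part of coverage.py or of the thread: the name `joined_pool` is hypothetical. As background, Pool workers are daemonic processes and are terminated when the parent exits. Whether that is the reason coverage data goes missing here is an assumption; the thread does not confirm it. Note also that on Python 3.3+ the built-in `with multiprocessing.Pool(...)` form calls `terminate()` on exit rather than `close()` and `join()`.

```python
import contextlib
import multiprocessing


@contextlib.contextmanager
def joined_pool(processes):
    """Hypothetical helper: yield a Pool, then close() and join() it on normal exit."""
    pool = multiprocessing.Pool(processes)
    try:
        yield pool
    except BaseException:
        # On error, stop the workers immediately instead of draining the queue.
        pool.terminate()
        raise
    else:
        # On success, let the workers finish outstanding work and exit normally.
        pool.close()
    finally:
        # join() is valid after either close() or terminate().
        pool.join()


if __name__ == "__main__":
    with joined_pool(4) as pool:
        print(pool.map(abs, (-1, -2, -3, -4)))
```

By the time the `with` block ends, every worker process has exited, which is the same state the explicit `pool.close()` and `pool.join()` calls in the earlier comment reach.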
