Introduce option for dist-testing chunks

Issue #55 resolved
dsaputo created an issue

Using the py.test -n option, the txnodes seem to be ready to receive tests.

The only way I've seen tests distributed to the nodes is if the tests are all in separate files, e.g. test_spam1.py, test_spam2.py, etc. Even then, it distributes unevenly with some nodes never getting any tests.

If there is only one file with multiple tests, e.g. test_spam.py contains test_eggs1(), test_eggs2(), etc., it appears that all tests go to one node.

If there are generated tests (using pytest_generate_tests(metafunc)), all these tests also go to one node.

Is this the way it works, is this a bug, or am I doing something wrong?

Python version 2.6.3

py.test version 1.0.2

Tested on

Mac OS X 10.6.1

Windows XP SP2

Comments (7)

  1. Holger Krekel repo owner

    Tests get distributed in chunks of up to 15 at once. For this code:

    def pytest_generate_tests(metafunc):
        for i in range(100):
            metafunc.addcall(funcargs={'x': i})
    
    def test_hello(x):
        assert x
    

    and running py.test -n 3 --pastebin=all I get: http://paste.pocoo.org/show/145180/

    which looks ok. I presume this answer resolves your issue, so am closing.
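    A toy sketch of the chunking behavior described above (this is not pytest-xdist's actual scheduler, just an illustration of assigning tests to nodes in chunks of up to 15; the function name and round-robin policy are assumptions):

    ```python
    CHUNK_SIZE = 15  # chunk size mentioned in the comment above

    def distribute(tests, n_nodes, chunk_size=CHUNK_SIZE):
        """Illustrative only: hand out tests round-robin, one chunk at a time."""
        nodes = [[] for _ in range(n_nodes)]
        for i in range(0, len(tests), chunk_size):
            chunk = tests[i:i + chunk_size]
            nodes[(i // chunk_size) % n_nodes].extend(chunk)
        return nodes

    # 100 generated tests on 3 nodes, as in the example above
    nodes = distribute(list(range(100)), 3)
    print([len(n) for n in nodes])  # → [40, 30, 30]
    ```

    With fewer than 15 tests, a scheme like this puts everything on one node, which matches the behavior reported in the issue.
    
    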

  2. dsaputo reporter

    My tests ran no more than 18 at a time, so this makes sense now. I'd suggest adding the "chunks of 15" distribution strategy to the documentation.

    It would be great if the chunk size could be a parameter, however. My main use of py.test is for I/O-bound database testing. I'd love to have a chunk size of 1 in these cases, ensuring that each long-running query runs in parallel on a separate process, rather than 15 long-running queries piling up on a single process while the rest sit idle. For me, 15 queries can take 11 hours when run sequentially; in parallel, it might be less than 2.
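    Back-of-the-envelope arithmetic for the scenario above (the per-query duration is an assumption chosen to match the 11-hour figure: 15 queries at roughly 44 minutes each):

    ```python
    # Illustrative numbers only: 15 I/O-bound queries, 15 available workers.
    n_tasks, minutes_each = 15, 44

    # Chunk size 15: all tasks land on one worker and run back to back.
    sequential_minutes = n_tasks * minutes_each   # 660 minutes = 11 hours

    # Chunk size 1: one task per worker, all running at the same time.
    parallel_minutes = minutes_each               # well under 2 hours

    print(sequential_minutes / 60.0, parallel_minutes / 60.0)
    ```
    
    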

    I'm just getting into 1.0.2 after using 0.9.2 for over a year and these new developments are very exciting (I'm loving execnet). You've done a great thing here.

  3. dsaputo reporter

    Great!

    Food for thought:

    I'm wondering why chunking is needed at all. I've got various home-spun methods for dealing with my I/O-bound issues. At first, I broke the work apart into chunks and fed them into n Python threads (using the threading module). The problem is I never got the chunk sizes right and always ended up with idle threads after some time. I settled into the pattern of putting all the work into a queue (using the Queue module) and setting up n Python threads to consume it. The work starts immediately once the queue is built and the first thread is created (I don't wait for all threads to be established), and all threads stay busy until fewer than n tasks remain on the queue. This has proven to give me the highest throughput and is actually the easiest to code. Apple has taken the same approach with Grand Central Dispatch, although at a much higher level.

    http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/12
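    A minimal sketch of the queue-consumer pattern described above, using the stdlib queue and threading modules (the original comment refers to the Python 2 Queue module; this uses the Python 3 names, and the squaring task is a stand-in for real I/O-bound work):

    ```python
    import queue
    import threading

    def worker(q, results):
        """Pull items off the shared queue until it is empty."""
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return
            results.append(item * item)  # stand-in for a long-running query
            q.task_done()

    # Fill the queue with all the work up front.
    q = queue.Queue()
    for i in range(20):
        q.put(i)

    # n consumer threads; each starts working as soon as it comes up.
    results = []
    threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(sorted(results))
    ```

    Because every idle thread just grabs the next item, no thread sits idle while work remains, which is the load-balancing property the comment describes.
    
    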

    In any case, py.test (and soon execnet) are tools that I cherish and let me know if there is any way I can help.

  4. Holger Krekel repo owner

    hi again, py-1.2 and the new pytest-xdist plugin implement distribution in a new way, so MAXITEMSPERHOST is no longer needed and tests should distribute more evenly across nodes, even in cases with a small number of tests.

    cheers, holger
