Using 'coverage run' results in different behaviour of code

Issue #463 closed
xaedes created an issue

Executing the code with coverage run -m pytest will give a different result than just running it without coverage.

I have some pygame code that I want to test with py.test. I originally thought the bug to be caused by py.test, but it is actually caused by coverage.

minimal failing example

Running the test code with "./" will pass.
Running the test code with "coverage run ./" will NOT pass.

The code draws an antialiased line using pygame. This is tested against an reference image 'image.png'.

It should look like 'image.png'. But using "coverage .." will result in 'image_test.png'

System information:

python --version

Python 2.7.11

On my other machine with Python 2.7.6 the bug is not reproducable. On yet another machine with Python 2.7.10 the bug is also not reproducable.

coverage version on all three machines the same:


some more info about the system where the bug occurs:

lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu Xenial Xerus (development branch)
Release:        16.04
Codename:       xenial

uname -a

Linux xaedes-vbox 4.3.0-2-generic #11-Ubuntu SMP Fri Dec 4 20:37:48 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

py.test --version

This is pytest version 2.8.5, imported from /usr/local/lib/python2.7/dist-packages/pytest.pyc
setuptools registered plugins:
  pytest-cov-2.2.0 at /usr/local/lib/python2.7/dist-packages/pytest_cov/plugin.pyc
  pytest-mock-0.9.0 at /usr/local/lib/python2.7/dist-packages/pytest_mock.pyc

python: pygame.version.ver


Comments (17)

  1. Ned Batchelder repo owner

    I have not yet managed to install pygame yet (grrrr, pygame...). My best guess is that somehow different drawing libraries are being used, perhaps because the path is different? Can you add this to the end of your program, and show the results for the good and bad runs?

    import pprint, sys
  2. xaedes reporter

    I added a script that fails, using the coverage API. Then I ran both with python trace module:

    python -m trace -t ./ 1> trace_passed.txt
    python -m trace -t ./ 1> trace_not_passed.txt

    The output confuses me:

    $ cat trace_passed.txt | grep pygame | wc -l
    $ cat trace_not_passed.txt | grep pygame | wc -l
  3. Ned Batchelder repo owner

    I'm not sure where to go with this. Looking at the new modules imported, most are due to, but I can't explain why "email" is in the list of new modules.

    I would next try importing all the new modules from the module diff at the top of your program, and running it without coverage, and see if that is a good or bad run.

  4. xaedes reporter

    Ok, I added imports that where imported from the not passing test (that used coverage).

    It doesn't make it reproduce the error.

    Interestingly, the email related imports didn't work: When comparing the output of sys.modules to not_passed sys.modules I get:

    I stripped the module names and try/except each to import. I am not sure if this is the best way, maybe you can suggest a better one?

    Here some stats how good my approach to import them was:

    $ diff not_passed.txt additional_imports.txt  | wc -l
    $ diff passed.txt additional_imports.txt  | wc -l
    $ diff passed.txt not_passed.txt | wc -l
  5. Ned Batchelder repo owner

    I'm still not sure what is causing this, but a friend reproduced it. He also says that using the Python tracer rather than the C tracer avoided the problem. You can try for yourself by using --timid on the coverage run command line.

  6. xaedes reporter

    I have a lot of work to do the next two weeks (deadlines incoming); I would like to look into it after that.

  7. Overdrivr

    I also encountered this issue on a repository under continuous integration and also locally, it happened since a couple of days on this PR, only on windows

    With the --cov flag, one of my tests is failing, and a second test is purely and simply stuck, not returning at all.

    The first test fails because a dictionnary key is suddenly not found. The dictionnary only contains one key 'rx_bytes', while it should contain 7.

        def resetStats(self, averaging_window=100):
            # Construct a dictionnary for holding references to all counters
            self.measurements = {
                "rx_bytes"  : 0, # To store amount of received and sent characters
                "tx_bytes"  : 0,
                "rx_chunks" : 0, # To store amount of chunks of data
                "tx_chunks"  : 0,
                "rx_in_waiting" : 0, # To store current, avg and peak RX queue size
                "rx_in_waiting_avg" : 0,
                "rx_in_waiting_max" : 0
            self.averaging_window = averaging_window
        def stats(self):
            return self.measurements

    Using pytest-timeout, I was able to get the stack trace of why the second test is failing. A call to acquire() a lock never returns (just to clarify, in my tests there is no thread running, I am just manually calling a class method that is usually called from a thread during normal operation). There is an assertion right before the call to acquire to check that the lock is not locked, and the assertion passes just fine.

    Without touching the code, outcome is unpredictable. Sometimes tests will pass sometimes they won't. Removing the --cov flag resolves the issue.

    Let me know if you think of additionnal testing I could perform.

  8. Ned Batchelder repo owner

    @remibgs Please open a new issue for this case. It's extremely unlikely to be the same cause as the original one here, especially since it seems like we won't end up being able to debug the original problem.

  9. Ned Batchelder repo owner

    For one thing, yours is only happening on Windows. The original problem was on Linux. Mostly what they have in common is bizarre and hard to reproduce circumstances. The description is the same, "Results are different under coverage and not," but other than that they have nothing in common. The original problem is a computed result (an image) that comes out different, your problem seems to be deadlocks, or some other thread-related issues.

    The best thing to do here is to try to narrow down the code so we can isolate the problem. It will be difficult but not impossible for me to test on Windows.

  10. Overdrivr

    Ok good points. I will try to see what I can find, I'll open a ticket if I can reproduce the issue with a minimal code

  11. Log in to comment