Inconsistent results on multiple runs

Issue #480 invalid
Paul Michali
created an issue

I'm using a script to run coverage on the baseline and then on the changeset as part of the OpenStack tests for commits. The script cleans .pyc files and runs coverage, then checks out the newer code and repeats the process, so that we can compare the results. However, I see different coverage results on each run.

Even when there are no code changes, the coverage results differ between runs.

Here is the script, currently under OpenStack code review, that runs the coverage checks (hopefully you have access to this open-source code):

https://review.openstack.org/#/c/282538/5/networking_cisco/tests/ci/cover.bash

Even in the test run for this commit (which is only the coverage scripting), it shows the project coverage with minor differences between baseline and changeset (no code changes). Some files report more or less coverage.

Here is a link to a co-worker's results from running this script multiple times with different changes in the project repo:

http://pastebin.com/cBcn3bX8

Let me know if you need the script or other information. It's preventing us from using the coverage tool upstream, so please help!

Comments (8)

  1. Florian Bruhin

    I have a similar issue in my project, but there it's because the code paths actually change nondeterministically (there's a workaround for a bug that is only sometimes triggered, and when it is, the coverage changes).

  2. Paul Michali reporter

    In my case, it is the same git repo, with coverage run on two commits. What I see is that even with no source code changes, the coverage results differ, usually by just a few more or fewer missing lines.

  3. Paul Michali reporter

    I manually ran coverage on the same commit twice, and the results were the same. I then checked out the previous version in my repo, which differed only in bash scripts and not in Python source files (with no changes to how the coverage job was run), and coverage was 2 lines better (88% vs 87%). I removed .pyc files before each coverage run.

  4. Ned Batchelder repo owner

    @Paul Michali I'd like to reproduce your scenario, but I don't see (or maybe just don't understand) the details here that would let me do it. Can you give me very specific steps so that I can recreate your environment and run these tests myself? I'm not familiar with OpenStack or gerrit, so explain it like I'm five :)

  5. Paul Michali reporter

    Sure... hopefully you'll have access (it's open source) and can follow the steps to pull the repo along with the commit that adds the coverage test. The repo has a basic coverage test. In the commit, I added a script, modified from another project, that does two coverage runs: one on the current changeset and one on the previous (baseline) commit. In this case, the only difference is the script, so the source code should be the same and give the same result. This assumes a Linux system (I use Ubuntu 14.04 64-bit server, in a VM or on bare metal).

    The Gerrit review page for this change is - https://review.openstack.org/#/c/282538/5

    On that page, at the top right, there is a download link, where you can download the repo with the commit. With that you can run the (two pass) coverage test with "tox -e cover". There is a tox.ini file that has the coverage target commands.

    Let me know if that is sufficient info. If not, I can provide steps to pull the repo, along with the coverage commands used by the script, if that is easier (I'm assuming you are familiar with git, and hoping you are familiar with tox).

    Feel free to email me as well. Thanks!

  6. Ned Batchelder repo owner

    @Paul Michali thanks. BTW, you should understand how specialized your OpenStack world is. The tests won't run on a .tgz download (because pbr or something wants to use git to get the version). The git commands in the upper right don't work until you've cloned the repo.

    I ran the tests, and see the behavior you are seeing. I fiddled with things to be able to see exactly which lines are different between the CURRENT and BASELINE runs. The results were surprising: dozens and dozens of lines were different. But some of the different lines made no sense, like reporting a line in the middle of a docstring as executed. On a hunch, I looked at your pip freeze output, and noticed greenlet in the list. I re-ran the tests with concurrency = greenlet in the coveragerc file. Now the final coverage output was the same between the two runs. There were still three lines different, not sure why.
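The line-by-line comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method actually used in this thread: the function name diff_executed_lines and the sample data are hypothetical, and each run is assumed to have been reduced to a mapping from filename to the set of executed line numbers (for example, by reading each run's data file through coverage.py's CoverageData API).

```python
def diff_executed_lines(baseline, current):
    """Return {filename: (only_in_baseline, only_in_current)} for files that differ.

    Both arguments map a filename to a collection of executed line numbers.
    """
    differences = {}
    for filename in sorted(set(baseline) | set(current)):
        base_lines = set(baseline.get(filename, ()))
        cur_lines = set(current.get(filename, ()))
        if base_lines != cur_lines:
            differences[filename] = (
                sorted(base_lines - cur_lines),
                sorted(cur_lines - base_lines),
            )
    return differences


# Hypothetical data standing in for two coverage runs of the same code.
baseline = {"pkg/a.py": {1, 2, 3, 7}, "pkg/b.py": {1, 2}}
current = {"pkg/a.py": {1, 2, 3}, "pkg/b.py": {1, 2}}

for filename, (lost, gained) in diff_executed_lines(baseline, current).items():
    print(f"{filename}: only in baseline {lost}, only in current {gained}")
```

Listing the exact lines that differ, rather than comparing percentages, is what makes implausible results (such as a line inside a docstring reported as executed) stand out.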

    Try your code with concurrency = greenlet, and re-open this issue if it still seems to be a problem.
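For reference, the concurrency setting belongs in the [run] section of the coverage.py configuration file. A minimal sketch, assuming the project keeps its configuration in a .coveragerc file at the repo root and that any [run] options already present there are kept alongside it:

```ini
# .coveragerc (location assumed; merge into an existing [run] section if there is one)
[run]
concurrency = greenlet
```

The concurrency option tells coverage.py which concurrency library the measured code uses; the default is thread.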
