Race condition when saving data under multiple threads

Issue #581 resolved
Anonymous created an issue

I do not have a minimum repro for this issue yet, but we're seeing errors in our CI after upgrading coverage from 4.3.4 to 4.4b1 and 4.4.

We have some tests running using rostest. Upon upgrading from coverage 4.3.4 on our CI server, we started seeing the following errors:

Traceback (most recent call last):
  File "/tmp/tmpw8DgrF/src/app/test/integration/test_head_service.py", line 86, in <module>
    main()
  File "/tmp/tmpw8DgrF/src/app/test/integration/test_head_service.py", line 82, in main
    rostest.rosrun(PKG, NAME, TestHeadService)
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 148, in rosrun
    _stop_coverage([package])
  File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 227, in _stop_coverage
    _cov.stop()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 899, in analysis
    f, s, _, m, mf = self.analysis2(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 920, in analysis2
    analysis = self._analyze(morf)
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 935, in _analyze
    self.get_data()
  File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
    if self.collector.save_data(self.data):
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
    covdata.add_lines(abs_file_dict(self.data))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
    return dict((abs_file(k), v) for k, v in iitems(d))
  File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
    return dict((abs_file(k), v) for k, v in iitems(d))

It's possible this is a rostest issue too - I haven't dug down very deeply, but we're certain that this issue does not occur when using coverage 4.3.4 - only 4.4b1 and 4.4.

The source code for the caller is found here. It looks like stop is being called from line 227 and then blowing up internally.

Will post more info once I have it.

Contact info: peter@mayfieldrobotics.com

Comments (36)

  1. Peter Baughman

    Original poster here. It looks like I truncated the stack trace and didn't report the actual error. The final three lines should be:

      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    
  2. Peter Baughman

    That's not possible - currently this happens in our integration tests which requires a ton of other infrastructure to run. I'm working on a stand-alone test-case that I can share.

  3. Ned Batchelder repo owner

    OK, then we'll have to share clues here. This looks to me like you are measuring coverage inside of coverage.py itself. That shouldn't happen. Try changing line 198 to:

    _cov = coverage.coverage(debug=["config", "sys"])
    

    then capture what appears on stderr, and share it here.

  4. Peter Baughman

    Here you go. I removed some internal product names from the paths, but this is otherwise un-messed-with

    -- config ----------------------------------------------------
     attempted_config_files: .coveragerc
                             setup.cfg
                             tox.ini
                     branch: False
                concurrency: None
               config_files: -none-
                cover_pylib: False
                  data_file: .coverage
                      debug: config
                             sys
           disable_warnings: -none-
               exclude_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(cover|COVER)
                  extra_css: None
                 fail_under: 0
                   html_dir: htmlcov
                 html_title: Coverage report
              ignore_errors: False
                    include: None
                       note: None
                       omit: None
                   parallel: False
        partial_always_list: while (True|1|False|0):
                             if (True|1|False|0):
               partial_list: #\s*(pragma|PRAGMA)[:\s]?\s*(no|NO)\s*(branch|BRANCH)
                      paths: {}
             plugin_options: {}
                    plugins: -none-
                  precision: 0
               show_missing: False
               skip_covered: False
                     source: None
                      timid: False
                 xml_output: coverage.xml
          xml_package_depth: 99
    -- sys -------------------------------------------------------
                  version: 4.4
                 coverage: /usr/local/lib/python2.7/dist-packages/coverage/__init__.pyc
              cover_paths: /usr/local/lib/python2.7/dist-packages/coverage
              pylib_paths: /usr/lib/python2.7
                   tracer: CTracer
     plugins.file_tracers: -none-
             config_files: .coveragerc
                           setup.cfg
                           tox.ini
             configs_read: -none-
                data_path: /root/-----/app/test/integration/.coverage
                   python: 2.7.6 (default, Oct 26 2016, 20:30:19) [GCC 4.8.4]
                 platform: Linux-4.4.0-72-generic-x86_64-with-Ubuntu-14.04-trusty
           implementation: CPython
               executable: /usr/bin/python
                      cwd: /root/-----/app/test/integration
                     path: /tmp/tmpgSRcfQ/src/app/src
                           /root/-----/app/test/integration
                           /tmp/tmpgSRcfQ/devel_isolated/lib/python2.7/dist-packages
                           /opt/-----/lib/python2.7/dist-packages
                           /opt/ros/indigo/lib/python2.7/dist-packages
                           /usr/lib/python2.7
                           /usr/lib/python2.7/plat-x86_64-linux-gnu
                           /usr/lib/python2.7/lib-tk
                           /usr/lib/python2.7/lib-old
                           /usr/lib/python2.7/lib-dynload
                           /usr/local/lib/python2.7/dist-packages
                           /usr/local/lib/python2.7/dist-packages/pip-9.0.1-py2.7.egg
                           /usr/lib/python2.7/dist-packages
                           /usr/lib/python2.7/dist-packages/PILcompat
                           /usr/lib/python2.7/dist-packages/gtk-2.0
                           /usr/lib/pymodules/python2.7
              environment: PYTHONPATH = /tmp/tmpgSRcfQ/devel_isolated/lib/python2.7/dist-packages:/opt/-----/lib/python2.7/dist-packages:/opt/ros/indigo/lib/python2.7/dist-packages
             command_line: /tmp/tmpgSRcfQ/src/app/test/integration/test_head_service.py --cov --gtest_output=xml:/tmp/tmpgSRcfQ/build_isolated/-----/test_results/-----/rosunit-test_head_service.xml __name:=test_he
    ad_service __log:=/root/.ros/log/cc97f6b0-3738-11e7-a7d6-0242ac110002/test_head_service-14.log
             source_match: -none-
        source_pkgs_match: -none-
            include_match: -none-
               omit_match: -none-
              cover_match: /usr/local/lib/python2.7/dist-packages/coverage
              pylib_match: /usr/lib/python2.7
    -- end -------------------------------------------------------
    
  5. Ned Batchelder repo owner

    Hmm, thanks. Now add "trace" to that list of options:

    _cov = coverage.coverage(debug=["config", "sys", "trace"])
    

    which will produce more output, one line per file executed.

  6. Peter Baughman

    I removed some lines that say Tracing '/tmp/our-internal-product-stuff/foo'

    Coverage is down at the bottom - it looks like it's being skipped. I do see multiprocessing in the list, though. Pure speculation, but that's one of the things I'm investigating to try to repro this in a smaller way.

    EDIT I didn't include this at first - but the thing blows up right after the line

    Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/collector.py': is part of coverage.py at the end
    

    I've edited the output below to include the stack trace when it blows up

    Not tracing '/usr/lib/python2.7/threading.py': is in the stdlib
    Tracing '/tmp/tmpKoLMLj/src/app/src/-----/__init__.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/registration.py'
    Not tracing '/usr/lib/python2.7/unittest/loader.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/functools.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/unittest/case.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/unittest/suite.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/masterslave.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosunit/core.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/names.py'
    Not tracing '/usr/lib/python2.7/posixpath.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/genericpath.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/topics.py'
    Not tracing '/usr/lib/python2.7/stat.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosunit/xmlrunner.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_pubsub.py'
    Not tracing '/usr/lib/python2.7/logging/__init__.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/core.py'
    Not tracing '/usr/lib/python2.7/urlparse.py': is in the stdlib
    Not tracing '<string>': not a real file name
    Not tracing '/usr/lib/python2.7/xmlrpclib.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/urllib.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/string.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/httplib.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/socket.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/unittest/result.py': is in the stdlib
    Tracing '/tmp/tmpKoLMLj/src/app/test/integration/test_joint_states_service.py'
    Tracing '/root/-----/app/test/integration/test_base.py'
    Not tracing '/usr/lib/python2.7/Queue.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/names.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/statistics.py'
    Not tracing '/usr/lib/python2.7/mimetools.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/rfc822.py': is in the stdlib
    Not tracing '/build/python2.7-t6d6Zr/python2.7-2.7.6/Modules/pyexpat.c': pyexpat lies about itself
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/transport.py'
    Not tracing '/usr/lib/python2.7/multiprocessing/process.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/client.py'
    Not tracing '/usr/lib/python2.7/logging/handlers.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/msproxy.py'
    Not tracing '/usr/lib/python2.7/UserDict.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/network.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/rosenv.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/buffer.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/buffer_interface.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/masterapi.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/msg.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph_msgs/msg/_Clock.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/genpy/rostime.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/simtime.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/service.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/rostime.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/transform_listener.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/action_client.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryAction.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionGoal.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_Header.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalID.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryGoal.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/trajectory_msgs/msg/_JointTrajectory.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionResult.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalStatus.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryResult.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryActionFeedback.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/control_msgs/msg/_FollowJointTrajectoryFeedback.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/trajectory_msgs/msg/_JointTrajectoryPoint.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/tf2_msgs/msg/_TFMessage.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_TransformStamped.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Transform.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Vector3.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Quaternion.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_JointState.py'
    Not tracing '/usr/lib/python2.7/abc.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/_weakrefset.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/_abcoll.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/SocketServer.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/BaseHTTPServer.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Pose.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Point.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Pose2D.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/cv_bridge/core.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_CameraInfo.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/sensor_msgs/msg/_RegionOfInterest.py'
    Tracing '/tmp/tmpKoLMLj/src/app/src/app_bringup/params.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/__init__.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/loader.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/reader.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/scanner.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/error.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/tokens.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/parser.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/composer.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/constructor.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/resolver.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/events.py'
    Tracing '/usr/local/lib/python2.7/dist-packages/yaml/nodes.py'
    Not tracing '/usr/lib/python2.7/SimpleXMLRPCServer.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/validators.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/roslogging.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/impl/rosout.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph_msgs/msg/_Log.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/genpy/message.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslib/packages.py'
    Not tracing '/usr/lib/python2.7/dist-packages/rospkg/rospack.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/dist-packages/rospkg/environment.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/os.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/xml/etree/ElementTree.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/xml/etree/ElementPath.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/catkin/find_in_workspaces.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/catkin/workspace.py'
    Not tracing '/usr/lib/python2.7/dist-packages/catkin_pkg/packages.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/dist-packages/catkin_pkg/package.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/xml/dom/minidom.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/xml/dom/expatbuilder.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/StringIO.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/copy.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/copy_reg.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/re.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/rlutil.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/parent.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/config.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/core.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/xmlloader.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/loader.py'
    Not tracing '/usr/lib/python2.7/shlex.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/subprocess.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/platform.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/pmon.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/server.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/xmlrpc.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/launch.py'
    Not tracing '/usr/lib/python2.7/gzip.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/nodeprocess.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/node_args.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/substitution_args.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/roslib/__init__.py'
    Not tracing '/usr/lib/python2.7/atexit.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_String.py'
    Not tracing '/usr/lib/python2.7/shutil.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/std_msgs/msg/_Empty.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Twist.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/simple_action_server.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/server_goal_handle.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/action_server.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib/goal_id_generator.py'
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/actionlib_msgs/msg/_GoalStatusArray.py'
    Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/control.py': is part of coverage.py
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rospy/timer.py'
    Not tracing '/usr/lib/python2.7/traceback.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/linecache.py': is in the stdlib
    Not tracing '/usr/lib/python2.7/unittest/util.py': is in the stdlib
    Tracing '/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py'
    Not tracing '/usr/local/lib/python2.7/dist-packages/coverage/collector.py': is part of coverage.py
    Traceback (most recent call last):
      File "/tmp/tmpE4nq9m/src/app/test/integration/test_bagfile_recorder.py", line 83, in <module>
        main()
      File "/tmp/tmpE4nq9m/src/app/test/integration/test_bagfile_recorder.py", line 79, in main
        rostest.rosrun(PKG, NAME, TestBagfileRecorder)
      File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 148, in rosrun
        _stop_coverage([package])
      File "/opt/ros/indigo/lib/python2.7/dist-packages/rostest/__init__.py", line 228, in _stop_coverage
        _cov.stop()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 919, in analysis
        f, s, _, m, mf = self.analysis2(morf)
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 940, in analysis2
        analysis = self._analyze(morf)
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 955, in _analyze
        self.get_data()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
        if self.collector.save_data(self.data):
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
        covdata.add_lines(abs_file_dict(self.data))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
        return dict((abs_file(k), v) for k, v in iitems(d))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    
  7. Peter Baughman

    I'm taking my repro-case and cutting it down binary search style. I'm noticing that as I remove more stuff, the likelyhood that coverage blows up is going down. With everything in there, I can get it to blow up every time. With about half of the stuff removed, it's now to the point where it'll blow up about 1 in 3 times.

  8. Ned Batchelder repo owner

    Now I'm thinking of something drastic, like copying the dict just before that line, then catching RuntimeError and checking the difference between the original dict and the current dict.

  9. Peter Baughman

    Ok, I modified save_data to be the following:

        def save_data(self, covdata):
            """Save the collected data to a `CoverageData`.
    
            Returns True if there was data to save, False if not.
            """
            if not self._activity():
                return False
    
            def abs_file_dict(d):
                """Return a dict like d, but with keys modified by `abs_file`."""
                copy_dict = d.copy()
                try:
                    return dict((abs_file(k), v) for k, v in iitems(d))
                except RuntimeError:
                    new_dict = d.copy()
                    added = set(new_dict.keys()) - set(copy_dict.keys())
                    removed = set(copy_dict.keys()) - set(new_dict.keys())
                    print >> sys.stderr, "**** HERE COMES THE INFO  ****"
                    print >> sys.stderr, "Things that were added: {}".format(added)
                    print >> sys.stderr, "Things that were removed: {}".format(removed)
                    print >> sys.stderr, "**** END OF THE INFO ****"
                    raise
    

    And here's what I see (with a little bit of manual formatting added):

    Things that were added: set([
        u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Vector3.py',
        u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Transform.py',
        u'/opt/ros/indigo/lib/python2.7/dist-packages/tf2_ros/transform_listener.py',
        u'/opt/ros/indigo/lib/python2.7/dist-packages/tf2_msgs/msg/_TFMessage.py',
        u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_TransformStamped.py',
        u'/opt/ros/indigo/lib/python2.7/dist-packages/geometry_msgs/msg/_Quaternion.py'])
    
    Things that were removed: set([])
    

    These all seem like legit files that might be getting coverage generated for them. There's also approx 22 python threads hanging around, most of them waiting to publish or receive something on a socket.

  10. Peter Baughman

    I have a sort of stand-alone repro case that I need to clean up a little bit, but here's the gist:

    1. Start coverage
    2. Spin up a bunch of python threads (20, in my case). Each thread imports a module, and then imports a class from that module
    3. stop coverage
    4. save coverage
    5. Observe the "dictionary changed size during iteration" exception that comes out of collector.py lin 377.

    It's not 100% standalone yet, because I'm still using ROS to get a list of modules and classes to import. I have a really long list of ROS objects of the form: ["module name", "class name"]

    for example:

    importlist = [
    ["std_msgs.msg", "Bool",],
    ["std_msgs.msg", "Byte",],
    ["std_msgs.msg", "ByteMultiArray",],
    ["std_msgs.msg", "Char",],
    ["std_msgs.msg", "ColorRGBA",],
    ["std_msgs.msg", "Duration",],
    ["std_msgs.msg", "Empty",],
    ["std_msgs.msg", "Float32",],
    ["std_msgs.msg", "Float32MultiArray",],
    ["std_msgs.msg", "Float64",],
    ["std_msgs.msg", "Float64MultiArray",],
    ]
    

    Then my threads all run the following method:

    def random_load():
        global _run
        global importlist
        while _run:
            candidate = random.choice(importlist)
            mod = importlib.import_module(candidate[0])
            cls = getattr(mod, candidate[1])
    

    I don't think this is ROS specific - but using the ROS messages was a convenient way to quickly generate a very large list of modules and class names to import.

    I think I can give you instructions to fully repro the issue using a docker container that has the ROS stuff in it. Again, I don't believe that ROS is necessary - it's just a convenient way for me to generate a large list of things to import.

    Here is the program I'm using right now to reproduce the problem without the importlist:

    _cov = None
    _run = True
    
    def _start_coverage():
        global _cov
        import coverage
        _cov = coverage.coverage()
        _cov.start()
    
    def _stop_coverage():
        _cov.stop()
        print "YOU WILL NEVER GET HERE"
        # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
        # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
        # program doesn't blow up without _cov.save()
        _cov.save()
    
    def random_load():
        global _run
        global importlist
        while _run:
            candidate = random.choice(importlist)
            mod = importlib.import_module(candidate[0])
            # cls = getattr(mod, candidate[1]) # Not needed for repro
    
    _start_coverage()
    for n in range(20):
        tr = threading.Thread(target=random_load)
        tr.start()
    time.sleep(0.1)
    _stop_coverage()
    _run = False
    

    Here's the stack trace that I see:

    peter@docker-dev:~/cov_repro$ python repro.py 
    YOU WILL NEVER GET HERE
    Traceback (most recent call last):
      File "repro.py", line 440, in <module>
        _stop_coverage()
      File "repro.py", line 422, in _stop_coverage
        _cov.stop()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 772, in save
        self.get_data()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
        if self.collector.save_data(self.data):
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
        covdata.add_lines(abs_file_dict(self.data))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
        return dict((abs_file(k), v) for k, v in iitems(d))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    
  11. Peter Baughman

    repro.py is attached to this issue. Steps to reproduce using a docker container are below. The ros-indigo-ros-base docker image is used because it contains already all of the classes that are imported to repro the issue. I believe it is possible to reproduce this issue without ROS, too, but do not have a good way of generating a big list of imports without ROS being installed:

    Make the directory ~/cov_repro and put the file repro.py in it
    > docker run -it --rm -v ~/cov_repro:/root/cov_repro ros:indigo-ros-base /bin/bash
    ----- You are now inside a docker container -----
    > easy_install pip
    > pip install coverage==4.4
    > cd /root/cov_repro
    > python repro.py
    

    Observe the following output:

    YOU WILL NEVER GET HERE
    Traceback (most recent call last):
      File "repro.py", line 201, in <module>
        _stop_coverage()
      File "repro.py", line 182, in _stop_coverage
        _cov.stop()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 772, in save
        self.get_data()
      File "/usr/local/lib/python2.7/dist-packages/coverage/control.py", line 824, in get_data
        if self.collector.save_data(self.data):
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 382, in save_data
        covdata.add_lines(abs_file_dict(self.data))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in abs_file_dict
        return dict((abs_file(k), v) for k, v in iitems(d))
      File "/usr/local/lib/python2.7/dist-packages/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    

    Notes:

    1. You will probably have to terminate the program with ctrl+\ because the threads won't be shut down.
    2. Commenting out the line _cov.save() will prevent the problem from happening, despite the stack trace saying that the issue comes from inside _cov.stop
  12. Ned Batchelder repo owner

    @Peter Baughman Thanks so much for the test case, I am able to reproduce it. And it's nothing to do with your code. I replaced your imports with some synthetic ones, and could still cause the failure. This program does it without needing Docker, etc:

    import importlib
    import random
    import threading
    import time
    
    _cov = None
    _run = True
    
    for i in xrange(1000):
        with open("m{:04d}.py".format(i), "w") as f:
            f.write("a = 1\n")
    
    def _start_coverage():
        global _cov
        import coverage
        _cov = coverage.coverage()
        _cov.start()
    
    def _stop_coverage():
        _cov.stop()
        print "YOU WILL NEVER GET HERE!!"
        # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
        # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
        # program doesn't blow up without _cov.save()
        _cov.save()
    
    def random_load():
        global _run
        while _run:
            candidate = "m{:04d}".format(random.randint(0, 999))
            mod = importlib.import_module(candidate)
    
    _start_coverage()
    for n in range(20):
        threading.Thread(target=random_load).start()
    time.sleep(0.1)
    _stop_coverage()
    _run = False
    

    Now to work on fixing the problem.... that might be harder...

  13. Peter Baughman

    A somewhat cheesy solution that does not address a possible underlying logic issue might be to make a copy of the data in collector.py save_data like:

            dat = self.data.copy()
            if self.branch:
                covdata.add_arcs(abs_file_dict(dat))
            else:
                covdata.add_lines(abs_file_dict(dat))
            covdata.add_file_tracers(abs_file_dict(dat))
    

    I didn't deep-dive, but it seems strange that things are still being added to the dictionary after stop is called

  14. Ned Batchelder repo owner

    @Peter Baughman I did some work to investigate fixes, but got sidetracked. Copying the data dictionary makes the problem less likely, but does not fix it robustly. I'm sorry to say this, but for the time being, you should run with your fix from a fork of coverage.py.

  15. Olivier Grisel

    I can reproduce a similar issue on the CI infrastructure of scikit-learn (but did no observe this on my local machine):

    https://s3.amazonaws.com/archive.travis-ci.org/jobs/260611942/log.txt?X-Amz-Expires=30&X-Amz-Date=20170803T154403Z&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAJRYRXRSVGNKPKO5A/20170803/us-east-1/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=094dbeee4fd45df1666ecf7df10abf707743864ba1b69821b42cd93b69e94e29

    Traceback (most recent call last):
      File "/home/travis/miniconda/envs/testenv/bin/nosetests", line 6, in <module>
        sys.exit(nose.run_exit())
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 121, in __init__
        **extra_args)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/unittest/main.py", line 95, in __init__
        self.runTests()
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 207, in runTests
        result = self.testRunner.run(self.test)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/core.py", line 66, in run
        result.printErrors()
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/result.py", line 110, in printErrors
        self.config.plugins.report(self.stream)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/manager.py", line 99, in __call__
        return self.call(*arg, **kw)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/manager.py", line 167, in simple
        result = meth(*arg, **kw)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/plugins/cover.py", line 183, in report
        self.coverInstance.stop()
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 1086, in html_report
        return reporter.report(morfs)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/html.py", line 139, in report
        self.report_files(self.html_file, morfs, self.config.html_dir)
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/report.py", line 91, in report_files
        report_fn(fr, self.coverage._analyze(fr))
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 955, in _analyze
        self.get_data()
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/control.py", line 824, in get_data
        if self.collector.save_data(self.data):
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 382, in save_data
        covdata.add_lines(abs_file_dict(self.data))
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 377, in abs_file_dict
        return dict((abs_file(k), v) for k, v in iitems(d))
      File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    
  16. Olivier Grisel

    I updated the minimal repro script to work in Python 3 with the print function and could therefore enable flush=True and I can see the print function output on stdout:

    from __future__ import print_function
    import importlib
    import random
    import threading
    import time
    
    _cov = None
    _run = True
    
    for i in range(1000):
        with open("m{:04d}.py".format(i), "w") as f:
            f.write("a = 1\n")
    
    def _start_coverage():
        global _cov
        import coverage
        _cov = coverage.coverage()
        _cov.start()
    
    def _stop_coverage():
        _cov.stop()
        print("YOU WILL NEVER GET HERE!!", flush=True)
        # Mega strange.  When the program blows up, it reports that _cov.stop() blows up.
        # The thing is, I see "YOU WILL NEVER GET HERE" printed to the console.  Also the
        # program doesn't blow up without _cov.save()
        _cov.save()
    
    def random_load():
        global _run
        while _run:
            candidate = "m{:04d}".format(random.randint(0, 999))
            mod = importlib.import_module(candidate)
    
    _start_coverage()
    for n in range(20):
        threading.Thread(target=random_load).start()
    time.sleep(0.1)
    _stop_coverage()
    _run = False
    

    I get the following output:

    YOU WILL NEVER GET HERE!!
    Traceback (most recent call last):
      File "coverage_bug.py", line 38, in <module>
        _stop_coverage()
      File "coverage_bug.py", line 21, in _stop_coverage
        _cov.stop()
      File "/volatile/ogrisel/code/coveragepy/coverage/control.py", line 772, in save
        self.get_data()
      File "/volatile/ogrisel/code/coveragepy/coverage/control.py", line 824, in get_data
        if self.collector.save_data(self.data):
      File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 382, in save_data
        covdata.add_lines(abs_file_dict(self.data))
      File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 377, in abs_file_dict
        return dict((abs_file(k), v) for k, v in iitems(d))
      File "/volatile/ogrisel/code/coveragepy/coverage/collector.py", line 377, in <genexpr>
        return dict((abs_file(k), v) for k, v in iitems(d))
    RuntimeError: dictionary changed size during iteration
    

    and then the program is waiting indefinitely (since the _run flag is never set to False).

  17. Olivier Grisel

    The following patch fixes the bug for me:

    diff -r 12d026f6db21 coverage/collector.py
    --- a/coverage/collector.py Sat Jul 08 11:03:31 2017 -0400
    +++ b/coverage/collector.py Tue Aug 08 16:25:10 2017 +0200
    @@ -374,7 +374,7 @@
    
             def abs_file_dict(d):
                 """Return a dict like d, but with keys modified by `abs_file`."""
    -            return dict((abs_file(k), v) for k, v in iitems(d))
    +            return dict((abs_file(k), v) for k, v in list(iitems(d)))
    
             if self.branch:
                 covdata.add_arcs(abs_file_dict(self.data))
    

    This makes the iteration thread-safe. But maybe writing into covdata is not thread safe?

  18. Olivier Grisel

    Apparently there is only one thread (the main thread) calling into save_data while the other thread is concurrently mutating the self.data dict. I therefore thing the proposed fixed is correct.

  19. Ned Batchelder repo owner

    I will have to get back to this soon. I am not just adding a list() call for a few reasons: first, it doesn't solve the race condition, it just narrows the window from the time to execute the entire line to the time to execute the list(). Adding a lock around this line might be a solution, but I think there's also technically a race condition in the trace function where we add keys to the data dictionary in the first place. That is an even narrower window for problems, but at least at an academic level, is still a problem. Locking on every line recorded doesn't seem feasible.

    At the very least, I'd like to understand the scope of the problem before adding this simple pragmatic 99% solution.

  20. Olivier Grisel

    Alright. Just a remark: list is a builtin type in Python so I believe that the GIL will protect against the dictionary iteration concurrency issue although I am not 100% sure.

  21. Olivier Grisel

    <del>I think the fundamental problem comes from the fact that threading.settrace can only be used to enable tracing on future threads and cannot be used to enable or disable tracing on running threads.</del> Actually I am reading the source code of the PyTracer class and the stopped flag should be able to deal with this constraint.

  22. Ned Batchelder repo owner

    The thread-safety of list() is something to consider. I don't understand your point about past vs future threads? I thought the problem here was one thread recording data into the shared dictionary while another thread was trying to save the data?

  23. Olivier Grisel

    The problem is that the main thread is trying to call save_data after stopping the coverage collection but the thread still running in the background is appending data into the dict concurrently.

    However I just noticed something inconsistent in the traceback I reported in: https://bitbucket.org/ned/coveragepy/issues/581/44b1-44-breaking-in-ci#comment-38963003

    The traceback is about an exception in the main thread in the _cov.stop() call (in the _stop_coverage() function) while the print statement has already printed the "YOU WILL NEVER GET HERE!!" message which should only happen after the call to _cov.stop() based on the source code of the script...

  24. Log in to comment