`lstat()` dominates in the case of small coverage samples

Issue #625 resolved
Buck Evan created an issue

The hypothesis library recently added coverage-led fuzzing, which needs to run a very short test many times, examining the coverage between trials. This (currently) involves many calls to `coverage.Collector.save_data`, which in turn makes many calls to `realpath` (and thus `lstat()`). In the extreme case, `lstat()` ends up taking about 40% of the run time.

Can you please help me design a remedy? Some alternatives that I can think of:

  1. add a cache to files.abs_file
  2. replace the call to abs_file with a call to files.canonical_filename, since canonical_filename already has a cache
  3. delegate the filename-normalization responsibility from Collector to CoverageData, so that we can specialize CoverageData and fix this within our dependent library.

Comments (5)

  1. Ned Batchelder repo owner

    Hmm, actually, canonical_filename searches for relative filenames on sys.path... I wish I understood better why it needs to do that.
