Coverage Analysis Flow
In the hardware design work I do day to day, it's common to analyse coverage data to evaluate the performance of the tests being written. For some reason these ideas don't seem to have transitioned to software development, so this is an attempt to describe what is done, and how it is implemented.
Terminology & Concepts
Hardware design has a few terms which may be confusing and / or daunting to someone not familiar with them. I'll try to explain them here.
- Hardware Design Language (HDL)
- Language used to write hardware designs, and their test environments. Examples include VHDL (based on Ada), Verilog (looks a bit like C) and SystemVerilog (looks a bit like C++).
- Device Under Test (DUT)
- The code you're testing.
- Simulator
- A program which runs the DUT, the test bench, and the tests (all written in various HDLs). You could think of it as an interpreter like CPython, just for hardware design languages. Most simulators require compilation and elaboration (linking) of code prior to running the simulation.
- Test Bench
- A test harness. Code which provides support functions for testing the DUT. You're not interested in coverage of this code.
Also, these terms are useful when talking about generalised coverage:
- Coverage Item
- A single thing upon which coverage is collected. Each item can only be HIT or NOT-HIT. It could be a block or line of code, an element of a conditional expression being true or false, or a value passed to a function or stored in a variable falling within a range.
- Coverage Model
- The set of Coverage Items defined at compilation. There is normally some configuration which states which coverage items should be included in the model. This allows test code to be excluded.
- Coverage Database
- A file which contains the coverage results of one or more tests. The database schema is the coverage model, but may only hold values for those items which are HIT. All others would be assumed to be NOT-HIT.
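The three concepts above map naturally onto sets. Here's a minimal sketch, assuming each coverage item can be identified by a unique string (the `"file:line"` naming is illustrative, not from any real coverage tool):

```python
# The coverage model: every item defined at compilation.
coverage_model = {"alu.v:10", "alu.v:11", "fifo.v:7", "fifo.v:8"}

# A coverage database stores only the HIT items for one test;
# everything else in the model is assumed NOT-HIT.
test_a_hits = {"alu.v:10", "fifo.v:7"}

# Items NOT-HIT by the test are recovered by set difference
# against the model.
not_hit = coverage_model - test_a_hits
print(sorted(not_hit))  # ['alu.v:11', 'fifo.v:8']
```

Storing only HIT items keeps the databases small, since the full item list is already defined by the shared coverage model.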
Obviously, prior to any analysis, the tests have to be run and coverage data collected.
- The DUT, test bench and tests are compiled and elaborated. This is done only once. This ensures that all the coverage data generated conforms to a single coverage model.
- A test runner runs each test, invoking an instance of the simulator each time, and passing parameters which say where the coverage data should be stored. Often jobs are farmed out to a network of machines, so tests may run in parallel. Hence each test writes to a unique coverage database.
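The runner step could be sketched like this. The simulator binary (`simv`) and its flags are hypothetical, since every vendor's CLI differs; the point is that each test derives its own database path, so parallel runs never collide:

```python
from concurrent.futures import ThreadPoolExecutor

def run_test(test_name):
    # One database per test: parallel jobs never write to the same file.
    db_path = f"coverage/{test_name}.covdb"
    cmd = ["simv", f"+TEST={test_name}", f"-coverage_db={db_path}"]
    # subprocess.run(cmd, check=True)  # would invoke the (hypothetical) simulator
    return db_path

tests = ["smoke_test", "fifo_full_test", "alu_overflow_test"]
with ThreadPoolExecutor() as pool:
    databases = list(pool.map(run_test, tests))
print(databases)
```

In practice the `ThreadPoolExecutor` would be replaced by whatever job-distribution system farms work out to the machine network, but the one-database-per-test rule is the same.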
Now that we have one coverage database per test, we can combine them in different ways to tell us useful things. All are based on set operations:
- Test Coverage
- The count of HIT items in a single test.
- Feature Coverage
- Given a set of coverage items which describe a feature, the count of a test's HIT items which fall within that set.
- Unique Coverage
- The count of HIT items in a single test, after the union of HIT items in all other tests has been subtracted.
- Total Coverage
- The count of items in the union of HIT items in each test.
- Most valuable test
- Test with highest count of HIT items.
- 2nd most valuable test
- Test with highest count of HIT items, after the set of items HIT by the most valuable test has been subtracted.
- Nth most valuable test
- Test with highest count of HIT items, after the union of items HIT by the 1 -> N-1 most valuable tests has been subtracted.
- Redundant Tests
- Tests which have zero unique coverage.
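The views above are a few lines of set arithmetic each. A sketch, assuming each test's coverage database has already been loaded into a Python set of HIT item names (the item numbers here are illustrative):

```python
hits = {
    "test_a": {1, 2, 3, 4},
    "test_b": {3, 4, 5},
    "test_c": {3, 4},        # hits nothing the others miss
}

# Test Coverage: count of HIT items in a single test.
test_coverage = {t: len(h) for t, h in hits.items()}

# Total Coverage: union of HIT items across all tests.
total = set().union(*hits.values())

# Feature Coverage: intersection with a feature's item set.
def feature_coverage(test, feature_items):
    return len(hits[test] & feature_items)

# Unique Coverage: a test's hits minus the union of all other tests' hits.
def unique_coverage(test):
    others = set().union(*(h for t, h in hits.items() if t != test))
    return hits[test] - others

# Redundant tests: zero unique coverage.
redundant = [t for t in hits if not unique_coverage(t)]

# Most valuable / Nth most valuable: greedily pick the test which
# adds the most items not already covered by earlier picks.
def rank_tests():
    covered, remaining, order = set(), dict(hits), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

print(rank_tests())  # ['test_a', 'test_b', 'test_c']
```

With these toy sets, `test_c` is redundant (everything it hits is covered by `test_a`), and the greedy ranking picks `test_a` first because it adds the most new items at each step.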
Now that we have these other views on the coverage data, we can do things like:
- Build a list of tests which test an area of the code we're working on. We can then run that list as a quick regression check as we work.
- See which items a test uniquely provides coverage on. Useful for seeing if a test is effective against its goals, or for understanding a legacy or poorly documented test.
- Build a list of the top ten most valuable tests. This can then be used as a quick pre-commit check. Combining coverage data with a cost function like CPU run-time can allow more complex selection criteria. For example, a test list which gets the highest coverage possible in thirty seconds.
- Look at tests with zero unique coverage, and possibly remove them. Be warned though, it's possible a test can test things our coverage model doesn't measure. Blindly deleting tests should be discouraged.
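The time-budgeted selection mentioned above can be sketched as a cost-aware variant of the greedy ranking. The hit sets and run times here are made up; a simple heuristic is to pick, among tests that still fit the budget, the one with the best new-items-per-second ratio:

```python
hits = {
    "smoke":  {1, 2, 3},
    "full":   {1, 2, 3, 4, 5, 6},
    "corner": {6, 7},
}
runtime = {"smoke": 5, "full": 40, "corner": 10}  # seconds (illustrative)

def select(budget):
    covered, chosen = set(), []
    candidates = set(hits)
    while True:
        # Only consider tests that fit the budget and add new items.
        affordable = [t for t in candidates
                      if runtime[t] <= budget and hits[t] - covered]
        if not affordable:
            return chosen
        best = max(affordable,
                   key=lambda t: len(hits[t] - covered) / runtime[t])
        chosen.append(best)
        covered |= hits[best]
        budget -= runtime[best]
        candidates.remove(best)

print(select(30))  # ['smoke', 'corner'] - 'full' doesn't fit the budget
```

Greedy selection won't always find the optimal list (this is a set-cover-style problem), but it's cheap to compute and usually good enough for a pre-commit check.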