This repository contains data files, data-collection scripts, and data-analysis scripts of the "Evaluating and Improving Fault Localization Techniques" project. Before exploring this repository, please read the technical report that describes the results.
The experiments evaluate various fault localization techniques on artificial faults and on real faults.
At a high level, here's how it all works:
- The real and artificial faults come from the Defects4J Project.
- For each D4J fault, the scripts in `d4j_integration/` determine which lines are faulty. The resultant files are "buggy-lines" files.
- Many fault localization techniques require coverage information. We use GZoltar to gather coverage information. The resultant files are called "matrix" and "spectra".
- Mutation-based fault localization (MBFL) techniques require mutation analysis. Our Killmap project (which lives in `killmap/`) does mutation analysis on all faults. The resultant files are called "killmaps," and specify how each test behaves on each mutant. (Each killmap also has an associated "mutants-log" file, which describes all the mutants that were analyzed.)
- Our scripts enable you to compute all the mutation and coverage information, but doing so takes a great deal of computation. The resulting mutation/coverage information is available at http://fault-localization.cs.washington.edu.
- The "scoring pipeline" (which lives in
analysis/pipeline-scripts/) determines how well each FL technique does on each fault -- that is, where the real buggy lines appear in the FL technique's ranking of the line of the program. The results appear in
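To make "scoring" concrete, here is a minimal toy sketch — not the repository's actual code, and the coverage data is hypothetical — of a spectrum-based FL technique: compute Ochiai suspiciousness for each line from a coverage matrix, rank the lines, and find where a known buggy line lands in that ranking:

```python
import math

# Toy coverage data (hypothetical): rows = tests, columns = program lines.
# covered[t][l] == 1 if test t executes line l; passed[t] is the test verdict.
covered = [
    [1, 1, 0, 1],  # test 0
    [0, 1, 1, 1],  # test 1
    [1, 0, 1, 1],  # test 2
]
passed = [True, False, True]

def ochiai(line):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)),
    where ef/ep count failing/passing tests that cover the line,
    and nf counts failing tests that do not."""
    ef = sum(1 for t, row in enumerate(covered) if row[line] and not passed[t])
    ep = sum(1 for t, row in enumerate(covered) if row[line] and passed[t])
    nf = sum(1 for t, row in enumerate(covered) if not row[line] and not passed[t])
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# Rank all lines by suspiciousness (most suspicious first),
# then look up where the real buggy line appears.
ranking = sorted(range(4), key=ochiai, reverse=True)
buggy_line = 2
rank = ranking.index(buggy_line) + 1
print(f"ranking={ranking}, buggy line ranked #{rank}")
```

The real pipeline handles ties, many techniques, and much larger matrices, but the shape of the computation is the same: a suspiciousness formula over coverage counts, then a rank for the known buggy lines.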
Before doing anything else, run the setup script, which:
- clones the appropriate Defects4J fork (unless you've already exported a `DEFECTS4J_HOME`)
- updates your `.bashrc` to export some environment variables:
  - `DEFECTS4J_HOME`, pointing to the new `defects4j` repository, if needed
  - `FL_DATA_HOME`, pointing here
  - `KILLMAP_HOME`, pointing at `killmap/`
  - `GZOLTAR_JAR`, pointing to the GZoltar jar
## How to score techniques
The workflow to score a set of FL techniques on a given fault looks like this:
Various pieces of fault information were generated by the tools in `./d4j_integration/` and then checked in. You don't need to generate them yourself, but if you want to, see the `README.md` in that directory.
To run GZoltar, use

```shell
bash run_gzoltar.sh Lang 37 . developer
```

Creates the files `matrix` and `spectra`.
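As a hedged sketch of how these files can be consumed (assuming the GZoltar layout commonly used with Defects4J — one statement identifier per line in `spectra`, and in `matrix` one row per test with space-separated 0/1 coverage flags plus a trailing `+`/`-` verdict; check the `README.md` in `gzoltar/` for the authoritative format):

```python
# Hypothetical sample contents, inlined for illustration.
spectra_text = """\
com.example.Foo#bar():10
com.example.Foo#bar():11
"""
matrix_text = """\
1 0 +
1 1 -
"""

# One statement identifier per spectra line.
statements = spectra_text.splitlines()

# One matrix row per test: coverage flags, then the pass/fail verdict.
rows = []
for line in matrix_text.splitlines():
    *flags, verdict = line.split()
    rows.append(([f == "1" for f in flags], verdict == "+"))

for covered, passed in rows:
    executed = [s for s, c in zip(statements, covered) if c]
    print("PASS" if passed else "FAIL", executed)
```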
To run Killmap, use

```shell
killmap/scripts/generate-matrix \
  Lang 37 \
  /tmp/Lang-37 \
  Lang-37.mutants.log \
  | gzip > Lang-37.killmap.csv.gz
```

Creates the files `Lang-37.killmap.csv.gz` and `Lang-37.mutants.log`.
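To make the role of a killmap concrete, here is a toy Metallaxis-style sketch — not the pipeline's actual scoring code, and the data structures are hypothetical — that scores each mutant by which tests kill it (an Ochiai-style formula over kill counts) and scores a line as the max over its mutants:

```python
import math

# Hypothetical data: mutant -> (line it mutates, set of tests that kill it).
mutants = {
    "m1": (10, {"t_fail"}),
    "m2": (11, {"t_pass1"}),
    "m3": (11, {"t_fail", "t_pass2"}),
}
failing = {"t_fail"}
passing = {"t_pass1", "t_pass2"}

def mutant_score(killers):
    """Ochiai over kill data: failing killers / sqrt(|failing| * |killers|)."""
    ef = len(killers & failing)
    denom = math.sqrt(len(failing) * len(killers))
    return ef / denom if denom else 0.0

# Metallaxis-style aggregation: a line is as suspicious as its most
# suspicious mutant.
line_scores = {}
for line, killers in mutants.values():
    line_scores[line] = max(line_scores.get(line, 0.0), mutant_score(killers))

print(sorted(line_scores, key=line_scores.get, reverse=True))
```

The real killmaps record richer per-test outcomes than a kill/no-kill bit, but this is the gist of how MBFL turns mutation results into a ranking.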
To run the scoring pipeline, use

```shell
analysis/pipeline-scripts/do-full-analysis \
  Lang 37 'developer' \
  ./matrix ./spectra \
  Lang-37.killmap.csv.gz Lang-37.mutants.log \
  /tmp/Lang-37-scoring \
  Lang-37.scores.csv
```

Creates the file `Lang-37.scores.csv`.
For more details on any of these scripts, see the `README.md` in the script's directory.
If you want to skip running GZoltar and Killmap (which can be very computationally expensive), you can download the resulting files from http://fault-localization.cs.washington.edu.
- `analysis/`: Tools for analyzing the output of coverage/mutation analyses.
- `aws/`: Scripts for computing killmaps on AWS.
- `cluster_scripts/`: Scripts for computing killmaps on a Sun Grid cluster.
- `d4j_integration/`: Scripts that build upon or extend Defects4J to populate or query its database.
- `data/`: Data files for the final results and corresponding support scripts.
- `gzoltar/`: Scripts for running the GZoltar tool to collect line coverage information.
- `killmap/`: Mutation-analysis tool whose output is used for the MBFL techniques we study.
- `stats/`: R scripts that crunch the data to produce numbers for the paper.
- `utils/`: Utility programs and libraries for running/analyzing tests and parsing data files.