FLUCCS Data Artifact

FLUCCS (Fault Localization Using Code and Change Metrics) is a fault localisation approach that extends spectrum-based fault localisation (SBFL) techniques with code and change metrics. Its main argument is that including code and change metrics improves fault localisation performance. This artifact, defects4j-fluccs, contains the implementation of FLUCCS as well as the dataset used to evaluate it in the accompanying paper. Data sets generated by FLUCCS consist of suspiciousness scores from existing SBFL formulas as well as code and change metric values (age, churn, and complexity).


defects4j-fluccs is implemented on top of defects4j, a collection of reproducible Java faults, and its usage closely follows that of defects4j. Because it is an extension of defects4j, the basic tasks provided by defects4j can also be used in defects4j-fluccs.

The tasks specific to defects4j-fluccs are as follows.

Command | Description | Arguments
fluccs-prepare | Prepare for executing defects4j-fluccs operations | -p: project_name, -b: bug_number, -w: working_directory
fluccs-coverage | Calculate coverage per method | -p: project_name, -b: bug_number, -w: working_directory
fluccs-age | Calculate age per method | -p: project_name, -b: bug_number, -w: working_directory
fluccs-churn | Calculate churn per method | -p: project_name, -b: bug_number, -w: working_directory
fluccs-complexity | Calculate complexity per method | -p: project_name, -b: bug_number, -w: working_directory
fluccs-gather | Gather the generated metrics and combine them into the final data file | -p: project_name, -b: bug_number, -w: working_directory, -g: use_gpu
fluccs-stmt_mth_pair | Generate information showing which method each statement comes from | -p: project_name, -b: bug_number, -w: working_directory
fluccs-gp | Generate an evolved ranking model using Genetic Programming | -p: project_name, -b: bug_number, -w: working_directory, -d: data_directory, -i: result_id, -n: pair_id, -e: existing_pair_data

To execute these FLUCCS-specific commands:

defects4j-fluccs 1 fluccs-command [required arguments]

For basic defects4j commands:

defects4j-fluccs 0 defects4j-command [required arguments]

The FLUCCS-specific commands are described in detail below.
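The calling convention above (a leading 1 for FLUCCS-specific commands, a leading 0 for plain defects4j commands) can be sketched in Python as follows. The helper build_command is hypothetical and not part of the artifact; it assumes that every FLUCCS-specific command starts with the prefix fluccs-, as all commands in the table do.

```python
def build_command(task, args, executable="defects4j-fluccs"):
    """Return the argument list for one defects4j-fluccs invocation.

    Hypothetical helper: it picks mode 1 for tasks whose name starts with
    "fluccs-" and mode 0 (plain defects4j) for everything else.
    """
    mode = "1" if task.startswith("fluccs-") else "0"
    return [executable, mode, task] + list(args)


if __name__ == "__main__":
    # Plain defects4j task, passed through with mode 0.
    print(" ".join(build_command(
        "checkout", ["-p", "Lang", "-v", "2b", "-w", "lang_2_b"])))
    # FLUCCS-specific task, run with mode 1.
    print(" ".join(build_command(
        "fluccs-prepare", ["-p", "Lang", "-b", "2", "-w", "lang_2_b"])))
```

The returned list can be handed to subprocess.call once defects4j-fluccs is on the PATH.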

Supported Faults

defects4j-fluccs supports only 210 faults from projects Lang, Time, Math, and Closure; it does not support Chart yet. The following table lists the faults from these four projects that we excluded.

Project | Number of faults | Excluded faults
Commons Lang | 60 | 23, 35, 46, 56, 57
Joda-Time | 27 | None
Commons Math | 96 | 18, 49, 58, 61, 62, 63, 66, 67, 70, 77
Closure Compiler | 27 | 26, 27, 30
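For scripting over many faults, the table above can be encoded as a small lookup. This is a minimal sketch, not part of the artifact: the names EXCLUDED, SUPPORTED_COUNT, and is_excluded are hypothetical, the project keys are the short names accepted by the -p option, and the check does not validate the upper bound of a bug number.

```python
# Excluded fault numbers per project, copied from the table above.
EXCLUDED = {
    "Lang": {23, 35, 46, 56, 57},
    "Time": set(),
    "Math": {18, 49, 58, 61, 62, 63, 66, 67, 70, 77},
    "Closure": {26, 27, 30},
}

# Number of supported faults per project, copied from the table above.
SUPPORTED_COUNT = {"Lang": 60, "Time": 27, "Math": 96, "Closure": 27}


def is_excluded(project, bug_number):
    """Return True if the fault is listed as excluded for the project."""
    if project not in EXCLUDED:
        raise ValueError("unsupported project: %s" % project)
    return bug_number in EXCLUDED[project]


if __name__ == "__main__":
    print(sum(SUPPORTED_COUNT.values()))  # total number of supported faults
    print(is_excluded("Lang", 23))
```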

Fault Lang 23 has been excluded because the fault is located in a method that overrides another method external to Commons Lang; FLUCCS considers a fault out of scope if the faulty method does not originate from the SUT. The remaining faults have been excluded because we could not measure the coverage for the methods that contain those faults. This is due to a limitation of JaCoCo:

The official FAQ states that, if the normal sequence of statement execution is disturbed (for example, by exceptions), a probe inserted by JaCoCo may not be executed, resulting in a failure to record coverage of any statements executed between the previous probe and the missed one.

Dependencies and Requirements

  • defects4j version 0.2.0
  • Java version 1.7
  • Perl version >= 5.0.10

    • For Try::Tiny and Switch modules,
      • sudo cpan (if cpan is not set up yet, typing cpan starts an automatic installation step)
      • install Try::Tiny
      • install Switch
    • For XML::LibXML module,
      • sudo apt-get install libxml-libxml-perl
  • python2 version >=2.7.6 with the following packages: pyevolve, deap, numpy, scipy, and pycuda (optional)

    • pip install pyevolve
    • pip install deap
    • pip install numpy scipy
    • pip install pycuda (if a compatible GPU is available)
  • git version >= 2.5.0
  • ant version >= 1.9.3

Getting started

  1. Clone defects4j 0.2.0 (latest commit 2664528cdd4f2cbe69a7a81c55dbffb1fc9d8084). git clone
  2. Set defects4j's path as D4J_HOME environment variable. export D4J_HOME=path_to_defects4j
  3. Install defects4j (Go to defects4j Installation for further instructions).
  4. Clone FLUCCS's repository under path_to_defects4j/framework/bin.
    • As a result, directory fluccs will be made under path_to_defects4j/framework/bin. git clone
  5. Move executable fluccs/defects4j-fluccs to path_to_defects4j/framework/bin. mv fluccs/defects4j-fluccs .
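After these steps, the layout can be checked with a short script. The sketch below is not part of the artifact and the function name missing_paths is hypothetical; it only assumes the two locations named in the steps above, the executable defects4j-fluccs and the directory fluccs under D4J_HOME/framework/bin.

```python
import os


def missing_paths(d4j_home):
    """Return the expected FLUCCS paths that do not exist under d4j_home."""
    bin_dir = os.path.join(d4j_home, "framework", "bin")
    expected = [
        os.path.join(bin_dir, "defects4j-fluccs"),  # moved there in step 5
        os.path.join(bin_dir, "fluccs"),            # cloned there in step 4
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    home = os.environ.get("D4J_HOME")
    if home is None:
        print("D4J_HOME is not set")
    else:
        for path in missing_paths(home):
            print("missing: " + path)
```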

defects4j Installation

  1. Move to the root of the defects4j directory (path_to_defects4j) and initialize. cd path_to_defects4j ./
  2. Add defects4j's executable path to your PATH. export PATH=$PATH:path_to_defects4j/framework/bin

The documentation under path_to_defects4j contains information about the overall installation and use of defects4j.

Using defects4j-fluccs

Precomputed Data

  • To ease the burden of downloading and configuring this artifact, we precomputed some information and included it in the artifact. These files are required to generate the metric data, but they are not the metric data themselves.
    • bcel/Project_Jar: contains jar files for each faulty version of the source code.
    • bcel/output: contains files related to call graph propagation; method call file, class and method pairing file.
    • method_stmt: contains statement and method pairing files (i.e. maps statements to methods).
    • header: contains lists of statements, specified with file name and line number, for each faulty version.
    • fault_list: contains files where identifiers of faulty methods are written.

Preparing the Dataset

  1. Go to the working directory.
  2. Check out the source code version that introduces the target fault.
     defects4j-fluccs 0 checkout -p project_name(Lang|Math|Time|Closure) -v bug_number(f(fixed)|b(buggy)) -w working_directory
     For example, for fault Lang 2 with working directory lang_2_b:
     defects4j-fluccs 0 checkout -p Lang -v 2b -w lang_2_b
  3. Prepare for the overall defects4j-fluccs operations.
     defects4j-fluccs 1 fluccs-prepare -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory
     For example, for fault Lang 2 with working directory lang_2_b:
     defects4j-fluccs 1 fluccs-prepare -p Lang -b 2 -w lang_2_b
  4. Generate data for specific metrics.

    • Program Spectra
      defects4j-fluccs 1 fluccs-coverage -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory
      • Output file method_spectra.csv will be created under working_directory/output.
      • Data Format: method_identifier(class_name$method_name<arguments>),s1_ep,s1_np,s1_ef,s1_nf,s2_ep,s2_np,s2_ef,s2_nf ... (s# indicates statement # in the target method, which is identified by method_identifier)
    • Age
      defects4j-fluccs 1 fluccs-age -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory
      • Output file method_age.csv will be created under working_directory/output.
      • Data Format: method_identifier(class_name$method_name<arguments>),min_age,max_age,mean_age,CG_min_age,CG_max_age,CG_mean_age
    • Churn
      defects4j-fluccs 1 fluccs-churn -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory
      • Output file method_churn.csv will be created under working_directory/output.
      • Data Format: method_identifier(class_name$method_name<arguments>),churn,CG_min_churn,CG_max_churn,CG_mean_churn
    • Code Complexity
      defects4j-fluccs 1 fluccs-complexity -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory
      • Output file method_complexity.csv will be created under working_directory/output.
      • Data Format: method_identifier(class_name$method_name<arguments>),number_of_arguments,number_of_local_variables,number_of_complied_JavaBytecode,Line_of_Code

  5. Generate the final data file by gathering the created data and calculating the suspiciousness scores for the SBFL formulas.
     defects4j-fluccs 1 fluccs-gather -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory -g (0|1)

    • This step assumes that all output files (method_spectra.csv, method_age.csv, method_churn.csv, and method_complexity.csv) are located under working_directory/output.
    • To speed up the computation of suspiciousness scores from SBFL formulas, you can use CUDA with the -g flag:
      • -g 0: use CPU to calculate suspiciousness scores
      • -g 1: use GPU to calculate suspiciousness scores
    • Intermediate output file method_all.csv will be created under working_directory/output.
      • Data Format: method_identifier(class_name$method_name<arguments>),spectra,age,complexity,churn. The spectra, age, complexity, and churn parts of each line keep the same order as in the per-metric file formats above (e.g. churn: churn,CG_min_churn,CG_max_churn,CG_mean_churn).
    • Final output file project_name_bug_number.dat, created using the Python pickle module, will be generated under working_directory/output.
      • Data Format: consists of two parts: the indices of the faulty methods and the method metric vectors. The method metric vectors are in the following data format: method_identifier(class_name$method_name<arguments>),ochiai,jaccard,gp13,wong1,wong2,wong3,tarantula,ample,RussellRao,SorensenDice,Kulczynski1,SimpleMatching,M1,RogersTanimoto,Hamming,Ochiai2,Hamann,Hamann,Kulczynski2,Sokal,M2,Goodman,Euclid,Anderberg,Zoltar,ER1a,ER1b,ER5a,ER5b,ER5c,gp02,gp03,gp19,age,complexity,churn
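To illustrate how a line of method_spectra.csv relates to an SBFL score, the sketch below parses one line and computes the standard Ochiai formula for each statement. It is an illustration under stated assumptions, not the code used by fluccs-gather: the function names are hypothetical, the sample line is invented, the identifier is assumed to end at the last ">" (so that commas inside <arguments> are not mistaken for field separators), and the counts are assumed to be numeric in the order ep, np, ef, nf. How FLUCCS aggregates statement-level spectra to the method level is not covered here.

```python
import math


def parse_spectra_line(line):
    """Split one method_spectra.csv line into (identifier, [(ep, np, ef, nf), ...])."""
    # Assumption: the identifier ends at the last '>' on the line.
    head, sep, rest = line.strip().rpartition(">")
    if not sep:
        raise ValueError("no '<arguments>' part found in line")
    identifier = head + ">"
    values = [float(v) for v in rest.split(",") if v != ""]
    if len(values) % 4 != 0:
        raise ValueError("expected four counts per statement")
    spectra = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
    return identifier, spectra


def ochiai(ep, np_, ef, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)); 0.0 if undefined."""
    denominator = math.sqrt((ef + nf) * (ef + ep))
    return ef / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Invented example line with two statements.
    sample = "org.example.Foo$bar<int,java.lang.String>,3,1,1,0,0,4,0,1"
    identifier, spectra = parse_spectra_line(sample)
    print(identifier)
    for counts in spectra:
        print(ochiai(*counts))
```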

Evolving Ranking Models

  • Generate a ranking model using Genetic Programming: defects4j-fluccs 1 fluccs-gp -p project_name(Lang|Math|Time|Closure) -b bug_number -w working_directory -d data_directory(directory for data which will be used for GP) -i result_id -n pair_id -e existing_pair_data(0|1)
    • data_directory : data directory for the GP (i.e. the directory that contains the results from previous data generation step).
    • pair_id : FLUCCS uses 10-fold cross validation. Each fold is identified by a number between 0 and 9, called the pair id. The data in the fold with the chosen pair id is used as test data, and the data in all other folds is used as training data.
    • result_id : to distinguish each result when there are multiple of them, user can give a specific id for the ranking model.
    • existing_pair_data : determines whether to use the pair.txt file under the current directory or to write a new pair.txt file.
      • -e 1: use pair.txt under current directory
      • -e 0: write and use new pair.txt
    • Output files ( under working_directory/output )
      • result_id.result.csv: consists of two values: ranking model( formula ) and its fitness.
      • pair.txt: data file specifying which training data set and test data set are used as pair
    • For example, to generate a ranking model for fault Lang 2 using the data in the Data directory, with a new pair.txt, result id 0, pair id 1, and the current directory (.) as the working directory, run defects4j-fluccs 1 fluccs-gp -p Lang -b 2 -w . -d Data -i 0 -n 1 -e 0. After execution, the output 0.result.csv will be created under ./output.
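The role of the pair id in 10-fold cross validation can be sketched as follows. This only illustrates the idea of holding out one fold as test data. How FLUCCS assigns faults to folds and the format of pair.txt are not described here, so the round-robin assignment and both function names are assumptions made for the example.

```python
def assign_folds(faults, n_folds=10):
    """Assign each fault to a fold number (hypothetical round-robin scheme)."""
    return {name: i % n_folds for i, name in enumerate(sorted(faults))}


def split_by_pair_id(fold_of, pair_id, n_folds=10):
    """Return (training, test) fault lists for the given pair id."""
    if not 0 <= pair_id < n_folds:
        raise ValueError("pair id must be between 0 and %d" % (n_folds - 1))
    # The fold with the chosen pair id is held out as test data.
    test = sorted(f for f, fold in fold_of.items() if fold == pair_id)
    # All remaining folds form the training data.
    train = sorted(f for f, fold in fold_of.items() if fold != pair_id)
    return train, test


if __name__ == "__main__":
    faults = ["Lang_%d" % i for i in range(1, 21)]
    train, test = split_by_pair_id(assign_folds(faults), pair_id=1)
    print(test)
```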

Directory structure for defects4j-fluccs

Under defects4j executables directory ( D4J_HOME/framework/bin )

|--- fluccs                                                 
        |--- checkout               fluccs-prepare
                    |--- Lang       Contains files which should be distributed for Lang project
                    |--- Math       Contains files which should be distributed for Math project
                    |--- Time       Contains files which should be distributed for Time project
                    |--- Closure    Contains files which should be distributed for Closure project
                    |--- prepare    Contains main executables for fluccs-prepare
        |--- gen_stmt_mth_pair      fluccs-stmt_mth_pair
        |--- coverage               fluccs-coverage
        |--- age                    fluccs-age
        |--- churn                  fluccs-churn
        |--- complexity             fluccs-complexity
        |--- gather                 fluccs-gather : gathers all generated data into a single file
        |--- to_dat                 fluccs-gather : generate intermediate file-without formula calculation and finding index for fault 
        |--- sbfl_metrics           fluccs-gather : generate final data file
        |--- header                 header file directory
        |--- method_stmt            pairing file(statement-method) directory
        |--- fault_list             faulty method file directory
        |--- gp                     fluccs-gp
        |--- perl                   perl modules
        |--- python                 python modules