[TASK] Get Real-World Examples of Buggy Code

Issue #2 resolved
Martin Velez created an issue

Overview

We hypothesize that seeing example fixes, in addition to the compiler output, will help users fix compilation errors faster. To test this hypothesis, we would like to conduct a user study. The design of the user study requires that we gather real-world examples of code that trigger compiler diagnostic messages. We define real-world examples as pieces of code shared by people online.

Our procedure was able to trigger about 114,00 distinct diagnostic messages. We have sampled 100 diagnostic messages uniformly at random without replacement.
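
A minimal sketch of the sampling step, assuming the distinct diagnostic messages are already available as a Python list. The function name `sample_messages` and the `seed` parameter are illustrative and not part of our tooling.

```python
import random


def sample_messages(messages, k=100, seed=None):
    """Return k distinct messages drawn uniformly at random, without replacement."""
    # De-duplicate first so every distinct message is equally likely to be picked;
    # sorting makes the draw reproducible for a given seed.
    distinct = sorted(set(messages))
    rng = random.Random(seed)
    # random.Random.sample never returns the same population element twice.
    return rng.sample(distinct, k)
```

Passing a fixed `seed` lets someone else reproduce the same 100 messages from the same input list.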

The attached CSV file contains the 100 diagnostic messages. The file can also be found in {repo}/real-world_examples/100examples.csv. File Format:

id, sha1, diagnostic_message
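
A hedged sketch of how the file could be read, given the three columns listed above. It assumes that messages containing commas are quoted, and it skips a header row only if one is present. `read_examples` and `FIELDS` are hypothetical names.

```python
import csv

FIELDS = ["id", "sha1", "diagnostic_message"]


def read_examples(lines):
    """Yield one dict per CSV row, keyed by the three documented columns."""
    # skipinitialspace tolerates the space after each comma shown in the format line.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        if cells == FIELDS:
            continue  # skip a header row, if the file has one
        yield dict(zip(FIELDS, cells))
```

For the real file, open it with `open(path, newline="")` and pass the file object to `read_examples`.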

Instructions

For each diagnostic message DM:

  1. Copy-paste the diagnostic message into Google.
  2. For each page in the top-10 results, grab the code snippet(s).
  3. Test if any of the code snippets, independently or composed together, trigger DM.
    • clang++-3.9 -c -Wfatal-errors -std=c++14
  4. If yes, store the code in a file named {ID}.cpp in {repo}/real-world_examples.
  5. If no, store the code in a file in {repo}/real-world_examples/unknown.
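
Steps 3 to 5 could be scripted roughly as follows. This is a sketch under stated assumptions, not the procedure we actually ran: it assumes DM is stored without the file:line:column prefix, so a substring match is enough. It also adds `-o` pointing at the null device so a successful compile leaves no object file, and it reuses the {ID}.cpp name for snippets that go to unknown. All function names are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# Compiler command from the instructions above.
CLANG_CMD = ["clang++-3.9", "-c", "-Wfatal-errors", "-std=c++14"]


def compiler_output(source_path, cmd=CLANG_CMD):
    """Compile one file and return everything the compiler wrote to stderr."""
    # "-o <null device>" is an addition so no object file is left behind.
    result = subprocess.run(
        list(cmd) + ["-o", os.devnull, str(source_path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr


def triggers(dm, output):
    """True if the diagnostic message text appears in the compiler output."""
    # Assumption: DM carries no file:line:column prefix.
    dm = dm.strip()
    return bool(dm) and dm in output


def destination(repo, example_id, matched):
    """Where to store a snippet under steps 4 and 5."""
    base = Path(repo) / "real-world_examples"
    name = "{}.cpp".format(example_id)
    # Assumption: unmatched snippets keep the {ID}.cpp name inside unknown/.
    return base / name if matched else base / "unknown" / name
```

A snippet saved as `candidate.cpp` would then be checked with `triggers(dm, compiler_output("candidate.cpp"))` and, depending on the result, moved to `destination(repo, example_id, matched)`.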

This collection procedure strives to be as unbiased as possible.

Once we collect 10-20 real-world examples, we will conduct a pilot study.

Comments (13)

  1. Martin Velez reporter
    • edited description

    I forgot to add the compiler command we are using.

    clang++-3.9 -c -Wfatal-errors -std=c++14
    
  2. Nima Johari

    I worked backwards from 100 to 75 and found a total of 14 pieces of code on the internet that trigger the corresponding error message.

    Each is saved in a directory at metacompiler/real-world_examples/{ID} (since some of them were presented in multiple files on the internet).

    A log of my findings can be found in the README file in the same folder.

    @martinvelez: Perhaps we can prototype the study with what we have right now, and look for more if we need to?

  3. Martin Velez reporter

    Found 16 real-world examples that have single-operation, single-token solutions.

    Used 6 for pilot study. Used 10 for main study.
