Repository for hosting artifacts of the Marlin case study.

Accepted paper at ICSME 2015 (International Conference on Software Maintenance and Evolution)


Folder - mysql-database

The database can be imported using MySQL Workbench: the dump is a self-contained schema with creation queries, so it only needs to be imported. The schema name is marlin_icsme. Alternatively, it can be imported with this command:

mysql -u [username] -p -h [localhost] < marlin_icsme.sql

where [username] is your MySQL username and [localhost] is the host name or address of the database server.

A couple of queries that can be run on the database are in the sql-queries-for-data folder.


There are too many repositories to be stored, so we only provide the main Marlin repository: tools/python-scripts/ErikZalm-Marlin.



This folder contains the binary program (jar) that was used for extracting metadata from GitHub as JSON files.

The source files can be found at


  1. Because the Marlin project has many forks, we used 4 GitHub tokens to speed up data retrieval. You therefore need to add your own tokens to github.txt.
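To illustrate why several tokens help, here is a minimal Python 3 sketch of round-robin token rotation. It is not the code of the jar: the helper names are hypothetical, and it assumes that github.txt holds one token per line, which is not confirmed here.

```python
import itertools


def load_tokens(lines):
    """Return the non-empty, stripped lines of a token file.

    Assumption: one token per line (the real github.txt layout may differ).
    """
    return [line.strip() for line in lines if line.strip()]


def token_cycle(tokens):
    """Yield tokens in round-robin order so the request load is spread
    across all of them and each token's rate limit is used up more slowly."""
    if not tokens:
        raise ValueError("at least one token is required")
    return itertools.cycle(tokens)


def auth_header(token):
    """Build the HTTP header GitHub accepts for token authentication."""
    return {"Authorization": "token " + token}
```

A caller could open github.txt, pass the file object to load_tokens, and take next(cycle) before every API request.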


Information:
- ErikZalm-Marlin is the main Marlin repository that we used in our analysis
- list of forks contains all the active forks and their creation date, fork owner user, fork owner name (if it exists) and fork owner e-mail (if it exists)
- retrieve-all-repos-heuristics is the script for retrieving all the repositories and all their branches, and analyzing using heuristics what each repository is intended for
- analyze-for-ifdefs is the script for analyzing the repositories for commits that contain #ifdef annotations
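As an illustration of what looking for #ifdef annotations in a commit can mean, the following Python 3 sketch scans a unified diff for added or removed preprocessor conditionals. It is not the analyze-for-ifdefs script itself: the function name is hypothetical, and matching #ifndef, #if and #elif in addition to #ifdef is an assumption.

```python
import re

# Assumption: all conditional directives count, not only #ifdef.
IFDEF_RE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif)\b")


def changed_ifdef_lines(diff_text):
    """Return (sign, line) pairs for added or removed directive lines.

    sign is "+" for an added line and "-" for a removed one.
    Limitation: a changed line whose content itself starts with "--" or
    "++" is mistaken for a file header and skipped.
    """
    hits = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not changed content
        if line.startswith(("+", "-")) and IFDEF_RE.match(line[1:]):
            hits.append((line[0], line[1:].strip()))
    return hits
```

The diff text could come, for example, from the output of git show for a single commit; a commit would then be of interest whenever the returned list is non-empty.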


Python 2.7, Git 1.9.5, MySQL


  1. Set user and password for MySQL database in file
  2. Then run the retrieve-all-repos-heuristics script.
  3. Run analyze-for-ifdefs after the initial script has finished (because step 2 retrieves all repositories).

Most likely the first script will stop at some point because some forks have since been removed. Verify that all the forks still exist (modifying the script accordingly), and then run it.

To be fixed: the script should first obtain the list of valid forks to use.
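One possible way to obtain such a list is sketched below. This is not part of the provided scripts: it is written for Python 3 (the scripts here target Python 2.7), the function names are hypothetical, and it assumes forks are identified as owner/repository and checked against the public GitHub REST endpoint for repositories, which answers with HTTP 404 when a repository no longer exists.

```python
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com/repos/"


def repo_exists(full_name, token=None):
    """Return True if GitHub reports the repository, False on HTTP 404."""
    request = urllib.request.Request(API_ROOT + full_name)
    request.add_header("User-Agent", "fork-checker")  # arbitrary client name
    if token:
        request.add_header("Authorization", "token " + token)
    try:
        with urllib.request.urlopen(request) as response:
            return response.getcode() == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # e.g. rate limiting must not be read as "fork removed"


def split_forks(full_names, exists=repo_exists):
    """Partition fork names into (still existing, missing) lists."""
    valid, missing = [], []
    for name in full_names:
        (valid if exists(name) else missing).append(name)
    return valid, missing
```

Passing a different exists callable makes it possible to try the partitioning logic without network access; only the valid list would then be handed to the retrieval step.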