
Repository for hosting artifacts of the Marlin case study.

Accepted paper at ICSME 2015 (International Conference on Software Maintenance and Evolution)

http://itu.dk/people/scas/papers/ICSME2015-Marlin-preprint.pdf

Database

Folder - mysql-database

The database can be imported in one of two ways. With MySQL Workbench, simply import the file: it is a self-contained schema that includes its creation queries, and the schema name is marlin_icsme. Alternatively, import it from the command line with this command:

mysql -u [username] -p -h [localhost] < marlin_icsme.sql

where [username] must be set to your MySQL username and [localhost] to the hostname of the database server.
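For scripted setups, the same import can be driven from Python. The following is a minimal sketch, assuming the mysql client is on the PATH and the dump file is in the current directory. The function names build_import_command and import_dump are hypothetical and not part of this repository.

```python
import subprocess


def build_import_command(username, host="localhost"):
    """Build the argument list for the mysql client call shown above."""
    # -p without a value makes the client prompt for the password.
    return ["mysql", "-u", username, "-p", "-h", host]


def import_dump(username, host="localhost", dump_path="marlin_icsme.sql"):
    """Feed the SQL dump to the mysql client on standard input."""
    with open(dump_path, "rb") as dump:
        # Equivalent to the shell redirection `< marlin_icsme.sql`.
        return subprocess.call(build_import_command(username, host), stdin=dump)
```

Calling import_dump("your_user") should behave like the command above with the host left at localhost. It returns the exit status of the mysql client.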

A couple of queries that can be run on the DB are in the sql-queries-for-data folder

Repositories

There are too many repositories to be stored, so we provide only the main Marlin repository: tools/python-scripts/ErikZalm-Marlin.

Tools

Github-extractor

This folder contains the binary program (jar) that was used for extracting metadata from GitHub as JSON files.

The source files can be found at bitbucket.org/s_stanciulescu/github-extractor

Usage:

  1. Because the Marlin project has many forks, we used 4 GitHub tokens to speed up data retrieval. You therefore need to add your own tokens to github.txt.
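GitHub limits how many API requests an authenticated account may make per hour, which is presumably why several tokens speed up retrieval. How the extractor actually distributes its requests, and the exact format of github.txt, are not documented here. The sketch below only illustrates the general idea of round-robin token use, assuming one token per line. The names load_tokens and assign_tokens are hypothetical.

```python
from itertools import cycle


def load_tokens(lines):
    """Return the non-empty, stripped lines (assumed: one token per line)."""
    return [line.strip() for line in lines if line.strip()]


def assign_tokens(fork_names, tokens):
    """Pair each fork with a token, cycling through the tokens in order."""
    pool = cycle(tokens)
    return [(name, next(pool)) for name in fork_names]
```

With two tokens and three forks, the first and third fork share the first token, so the request load is spread as evenly as the token count allows.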

Python-scripts

Information:

- ErikZalm-Marlin is the main Marlin repository that we used in our analysis.
- list of forks contains all the active forks with their creation date, fork owner user, fork owner name (if it exists) and fork owner e-mail (if it exists).
- retrieve-all-repos-heuristics is the script that retrieves all the repositories and all their branches, and uses heuristics to analyze what each repository is intended for.
- analyze-for-ifdefs is the script that analyzes the repositories for commits that contain #ifdef annotations.
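To make concrete what "commits that contain #ifdef annotations" can mean, here is a minimal sketch that scans the text of a unified diff for added or removed preprocessor conditionals. It is not the logic of analyze-for-ifdefs itself. The function name ifdef_changes is hypothetical, and treating #if, #ifndef and #elif as annotations alongside #ifdef is an assumption.

```python
import re

# Added ("+") or removed ("-") diff lines whose content starts with a
# conditional-compilation directive. Covering #if, #ifndef and #elif in
# addition to #ifdef is an assumption made for this sketch.
IFDEF_LINE = re.compile(r"^[+-]\s*#\s*(if|ifdef|ifndef|elif)\b")


def ifdef_changes(diff_text):
    """Return the changed diff lines that touch preprocessor conditionals."""
    hits = []
    for line in diff_text.splitlines():
        # Skip the file headers ("--- a/file", "+++ b/file").
        if line.startswith("+++") or line.startswith("---"):
            continue
        if IFDEF_LINE.match(line):
            hits.append(line)
    return hits
```

The diff text could come, for example, from running git show on a commit. A commit would count as relevant when the returned list is non-empty.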

Requirements:

- Python 2.7
- Git 1.9.5
- MySQL

Usage:

  1. Set the user and password for the MySQL database in the retrieve-all-repos-heuristics.py file.
  2. Run the retrieve-all-repos-heuristics script.
  3. Run analyze-for-ifdefs once the first script has finished, because step 2 is what retrieves all the repositories.

The first script will most likely stop at some point because some forks have since been removed. One should first verify that all the forks still exist (this requires modifying the script) and then run it.

To be fixed: the script should first build a list of valid forks and use only those.
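One possible way to build such a list is to probe each fork with git ls-remote before cloning it. This is a sketch written for Python 3, not the repository's own fix, and it is not yet wired into the scripts. The names fork_exists and split_forks are hypothetical. Setting GIT_TERMINAL_PROMPT=0 to stop Git from asking for credentials on a missing repository needs Git 2.3 or later, which is newer than the version listed under Requirements.

```python
import os
import subprocess


def fork_exists(url, timeout=30):
    """Return True if `git ls-remote` can reach the repository at `url`."""
    # Disable interactive credential prompts (honoured by Git 2.3+).
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", url],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            env=env,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is missing or the remote did not answer in time.
        return False
    return result.returncode == 0


def split_forks(urls, exists=fork_exists):
    """Partition fork URLs into (still reachable, no longer reachable)."""
    valid, missing = [], []
    for url in urls:
        (valid if exists(url) else missing).append(url)
    return valid, missing
```

The exists parameter allows the check to be swapped out, for example for a lookup against the GitHub API, without changing the partitioning logic.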