About
This repository contains the source code and the scripts for the algorithm and experiments presented in the paper: Jens Dietrich, Nicholas Hollingum, Bernhard Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.
Repository Structure
Java Sources
Java source code is contained in three folders:
- java/src contains the main Java packages,
- java/benchmarks contains the Java scripts used to run individual experiments, and
- java/testcases contains JUnit tests.
The repository contains implementations of three different algorithms:
- au.edu.sydney.cflr.WorklistPointsTo is an implementation of the worklist algorithm to compute CFL-reachability from Melski / Reps: Interconvertibility of a class of set constraints and context-free-language reachability. Theoretical Computer Science 248(1), 2000.
- nz.ac.massey.gp2.diffprop.DiffPropPointsTo is an implementation of the difference propagation algorithm from Sridharan / Fink: The Complexity of Andersen's Analysis in Practice. Static Analysis, Springer, 2009.
- nz.ac.massey.gp2.pointsto.TCReachability is an implementation of the algorithm presented in Dietrich, Hollingum, Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.
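All three solvers follow a worklist-driven fixpoint pattern: newly derived edges are queued and combined with existing edges until no new fact emerges. As a rough illustration of that pattern only (this sketch is invented for this README and is not code from the repository), transitive closure over a directed edge set can be computed like this:

```java
import java.util.*;

// Illustrative worklist-style transitive closure over a directed edge set.
// Each edge is a two-element list [src, dst]; popped edges are composed
// with every known edge on either side until no new edge is derived.
class WorklistTC {
    static Set<List<Integer>> close(Set<List<Integer>> edges) {
        Set<List<Integer>> reach = new HashSet<>(edges);
        Deque<List<Integer>> worklist = new ArrayDeque<>(edges);
        while (!worklist.isEmpty()) {
            List<Integer> e = worklist.poll();
            int src = e.get(0), dst = e.get(1);
            // Snapshot so 'reach' can grow while we iterate over it.
            for (List<Integer> f : new ArrayList<>(reach)) {
                if (f.get(0) == dst) addEdge(reach, worklist, Arrays.asList(src, f.get(1)));
                if (f.get(1) == src) addEdge(reach, worklist, Arrays.asList(f.get(0), dst));
            }
        }
        return reach;
    }

    // Queue an edge for further combination only if it is new.
    static void addEdge(Set<List<Integer>> reach, Deque<List<Integer>> wl, List<Integer> e) {
        if (reach.add(e)) wl.add(e);
    }
}
```

The repository's solvers use specialised data structures and derivation rules; this sketch only shows the shared worklist discipline.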
Benchmark Data
The algorithms operate on graphs that represent the semantics of a Java program.
The repository contains representations of these graphs encoded in comma-separated values (CSV) format.
datasets/dacapo9 contains the DaCapo data sets. For each program in the DaCapo benchmark, there are input files (Alloc.csv, Assign.csv, Load.csv and Store.csv) as well as result data (VarPoints.csv).
The result data are computed with the "base-line" Melski-Reps algorithm (see above).
This means that for these datasets we can run experiments using these results as oracles, in particular to check for false negatives (soundness) and for false positives (precision).
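The idea of such an oracle check can be sketched as follows (all names here are invented for illustration; the actual check lives in the repository's scripts and Java sources). Treating each relation as a set of tuples, false negatives are the oracle tuples a solver missed, and false positives are tuples it derived but the oracle lacks:

```java
import java.util.*;

// Sketch of an oracle comparison between the reference VarPointsTo relation
// and the relation computed by another algorithm. Tuples are encoded as
// "variable,object" strings purely for this illustration.
class OracleCheck {
    // Tuples in 'oracle' but not in 'computed': missed facts (unsoundness).
    static Set<String> falseNegatives(Set<String> oracle, Set<String> computed) {
        Set<String> d = new HashSet<>(oracle);
        d.removeAll(computed);
        return d;
    }

    // Tuples in 'computed' but not in 'oracle': spurious facts (imprecision).
    static Set<String> falsePositives(Set<String> oracle, Set<String> computed) {
        Set<String> d = new HashSet<>(computed);
        d.removeAll(oracle);
        return d;
    }
}
```

An implementation is sound with respect to the oracle when the first set is empty, and precise when the second is.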
Limitations
There are two components missing in this distribution:
- The OpenJDK data set is missing for legal reasons. We expect that we will be able to add this to the repository at a later stage.
- The paper also contains a comparison with a LogicBlox-based implementation. For licensing reasons, LogicBlox is not included in this repository. However, the datalog program used is included in the repository (in lb/*.logic). A LogicBlox license can be obtained from the LogicBlox web site.
Note that the distribution submitted to the OOPSLA artifact evaluation did contain both the OpenJDK dataset and LogicBlox under a special license arrangement.
Using the Repository
Dependencies
- Java JDK (tested on 1.7.0_67 and 1.8.0_20)
- Apache Ant
- optional: Bourne Again Shell (BASH)
- optional: LogicBlox
You can verify a correct download by compiling and running our test framework:
$ cd java
$ ant test
$ <your browser> build/junit-reports/index.html
The experimental framework is designed to run in Unix-like environments.
We have tested on Ubuntu 14.10 and Linux Mint 17.1, and use Lubuntu 14.10-Alternate for our experimentation.
If you wish to use a different environment, some manual work may be required to execute experiments and understand the output.
LogicBlox Setup
If you do not have LogicBlox, the corresponding experiments are automatically excluded from our framework.
Assuming you have obtained a LogicBlox license, the following setup is necessary:
$ cd <LogicBlox directory>
$ ls
bin blade BlockResources ...
$ export LOGICBLOX_HOME=`pwd`
$ export LB_PAGER_FORCE_START=1
$ $LOGICBLOX_HOME/bin/bloxbatch -version
BloxBatch:
Version: 3.9
Build: 63809_50b1145236eb
Patch level: 3.9.0
Build type: R
...
Running Experiments
These instructions assume you are using a properly setup environment.
If these commands do not work on your machine, you can inspect the BASH scripts to determine how to emulate them.
The experimental framework is designed to reproduce our published results on your hardware.
The script run.sh performs all experimentation on a single benchmark; the scripts Table*.sh supply flags to run.sh which restrict the experiments it runs.
To list the available flags, run ./run.sh -h.
To confirm that everything is working correctly, start by running a simple analysis of the sunflow benchmark:
$ ./Table1-2.sh -i datasets/dacapo9/sunflow
This prints a statistical summary of sunflow's points-to graph.
All the experimental results in our paper can be reproduced in this way.
Verifying Correctness
In our context, correctness is defined as consistency between implementations.
To this end, DaCapo datasets are distributed with the correct VarPointsTo relation included.
The script correctness.sh can be used to verify the consistency between the supplied VarPointsTo relation and a chosen algorithm.
The following command checks the consistency of the Transitive-Closure implementation against VarPointsTo
for the Avrora benchmark:
$ ./correctness.sh tc datasets/dacapo9/avrora
Input Data
In addition to the DaCapo 2009 dataset provided, we have included a compressed version of the DaCapo 2006 benchmarks, which are commonly used in the research literature.
To use the 2006 benchmarks, simply unzip them and direct the run.sh/Table*.sh scripts to that directory with -i <DaCapo 2006 directory>.
Our minimalistic analysis is designed to work for programs with allocation, assignment, field-load and field-store semantics.
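As a naive, illustrative reading of these four instruction kinds (this is not the repository's algorithm, and all names below are invented for the sketch), an exhaustive points-to analysis can be phrased as a fixpoint over four inference rules:

```java
import java.util.*;

// Naive fixpoint over the four instruction kinds, invented for this README.
// Column orders follow the CSV conventions described in this section:
// Alloc {var, obj}, Assign {from, to}, Load {base, to, field},
// Store {from, base, field}.
class NaivePointsTo {
    static Map<String, Set<String>> solve(List<String[]> alloc, List<String[]> assign,
                                          List<String[]> load, List<String[]> store) {
        Map<String, Set<String>> vpt = new HashMap<>(); // variable -> objects
        Map<String, Set<String>> fpt = new HashMap<>(); // "obj.field" -> objects
        for (String[] a : alloc)                        // [0] = new Obj()
            vpt.computeIfAbsent(a[0], k -> new HashSet<>()).add(a[1]);
        boolean changed = true;
        while (changed) {                               // iterate until fixpoint
            changed = false;
            for (String[] a : assign)                   // [1] = [0]
                changed |= addAll(vpt, a[1], pts(vpt, a[0]));
            for (String[] l : load)                     // [1] = [0].[2]
                for (String o : pts(vpt, l[0]))
                    changed |= addAll(vpt, l[1], pts(fpt, o + "." + l[2]));
            for (String[] s : store)                    // [1].[2] = [0]
                for (String o : pts(vpt, s[1]))
                    changed |= addAll(fpt, o + "." + s[2], pts(vpt, s[0]));
        }
        return vpt;
    }

    // Snapshot lookup so callers can mutate the maps while iterating.
    static List<String> pts(Map<String, Set<String>> m, String k) {
        return new ArrayList<>(m.getOrDefault(k, Collections.<String>emptySet()));
    }

    static boolean addAll(Map<String, Set<String>> m, String k, List<String> vs) {
        return m.computeIfAbsent(k, x -> new HashSet<>()).addAll(vs);
    }
}
```

This quadratic sketch only pins down the semantics of the four relations; the algorithms in this repository are designed to compute the same VarPointsTo relation at scale.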
If you wish to supply your own input data, create a directory with the following header-less CSV files:
- Alloc.csv, abstract heap allocations of the form [0] = new Obj();, where [1] is an abstract-memory location for the Obj.
- Assign.csv, variable assignments and casts of the form [1] = [0];.
- Load.csv, field loads of the form [1] = [0].[2].
- Store.csv, field stores of the form [1].[2] = [0].

Here [n] refers to the n-th data column of the file (e.g. Load.csv has 3 columns: load-base-variable, assigned-to-variable, loaded-field).
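As a worked example (the variable and object names below are invented, and the parenthesised annotations are not part of the files, which are header-less), the four statements x = new Obj(); y = x; y.f = x; z = y.f; could be encoded as one row per file:

```
Alloc.csv:   x,h1     (x = new Obj(), with h1 naming the allocation site)
Assign.csv:  x,y      (y = x;    [0] = source, [1] = target)
Store.csv:   x,y,f    (y.f = x;  [0] = source, [1] = base, [2] = field)
Load.csv:    y,z,f    (z = y.f;  [0] = base, [1] = assigned-to, [2] = field)
```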
Simple example input files are included in the datasets/.testdata/* directories, which include pictures and Java source code.