This repository contains the source code and the scripts for the algorithm and experiments presented in the paper: Jens Dietrich, Nicholas Hollingum, Bernhard Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.
Java source code is contained in three folders:
- java/src contains the main Java packages,
- java/benchmarks contains the Java scripts used to run individual experiments, and
- java/testcases contains JUnit tests.
The repository contains implementations of three different algorithms:
- au.edu.sydney.cflr.WorklistPointsTo is an implementation of the worklist algorithm to compute CFL-reachability from Melski and Reps: Interconvertibility of a Class of Set Constraints and Context-Free-Language Reachability. Theoretical Computer Science 248(1), 2000.
- nz.ac.massey.gp2.diffprop.DiffPropPointsTo is an implementation of the difference-propagation algorithm from Sridharan and Fink: The Complexity of Andersen's Analysis in Practice. Static Analysis. Springer, 2009.
- nz.ac.massey.gp2.pointsto.TCReachability is an implementation of the algorithm presented in Dietrich, Hollingum, Scholz: Giga-Scale Exhaustive Points-To Analysis for Java in Under a Minute. OOPSLA'15.
The algorithms operate on graphs that represent the semantics of a Java program.
The repository contains representations of these graphs encoded in comma-separated file (CSV) format.
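To give a flavor of the style of computation these algorithms perform, here is a minimal, illustrative Andersen-style propagation over allocation and assignment edges only (no field loads/stores and no CFL-reachability). The class and method names are invented for this sketch and do not appear in the repository; the repository's actual algorithms are substantially more sophisticated.

```java
import java.util.*;

// Illustrative sketch only: naive Andersen-style propagation over Alloc and
// Assign edges. Not the paper's algorithm.
public class TinyPointsTo {

    // alloc maps a variable to the abstract object it allocates;
    // assign maps a target variable to the variables assigned into it.
    public static Map<String, Set<String>> solve(Map<String, String> alloc,
                                                 Map<String, List<String>> assign) {
        Map<String, Set<String>> pts = new HashMap<>();
        alloc.forEach((v, o) -> pts.computeIfAbsent(v, k -> new HashSet<>()).add(o));
        boolean changed = true;
        while (changed) {                 // iterate until a fixed point is reached
            changed = false;
            for (Map.Entry<String, List<String>> e : assign.entrySet()) {
                Set<String> to = pts.computeIfAbsent(e.getKey(), k -> new HashSet<>());
                for (String from : e.getValue())
                    changed |= to.addAll(pts.getOrDefault(from, Set.of()));
            }
        }
        return pts;
    }

    public static void main(String[] args) {
        // x = new Obj(); y = x; z = y;
        System.out.println(solve(Map.of("x", "o1"),
                                 Map.of("y", List.of("x"), "z", List.of("y"))));
    }
}
```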
datasets/dacapo9 contains the dacapo data sets.
For each program in the dacapo benchmark, there are input files (Alloc.csv, Assign.csv, Load.csv, Store.csv) as well as result data (the VarPointsTo relation).
The result data are computed with the baseline Melski-Reps algorithm (see above).
This means that for these datasets we can run experiments using these results as oracles, in particular to check for false negatives (soundness) and for false positives (precision).
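As a sketch of what such an oracle check amounts to (class and method names here are invented, not taken from the repository): false negatives are oracle facts the analysis missed, false positives are reported facts the oracle lacks.

```java
import java.util.*;

// Sketch of an oracle comparison, where the oracle is the result set computed
// by the baseline Melski-Reps implementation. Names are illustrative only.
public class OracleCheck {

    // Oracle facts missing from the computed result: evidence of unsoundness.
    public static Set<String> falseNegatives(Set<String> oracle, Set<String> computed) {
        Set<String> missing = new HashSet<>(oracle);
        missing.removeAll(computed);
        return missing;
    }

    // Computed facts absent from the oracle: evidence of imprecision.
    public static Set<String> falsePositives(Set<String> oracle, Set<String> computed) {
        Set<String> extra = new HashSet<>(computed);
        extra.removeAll(oracle);
        return extra;
    }
}
```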
There are two components missing in this distribution:
- The OpenJDK data set is missing for legal reasons.
We expect that we will be able to add this to the repository at a later stage.
- The paper also contains a comparison with a LogicBlox-based implementation.
For licensing reasons, LogicBlox is not included in this repository.
However, the datalog program used is included in the repository (in
lb/*.logic). A LogicBlox license can be obtained from the LogicBlox web site.
Note that the distribution submitted to the OOPSLA artifact evaluation did contain both the OpenJDK dataset and LogicBlox under a special license arrangement.
Using the Repository
- Java JDK (Tested on:
- Apache Ant
- optional: Bourne Again Shell (BASH)
- optional: LogicBlox
You can verify a correct download by compiling and running our test framework:
$ cd java
$ ant test
$ <your browser> build/junit-reports/index.html
The experimental framework is designed to run in Unix-like environments.
We have tested on Ubuntu 14.10 and Linux Mint 17.1, and use Lubuntu 14.10-Alternate for our experimentation.
If you wish to use a different environment, some manual work may be required to execute experiments and understand the output.
If you do not have LogicBlox, its experiments will be excluded automatically from our framework.
Assuming you have obtained a LogicBlox license, the following setup is necessary:
$ cd <LogicBlox directory>
$ ls
bin blade BlockResources ...
$ export LOGICBLOX_HOME=`pwd`
$ export LB_PAGER_FORCE_START=1
$ $LOGICBLOX_HOME/bin/bloxbatch -version
BloxBatch:
Version: 3.9
Build: 63809_50b1145236eb
Patch level: 3.9.0
Build type: R
...
These instructions assume you are using a properly setup environment.
If these commands do not work on your machine, you can inspect the BASH scripts to determine how to emulate them.
The experimental framework is designed to reproduce our published results on your hardware.
run.sh performs all experimentation on a single benchmark; the Table*.sh scripts supply flags to run.sh that restrict the experiments it runs.
To list the available flags, run
To confirm that everything is working correctly, start by running a simple analysis of the sunflow benchmark:
$ ./Table1-2.sh -i datasets/dacapo9/sunflow
This prints a statistical summary of sunflow's points-to graph.
All the experimental results in our paper can be reproduced in this way.
In our context, correctness is defined as consistency between implementations.
To this end, DaCapo datasets are distributed with the correct
VarPointsTo relation included.
correctness.sh can be used to verify the consistency between the supplied VarPointsTo and a chosen algorithm.
The following command checks the consistency of the Transitive-Closure implementation against
VarPointsTo for the Avrora benchmark:
$ ./correctness.sh tc datasets/dacapo9/avrora
In addition to the DaCapo 2009 dataset provided, we have included a compressed version of the DaCapo 2006 benchmarks, which are commonly used in the research literature.
To use the 2006 benchmarks, simply unzip them and point the Table*.sh scripts at that directory with -i <DaCapo 2006 directory>.
Our minimalistic analysis is designed to work for programs with allocation, assignment, field-load and field-store semantics.
If you wish to supply your own input data, create a directory with the following header-less CSV files:
Alloc.csv, abstract heap allocations of the form [1] = new Obj();, where [2] is an abstract-memory location for the Obj.
Assign.csv, variable assignments and casts of the form [1] = [2];.
Load.csv, field loads of the form [1] = [2].[3].
Store.csv, field stores of the form [1].[2] = [3].
Here [n] refers to the n-th data column of the file (e.g. Load.csv has 3 columns).
Simple example input files are included in the datasets/.testdata/* directories, which also contain pictures and Java source code.
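Since the input files are header-less CSV, loading them is straightforward. The following sketch (not part of the repository; the framework has its own loaders) turns the textual content of one such file into rows of column values.

```java
import java.util.*;

// Minimal reader for the header-less CSV inputs described above.
// Illustrative only; class and method names are invented.
public class CsvRows {

    // Splits file content into non-blank lines, then each line into columns.
    // The -1 limit keeps trailing empty columns instead of dropping them.
    public static List<String[]> read(String content) {
        List<String[]> rows = new ArrayList<>();
        for (String line : content.split("\\R"))
            if (!line.isBlank())
                rows.add(line.split(",", -1));
        return rows;
    }
}
```

In a real setting the content would come from, e.g., java.nio.file.Files.readString on one of the dataset files.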