Codeine

Codeine implements models for retrieving Java methods from English queries. Three models are implemented: a term-matching model, a Polylingual Latent Dirichlet Allocation (PLDA) model, and IBM model 1.
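To give a feel for the simplest of the three approaches, here is a minimal sketch of term-matching retrieval. It is not the code shipped in this repository: the function names (tokenize, rank_methods), the scoring rule (counting query terms in the method name and description) and the sample data are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphanumeric tokens."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return re.findall(r"[a-z0-9]+", spaced.lower())


def rank_methods(query, methods):
    """Rank method names by how often the query terms occur in them.

    `methods` maps a method name to its documentation text (hypothetical
    format, not the layout used in data/extract).
    """
    query_terms = set(tokenize(query))
    scored = []
    for name, doc in methods.items():
        counts = Counter(tokenize(name + " " + doc))
        score = sum(counts[term] for term in query_terms)
        scored.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical sample data for illustration only.
sample_methods = {
    "String.toUpperCase": "Converts all characters to upper case",
    "List.add": "Appends the element to the end of this list",
}
ranking = rank_methods("convert string to upper case", sample_methods)
print(ranking)
```

A model of this kind only rewards exact word overlap, which is why it serves as a baseline for the PLDA and IBM model 1 approaches.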

Data

The directory data/jel contains machine-readable XML documentation of several packages of the Java standard library.

The directory data/extract contains data extracted from the XML files, pre-processed, and split into training, validation, and test sets. These files are in plain text format. Refer to the README.rst file in data/extract for details.

Models

The directory bin contains the scripts for running each model.

To run an experiment, enter the following command in the console from the codeine directory:

./bin/MODEL_NAME.sh

Replace "MODEL_NAME" with the name of the model. Three models are implemented:

  • term-matching: baseline model
  • plda: Polylingual Latent Dirichlet Allocation
  • ibm: IBM model 1

(Note: to run the experiments for the plda model, the "Mallet" folder must be located in the same parent directory as codeine, e.g. /user/home/codeine and /user/home/mallet.)

After each successful run, an output folder is created, named after the model and its parameters. This folder contains the generated intermediate files. The corresponding Mean Reciprocal Rank scores can be found in the directory data/run.
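For readers unfamiliar with the metric, Mean Reciprocal Rank averages 1/rank of the first correct result over all queries. The sketch below illustrates the standard definition only. It does not read the files in data/run, and the function name, the input format and the convention of scoring unretrieved queries as 0 are assumptions.

```python
def mean_reciprocal_rank(ranks):
    """Compute MRR from the 1-based rank of the first correct method per query.

    Use None for a query whose correct method was not retrieved;
    such a query contributes 0 to the sum (an assumed convention).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = sum(1.0 / rank for rank in ranks if rank is not None)
    return total / len(ranks)


# Hypothetical ranks for four queries: (1 + 1/2 + 0 + 1/4) / 4
example_ranks = [1, 2, None, 4]
print(mean_reciprocal_rank(example_ranks))  # 0.4375
```

A score of 1.0 means the correct method was ranked first for every query; lower values mean it appeared further down the list on average.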

References

  • Huijing Deng and Grzegorz Chrupała. 2014. Semantic approaches to software components retrieval using English queries. To appear in LREC 2014.