Codeine implements models for retrieving Java methods using English queries. Three models are implemented: a term-matching model, a Polylingual Latent Dirichlet Allocation (PLDA) model, and IBM Model 1.
The directory data/jel contains machine-readable XML documentation of several packages of the Java standard library.
The directory data/extract contains data that has been extracted from the XML files, pre-processed, and split into train/validation/test sets. These files are in plain text format. Refer to the README.rst file in data/extract for details.
The directory bin contains the script files for running each model.
In order to run an experiment, type the following commands in the console, from the codeine directory:
Replace "MODEL_NAME" with the name of the model. Three models are implemented:
- term-matching: baseline model
- plda: Polylingual Latent Dirichlet Allocation
- ibm: IBM model 1
(Note: to run the experiments with the plda model, the "Mallet" folder must be located in the same parent directory as codeine, e.g. /user/home/codeine and /user/home/mallet.)
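For readers unfamiliar with term-matching baselines, the following is a minimal illustrative sketch in Python. It is not Codeine's implementation: the scoring rule (number of distinct shared terms), the tokenizer, and the names tokenize and rank_methods are assumptions made for this example, and the method descriptions are made up.

```python
import re

def tokenize(text):
    # Split on camelCase boundaries and non-letters, then lowercase.
    # Assumption: Codeine's own pre-processing may differ.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [p.lower() for p in parts]

def rank_methods(query, methods):
    """Rank method names by the number of distinct terms shared with the query.

    methods maps a (hypothetical) method name to its English description.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    query_terms = set(tokenize(query))
    scores = {}
    for name, description in methods.items():
        method_terms = set(tokenize(name + " " + description))
        scores[name] = len(query_terms & method_terms)
    return sorted(methods, key=lambda name: (-scores[name], name))

# Made-up example data, for illustration only.
methods = {
    "readLine": "reads a line of text",
    "close": "closes the stream",
    "parseInt": "parses the string as an integer",
}
ranking = rank_methods("read a line of text", methods)
print(ranking)
```

A baseline of this kind only rewards exact word overlap, which is the limitation that models such as plda and ibm are meant to address.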
After each successful run, an output folder is created, named after the model and its parameters. This folder contains the intermediate files generated during the run, while the corresponding Mean Reciprocal Rank scores can be found in the directory data/run.
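As a reminder of how Mean Reciprocal Rank is defined, here is a minimal sketch in Python. It assumes one correct method per query and scores a query as 0 when the correct method is not in the ranked list; the function name and the input format are hypothetical and do not reflect the format of the files in data/run.

```python
def mean_reciprocal_rank(rankings, gold):
    """Average of 1/rank of the correct item over all queries.

    rankings: one ranked list of candidate method names per query.
    gold: the correct method name for each query, in the same order.
    """
    if not gold:
        return 0.0
    total = 0.0
    for candidates, target in zip(rankings, gold):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == target:
                total += 1.0 / rank
                break  # only the first occurrence counts
    return total / len(gold)

# Made-up example: correct answers at rank 1, rank 3, and not retrieved.
rankings = [["a", "b"], ["x", "y", "z"], ["p"]]
gold = ["a", "z", "q"]
mrr = mean_reciprocal_rank(rankings, gold)
print(round(mrr, 4))
```

Here the reciprocal ranks are 1, 1/3 and 0, so the score is their average, 4/9.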
- Huijing Deng and Grzegorz Chrupała. 2014. Semantic approaches to software components retrieval using English queries. To appear in LREC 2014.