GenX Referring Expression Corpus

Author: Nicholas FitzGerald (email@example.com)

This repository contains a dataset of referring expressions. The dataset consists of images of coloured blocks, some of which are circled. For each image, 20 users of Amazon Mechanical Turk were asked to fill in the blank in the sentence "Please pick up _________" in such a way as to instruct a partner to pick up the circled blocks.

This data was collected for the following paper (please cite this paper if using this data for your own research):

   Learning Distributions over Logical Forms for Referring Expression Generation
   Nicholas FitzGerald, Yoav Artzi, Luke Zettlemoyer
   Empirical Methods in Natural Language Processing (EMNLP 2013)

----------------------------------------------------------------------

Here is a directory of the files in this repository:

<images>
   Contains the .png images shown to the subjects on Mechanical Turk.

<state>

<labelling>

ALL.txt
   Contains all the referring expressions collected from Mechanical Turk. These have been preprocessed by converting to lowercase and normalizing punctuation.

ALL_SPELLCHECKED.txt
   All the referring expressions, spellchecked, with "the" added to the front of any expression that lacks a determiner (e.g. "brown blocks" -> "the brown blocks").

<all> - contains the data split and labels used for the full task (see the EMNLP paper above)

<single> - contains only scenes whose target set is a single object, for the single-object subtask.
<all> and <single> contain the following:

init - the initialization set used to train the semantic parser, which is then used to automatically label the bulk of the training data (manually labeled)
devtest - the development test set (manually labeled)
devtrain - unlabeled training data, which was labeled with the semantic parser
heldout - the held-out test data (manually labeled)
LABELED_TRAINING - devtrain data labeled by the trained semantic parser, concatenated with init and devtest

For devtest, heldout and init, the files labeled "NOBAD" omit expressions which could not be assigned a meaning representation. An expression could be removed for any of the following reasons:

- The referring expression was incorrect (did not pick out the right set of objects)
- The expression used a concept that we do not model (e.g. a spatial relation, size, or material type)
- The expression was ungrammatical in a way that could not be easily resolved (i.e. was just a list of attributes, not a proper noun phrase).
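
For readers who want to apply similar preprocessing to new expressions, the steps described for ALL.txt and ALL_SPELLCHECKED.txt (lowercasing, normalizing punctuation, and prepending "the" where a determiner is missing) might be sketched roughly as follows. This is an illustration only, not the script used to build the corpus: the function names, the punctuation rule, and the determiner list are all assumptions, and spellchecking is not covered.

```python
import re

# Hypothetical determiner list. The README only says that "the" is added to
# expressions that are not properly determined; the real criterion is unknown.
DETERMINERS = {"the", "a", "an", "all", "both", "every", "each",
               "this", "that", "these", "those"}


def normalize(expression: str) -> str:
    """Lowercase an expression and apply an assumed punctuation normalization."""
    text = expression.lower()
    # Assumption: punctuation other than apostrophes is replaced by a space.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def add_determiner(expression: str) -> str:
    """Prepend "the" when the first word is not in the assumed determiner list."""
    words = expression.split()
    if words and words[0] not in DETERMINERS:
        return "the " + expression
    return expression


if __name__ == "__main__":
    raw = "Brown  Blocks."
    print(add_determiner(normalize(raw)))
```

Keeping the two steps as separate functions mirrors the distinction between the two files: normalize alone corresponds to the ALL.txt description, while add_determiner corresponds to the extra step described for ALL_SPELLCHECKED.txt.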