GenX Referring Expression Corpus
Author: Nicholas FitzGerald (

This repository contains a dataset of referring expressions. The dataset
consists of images of coloured blocks, some of which are circled. For each image,
20 users of Amazon Mechanical Turk were asked to fill in the blank in the 
sentence "Please pick up _________" in such a way as to instruct a partner to 
pick up the circled blocks.

This data was collected for the following paper (please cite this paper if
using this data for your own research):

Learning Distributions over Logical Forms for Referring Expression Generation
Nicholas FitzGerald, Yoav Artzi, Luke Zettlemoyer
Empirical Methods in Natural Language Processing (EMNLP 2013)


1. Obtaining the Corpus

There are two ways to obtain the corpus. The first way is to use git, with the
following command:

$ git clone

The corpus can also be downloaded as a .zip file under the "Downloads" link:


2. Corpus directory and description

Here is a directory of the files in this repository:

    Contains the .png images shown to the subjects on Mechanical Turk. Each
    file is named by the scene-ID of the particular scene.


        Contains all the referring expressions collected from Mechanical Turk.
        These have been preprocessed by converting to lowercase and normalizing

        All the referring expressions, spellchecked and with "the" added to
        the front of any expressions not properly determined (e.g. "brown
        blocks" -> "the brown blocks").

    <all> - contains the data split and labels used for the full task (see
            the EMNLP paper cited above)

    <single> - contains only the scenes whose target set is a single object,
            for the single-object subtask.

        <all> and <single> contain the following:

            init - the initialization set used to train the semantic parser
                    which is used to automatically label the bulk of the
                    training data (manually labeled)
            devtest - the development test set (manually labeled)
            devtrain - unlabeled training data which was labeled with the
                    semantic parser
            heldout - the heldout test data (manually labeled)
            LABELED_TRAINING - devtrain data labeled by the trained semantic
                    parser, concatenated with init and devtest

            For devtest, heldout and init, the files labeled "NOBAD" have had
            the expressions removed which, for various reasons, could not be
            assigned a meaning representation. This could be for the following
            reasons:

                - The referring expression was incorrect (did not pick out the
                  right set of objects)
                - The expression used a concept that we do not model (e.g. a
                  spatial relation, size, or material type)
                - The expression was ungrammatical in a way that could not be
                  easily resolved (i.e. it was just a list of attributes, not
                  a proper noun phrase).

    <state> - contains the world-state information for each scene. There are
        two files:

            SceneIndex.txt - lists, for each scene, which objects are in the
                target-set (G) - i.e. which objects are circled in the
                corresponding image. Each line of the file is formatted as:
                    <sceneID>::<selected objects>::<distractor objects>
                where <selected objects> is a space-separated list of the
                objectIDs of the objects which are selected (inside the
                circles) in that scene, and <distractor objects> is the
                space-separated list of the objects which are NOT selected.

            Attributes.tsv - lists, for each object, what attributes that
                object has (i.e. colour, shape, etc.). Each line of the file is
                formatted as:
                where attributes is a comma-separated list of the object's
                attributes. Each attribute is written as "value:type", for
                example "red:color". Every object has the attribute "misc:misc"
                which corresponds to words like "toy" or "object" which can
                refer to every object in the scene.
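
The two formats described above can be read with a few lines of Python. This
is only a minimal sketch: the function names and the IDs in the usage example
are made up for illustration and are not part of the corpus. For
Attributes.tsv it handles only the attribute list itself (the "value:type"
items), not the layout of the full line.

```python
from typing import List, Tuple


def parse_scene_index_line(line: str) -> Tuple[str, List[str], List[str]]:
    """Split '<sceneID>::<selected objects>::<distractor objects>' into parts.

    Hypothetical helper, not shipped with the corpus.
    """
    scene_id, selected, distractors = line.rstrip("\n").split("::")
    # str.split() with no argument tolerates repeated spaces and empty fields.
    return scene_id.strip(), selected.split(), distractors.split()


def parse_attribute_list(text: str) -> List[Tuple[str, str]]:
    """Turn a comma-separated 'value:type' list into (value, type) pairs.

    Hypothetical helper, not shipped with the corpus.
    """
    pairs = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        # Each attribute is written as "value:type", e.g. "red:color".
        value, _, attr_type = item.partition(":")
        pairs.append((value, attr_type))
    return pairs
```

For example, with invented IDs, parse_scene_index_line("s1::a b::c") yields
the scene ID "s1", the selected objects ["a", "b"] and the distractors ["c"],
and parse_attribute_list("red:color,misc:misc") yields the pairs
("red", "color") and ("misc", "misc").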