nfitzgerald committed 6adc00b

initial commit

Files changed (301)

+GenX Referring Expression Corpus
+Author: Nicholas FitzGerald (nfitz@cs.washington.edu)
+
+This repository contains a dataset of referring expressions. The dataset
+consists of images of coloured blocks, some of which are circled. For each image,
+20 users of Amazon Mechanical Turk were asked to fill in the blank in the 
+sentence "Please pick up _________" in such a way as to instruct a partner to 
+pick up the circled blocks.
+
+This data was collected for the following paper (please cite this paper if
+using this data for your own research):
+
+Learning Distributions over Logical Forms for Referring Expression Generation
+Nicholas FitzGerald, Yoav Artzi, Luke Zettlemoyer
+Empirical Methods in Natural Language Processing (EMNLP 2013)
+
+----------------------------------------------------------------------
+
+Here is a directory of the files in this repository:
+
+<images>
+    Contains the .png images shown to the subjects on Mechanical Turk.
+
+<state>
+
+<labelling>
+    ALL.txt
+        Contains all the referring expressions collected from Mechanical Turk.
+        These have been preprocessed by converting to lowercase and normalizing
+        punctuation.
+
+    ALL_SPELLCHECKED.txt
+        All the referring expressions, spellchecked, with "the" added to the
+        front of any expression lacking a determiner (e.g. "brown blocks" ->
+        "the brown blocks").
+
+    <all> - contains the data split and labels used for the full task (see
+            the EMNLP paper above).
+
+    <single> - contains only scenes whose target set is a single object, for
+            the single-object subtask.
+
+        <all> and <single> contain the following:
+
+            init - the initialization set (manually labeled), used to train
+                    the semantic parser that automatically labels the bulk
+                    of the training data
+            devtest - the development test set (manually labeled)
+            devtrain - training data without manual labels, which was
+                    labeled automatically by the semantic parser
+            heldout - the heldout test data (manually labeled)
+            LABELED_TRAINING - devtrain data labeled by the trained semantic
+                    parser, concatenated with init and devtest
+
+
+            For devtest, heldout and init, the files labeled "NOBAD" exclude
+            expressions that could not be assigned a meaning representation,
+            for one of the following reasons:
+
+                - The referring expression was incorrect (did not pick out the
+                  right set of objects)
+                - The expression used a concept that we do not model (e.g. a
+                  spatial relation, size, or material type)
+                - The expression was ungrammatical in a way that could not be
+                  easily resolved (e.g. it was just a list of attributes
+                  rather than a well-formed noun phrase).
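The determiner rule applied when building ALL_SPELLCHECKED.txt can be sketched as follows. This is a hypothetical Python illustration under stated assumptions, not the script used to build the corpus. The set of words treated as determiners (`DETERMINERS`) and the function name `add_determiner` are assumptions. Spellchecking and punctuation normalization are not modelled, and only the lowercasing step described for ALL.txt is applied.

```python
# Hypothetical set of words treated as determiners. The list actually used
# to build ALL_SPELLCHECKED.txt is not given in this README.
DETERMINERS = frozenset({
    "the", "a", "an", "all", "both", "each", "every", "any",
    "this", "that", "these", "those",
})


def add_determiner(expression: str) -> str:
    """Lowercase an expression and prepend "the" if it lacks a determiner.

    Sketch only: whitespace is collapsed by split()/join(), and no
    spellchecking or punctuation normalization is performed.
    """
    tokens = expression.lower().split()
    if not tokens:
        return ""
    # Prepend "the" only when the first word is not a known determiner.
    if tokens[0] not in DETERMINERS:
        tokens.insert(0, "the")
    return " ".join(tokens)


if __name__ == "__main__":
    print(add_determiner("brown blocks"))        # the brown blocks
    print(add_determiner("all the red blocks"))  # all the red blocks
```

Checking only the first token keeps the rule simple. It would, for example, still prepend "the" to an expression beginning with a number ("two red blocks"), which the real preprocessing may or may not have done.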