

UW SPF - The University of Washington Semantic Parsing Framework

Authors

Developed and maintained by Yoav Artzi
Contributors: Luke Zettlemoyer, Tom Kwiatkowski

When using UW SPF, please acknowledge it by citing:

Artzi, Y., & Zettlemoyer, L. (2013). UW SPF: The University of Washington Semantic Parsing Framework.

Bibtex:

@misc{Artzi:13spf,
    Author = {Yoav Artzi and Luke Zettlemoyer},
    Title = {{UW SPF: The University of Washington Semantic Parsing Framework}},
    Year = {2013},
    Eprint = {arXiv:1311.3011},
}

The article and bib file are both attached to the source code. When using specific algorithms please cite the appropriate work (see below).

Validation-based learning, joint inference and coarse-to-fine lexical generation

Yoav Artzi and Luke Zettlemoyer. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. In Transactions of the Association for Computational Linguistics (TACL), 2013.

Loss-sensitive learning

Yoav Artzi and Luke Zettlemoyer. Bootstrapping Semantic Parsers from Conversations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.

Unification-based GENLEX

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. Inducing Probabilistic CCG Grammars from Logical Form with Higher-order Unification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2010.

Factored lexicons

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. Lexical Generalization in CCG Grammar Induction for Semantic Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011.

Template-based GENLEX

Luke Zettlemoyer and Michael Collins. Online Learning of Relaxed CCG Grammars for Parsing to Logical Form. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 2007.

Luke Zettlemoyer and Michael Collins. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In Proceedings of the Twenty First Conference on Uncertainty in Artificial Intelligence (UAI), 2005.

Documentation

We are constantly updating this section.

See our ACL 2013 tutorial for general information about semantic parsing with CCGs. The slides are available here.

Building

To compile SPF use: ant dist. The output JAR file will be in the dist directory. You can also download the compiled JAR file from the downloads section.

Running example experiments

The framework contains an example experiment using the GeoQuery corpus. To test on development fold 0 and train on the remaining folds, use:
java -jar dist/spf-1.4.jar geoquery/experiments/template/dev.cross/dev.fold0.exp
The log and output files are written to a newly generated directory in the experiment directory:
geoquery/experiments/template/dev.cross/

View the .exp file to see how it defines arguments and how it includes additional arguments from other files. Another important entry point is the class edu.uw.cs.lil.tiny.geoquery.GeoMain. The experiments platform (ExPlat) is reviewed below.

Working with the Code

The code is divided into many projects with dependencies between them. You can work with the code in any editor and build it with the accompanying ANT script. However, we recommend using Eclipse. Each directory is an Eclipse project and can be easily imported into Eclipse. To do so, select Import from the File menu and choose Existing Projects into Workspace. The Root Directory should be the code directory; all projects should be selected by default. The dependencies will be imported automatically. To successfully build SPF in Eclipse you will need to set the classpath variable TINY_REPO to the code directory. To do so, go to Preferences -> Java -> Build Path -> Classpath Variables and add a new variable with the name TINY_REPO and a folder value that points to the repository location.

Getting to Know the Code

There are two ways to use SPF. The first is simply calling the code from your own classes. To see how a complete experiment can be set up programmatically, see edu.uw.cs.lil.tiny.geoquery.GeoExpSimple. The main(String[]) method in this class initializes the framework, instantiates all objects required for inference, learning, and testing, and launches the appropriate jobs in order. A slightly more robust way to conduct experiments is provided by the ExPlat internal experiments framework. The previously mentioned example experiments use ExPlat.

Logging in SPF

SPF’s logging system should be initialized before using the framework. This includes setting the default output stream (e.g., Logger.DEFAULT_LOG = new Log(System.err);) and the log threshold (e.g., LogLevel.setLogLevel(LogLevel.INFO);). Only log messages at or above the set threshold will be logged. By default, each log message is prefixed with the originating class; to turn off this behavior, use Logger.setSkipPrefix(true);. When using ExPlat, log messages are written to job-specific files, which are stored in a special directory created for the experiment.

All classes that output log messages include a public static object called LOG of type edu.uw.cs.utils.log.ILogger. All logging messages within the class are created using the logger. This object also provides more granular control of the logging level. You may set a custom log level for each class. For example, to set the log level for the multi-threaded parser to DEBUG use edu.uw.cs.lil.tiny.parser.ccg.cky.multi.MultiCKYParser.LOG.setCustomLevel(LogLevel.DEBUG). Note that this will only affect the log messages created within this class, and not any parent or child classes.

Want to use the logging system in your own code or add new log messages? You can view the interface edu.uw.cs.utils.log.ILogger to see what logging methods are available.

ExPlat

ExPlat is SPF’s experiments platform. It is intended to streamline experiments and help you avoid huge main(String[]) methods that just initialize one thing after another and are a pain to update.

An ExPlat experiment is defined by two parts: a backend class in the code and a .exp file. For example, consider the GeoQuery experiment accompanying SPF. The class edu.uw.cs.lil.tiny.geoquery.GeoExp provides the backend code, while the .exp files in geoquery/experiments define the resources used and the jobs to be executed. For example, the file geoquery/experiments/template/test/test.exp defines the evaluation experiment. In this experiment we train on the 10 development folds and test on the held-out set.

Each .exp file includes three sections separated by empty lines:
1. Global parameters
2. Resources
3. Jobs
Empty lines may not appear in any part of the .exp file or any included files, except to separate the three sections. In general, each line contains a single directive. The include=<file> directive may be used at any point to include a file; the inclusion works like a C #include statement, meaning the file's text is pasted in place of the include directive. The path of the included file should be relative to the location of the .exp file.

Each directive (line) is a white-space separated sequence of key-value pairs (key=value). To refer to the value of a previously defined parameter, use %{parameter_name} in the value slot. The reference is resolved first in the local scope of the current directive, and if no parameter of that name is found, it is resolved in the scope of the global parameters. The parameters are read in sequence, and the namespace of the current directive is updated in that order.
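For illustration only, here is a hypothetical minimal .exp file. All parameter names, resource types, job types, and file names below are made up (the real ones are defined by the experiment class and its resource repository), but the layout follows the rules above: three sections separated by empty lines, an include directive, and %{...} references.

```
include=global.inc
expName=demo
outputDir=%{expName}.out

id=trainData type=data.train file=data/train.txt
id=model type=model.simple

id=train type=job.train model=model data=trainData
id=test type=job.test model=model dep=train log=%{outputDir}/test.log
```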

Global Parameters

This section includes various parameters that may be used when defining resources and jobs later. Each directive defines a single parameter as a key-value pair: key=value. The values are read as strings. The parameters are either used in the code or by the rest of the .exp file. For example, the parameters used by the GeoQuery experiment are those read by the constructor of edu.uw.cs.lil.tiny.geoquery.GeoExp and the classes it extends.
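To make the resolution order concrete (directive-local scope first, then the global parameters, with pairs read left to right), here is a small stand-alone sketch. This is an illustrative re-implementation, not SPF's actual parsing code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DirectiveDemo {
    /** Resolve %{name} references: local scope first, then global scope. */
    static String resolve(String value, Map<String, String> local, Map<String, String> global) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            int start = value.indexOf("%{", i);
            if (start < 0) { out.append(value.substring(i)); break; }
            int end = value.indexOf('}', start);  // assumes well-formed references
            out.append(value, i, start);
            String name = value.substring(start + 2, end);
            out.append(local.containsKey(name) ? local.get(name) : global.get(name));
            i = end + 1;
        }
        return out.toString();
    }

    /** Parse a single white-space separated key=value directive line. */
    static Map<String, String> parseDirective(String line, Map<String, String> global) {
        Map<String, String> local = new LinkedHashMap<>();
        for (String pair : line.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            String key = pair.substring(0, eq);
            // Later pairs may reference earlier ones: local is updated in order.
            local.put(key, resolve(pair.substring(eq + 1), local, global));
        }
        return local;
    }

    public static void main(String[] args) {
        Map<String, String> global = new LinkedHashMap<>();
        global.put("outputDir", "logs");
        Map<String, String> d = parseDirective("id=fold0 path=%{outputDir}/%{id}.log", global);
        System.out.println(d.get("path"));  // prints logs/fold0.log
    }
}
```

Note how path can refer to %{id} because the pairs of a directive are read in order, while %{outputDir} falls back to the global scope.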

Resources

Each directive in this section defines a resource that can be used later, either in other resources or in jobs. The id parameter defines the name of the resource. Resource IDs may not override previously defined names. The type parameter defines the type of the resource. The available resources are registered in the code. For example, the class edu.uw.cs.lil.tiny.geoquery.GeoResourceRepo is responsible for registering resources for the GeoQuery experiment. To make a resource available, its creator is registered. A creator is essentially a factory that can take a directive in the form of parameters, interpret them, and instantiate an object. The type() method of each creator provides the type of the resource. The other key-value pairs provide various parameters that are interpreted by the resource creator; see the creators themselves for the available parameters. All creators implement the interface edu.uw.cs.lil.tiny.explat.resources.IResourceObjectCreator, and each defines a usage() function that documents the resource's available parameters.

Jobs

Each directive in this section defines a job to be executed. The type of the job is defined by the type parameter, and its ID by the id parameter. The jobs on which this job depends are defined by the dep parameter, which takes a comma-separated list of job IDs. Jobs are executed in parallel, unless a dependency is declared. The rest of the parameters define arguments for the job; these may refer to resources, global parameters, or directly include values. Job directives are interpreted by methods in the experiment class. For example, in the GeoQuery experiment the class edu.uw.cs.lil.tiny.geoquery.GeoExp includes a number of methods that read job directives and create jobs; see the method createJob(Parameters) for the available job types and the methods that handle them.
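To illustrate the dependency semantics, the following self-contained sketch (not SPF's actual scheduler) computes an order in which jobs may start: a job is released once every job in its dep list has completed. It assumes the dependency graph is acyclic:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class JobOrderDemo {
    /** Return one valid start order: each job waits for all its dependencies. */
    static List<String> executionOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    done.add(e.getKey());
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical job IDs; "dep=train" on the test job becomes List.of("train").
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("train", List.of());
        deps.put("test", List.of("train"));
        deps.put("log", List.of("train", "test"));
        System.out.println(executionOrder(deps));  // prints [train, test, log]
    }
}
```

With dep=train on the test job, test does not start until train completes; jobs with no declared dependencies may run in parallel.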

Working with Logical Expressions

See LogicalLanguageServices for the main service class required for the logical language to work. Most operations on logical expressions are implemented using the visitor design pattern; see ILogicalExpressionVisitor.

More coming soon …

Combinatory Categorial Grammars (CCGs) in SPF

Coming soon …

Basic Operations on Categories

See ICategoryServices.

More coming soon …

Troubleshooting

I am getting NaNs and/or infinity values in my updates or an exception saying my updates are invalid.
If you are using an exponentiated model with gradient updates, try adjusting the learning rate. With an exponentiated model the values might get too large, so it's advised to scale them down.

I am having trouble re-creating the results from Kwiatkowski et al. 2010 and Kwiatkowski et al. 2011 with the Unification-based GENLEX procedure.
The unification code in SPF is not identical to that of the original papers. The code for the original papers is available online; if you want to re-create the results, that is the way to go. That code is essentially the very old codebase from which SPF started. Be warned: it's messy. The code in SPF does a few things differently, including more liberal splitting, and it lacks certain features that the original code contains. We hope to bring SPF's version of splitting closer to the original papers in the future.

Publications and Projects Using SPF

Please let us know if you used SPF in your published work and we will be happy to list it here.

Tom Kwiatkowski, Eunsol Choi, Yoav Artzi and Luke Zettlemoyer. Scaling Semantic Parsers with On-the-fly Ontology Matching. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013.

Nicholas FitzGerald, Yoav Artzi and Luke Zettlemoyer. Learning Distributions over Logical Forms for Referring Expression Generation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013. Code and data.

Yoav Artzi and Luke Zettlemoyer. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. In Transactions of the Association for Computational Linguistics (TACL), 2013.

Acknowledgements

We would like to thank our early users for helping make SPF better:
Eunsol Choi, The University of Washington
Sebastian Beschke, Hamburg University
Nicholas FitzGerald, The University of Washington
Tom Kwiatkowski, The University of Washington
Jun Ki Lee, Brown University
Kenton Lee, The University of Washington
Gabriel Schubiner, The University of Washington
Adrienne Wang, The University of Washington

UW SPF - The University of Washington Semantic Parsing Framework. Copyright (C) 2013 Yoav Artzi

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
details.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 51
Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.