The Alto parser
Welcome to Alto, the Algebraic Language Toolkit.
Alto is a parser and decoder for Interpreted Regular Tree Grammars (IRTGs). It is being developed at the University of Potsdam in the Theoretical Computational Linguistics group, led by Alexander Koller. Its main features are:
- Represents grammars from a wide variety of popular grammar formalisms as IRTGs, including:
    - Context-free grammars
    - Tree-adjoining grammars (TAG)
    - Tree automata and bottom-up tree transducers
    - Synchronous context-free grammars, TAG, etc.
    - Tree-to-string and string-to-tree transducers
    - Synchronous Hyperedge Replacement Grammars (HRG): Alto is the fastest published HRG parser in the world
    - and many more
- Implements chart-based algorithms for:
    - synchronous parsing (with inputs from multiple sides of a synchronous grammar)
    - decoding (to another side of a synchronous grammar)
    - computing 1-best (Viterbi) and k-best derivations
    - maximum likelihood and expectation maximization (EM) training
- Supports PCFG-style and log-linear probability models for all of these grammar formalisms.
- Built for easy extensibility: implement your own grammar formalism by adding an Algebra class, and use any of the Alto algorithms directly.
- Comes with a GUI that provides access to most of these algorithms and visualizes parsing results.
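
To make the role of an algebra in an IRTG more concrete, here is a minimal sketch in Python. It is not Alto's API (Alto itself is written in Java); every name in it (`Tree`, `apply_hom`, `evaluate_string`, the rule labels `r1` to `r3`, and the toy homomorphisms) is hypothetical and chosen only for illustration. The sketch assumes the usual IRTG setup: a derivation tree is mapped by a tree homomorphism to a term over an algebra, and the algebra evaluates that term to an object, here a string. Two homomorphisms over the same derivation tree give the two sides of a toy synchronous grammar.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tree:
    """A labeled, ordered tree; used for derivation trees and algebra terms."""
    label: str
    children: Tuple["Tree", ...] = ()


def t(label: str, *children: Tree) -> Tree:
    """Shorthand constructor for a tree node."""
    return Tree(label, tuple(children))


def substitute(term: Tree, images: List[Tree]) -> Tree:
    """Replace the variables ?1, ?2, ... in `term` with the given subterms."""
    if term.label.startswith("?"):
        return images[int(term.label[1:]) - 1]
    return Tree(term.label, tuple(substitute(c, images) for c in term.children))


def apply_hom(hom: Dict[str, Tree], derivation: Tree) -> Tree:
    """Apply a tree homomorphism bottom-up to a derivation tree."""
    images = [apply_hom(hom, child) for child in derivation.children]
    return substitute(hom[derivation.label], images)


def evaluate_string(term: Tree) -> Tuple[str, ...]:
    """Toy string algebra: '*' is binary concatenation, every other symbol is a word."""
    if term.label == "*":
        left, right = term.children
        return evaluate_string(left) + evaluate_string(right)
    if term.children:
        raise ValueError(f"unknown operation: {term.label}")
    return (term.label,)


def interpret(hom: Dict[str, Tree], derivation: Tree) -> Tuple[str, ...]:
    """Map a derivation tree to a term, then evaluate the term in the string algebra."""
    return evaluate_string(apply_hom(hom, derivation))


# One derivation tree built from three hypothetical rules.
DERIVATION = t("r1", t("r2"), t("r3"))

# Two interpretations of the same rules; they differ only in how r1 orders its children.
HOM_LEFT = {
    "r1": t("*", t("?1"), t("?2")),
    "r2": t("*", t("the"), t("boy")),
    "r3": t("sleeps"),
}
HOM_RIGHT = {
    "r1": t("*", t("?2"), t("?1")),
    "r2": t("*", t("the"), t("boy")),
    "r3": t("sleeps"),
}

if __name__ == "__main__":
    print(" ".join(interpret(HOM_LEFT, DERIVATION)))   # the boy sleeps
    print(" ".join(interpret(HOM_RIGHT, DERIVATION)))  # sleeps the boy
```

In this sketch, supporting a different kind of object (trees, graphs) would mean swapping `evaluate_string` for another evaluation function while `apply_hom` stays unchanged. That separation is the idea behind adding a new formalism through an algebra.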
The API documentation is available as JavaDoc.
Screenshots of the Alto GUI:
[Screenshot: an IRTG with one string and one graph interpretation (equivalent to a synchronous HRG)]
[Screenshot: the result of parsing "the boy wants to go" with this grammar]
Version 2.1, April 2017
- Improved intersection and inverse homomorphism (invhom) algorithms (condensed, sibling-finder) for much faster PCFG, TAG, and HRG parsing (Groschwitz et al., ACL 2016).
- Added pruning techniques, including beam search and coarse-to-fine parsing.
- Added an adaptive importance sampler for grammar induction (Teichmann et al., ACL 2016 Workshop on Statistical NLP and Weighted Automata).
- Added the "inside" binarization strategy (Klein & Manning 2003).
- Added command-line scripts for parsing and grammar/corpus conversion.
- Initial support for running reproducible experiments using Alto Lab.
- Many small bug fixes and performance improvements.
Version 2.0, July 2015
- Initial Bitbucket release.