gpalign-py / README

Grapheme-phoneme aligner

Alignment code

This is an implementation of the grapheme-phoneme alignment algorithm
described in the paper:

    Yencken, Lars and Baldwin, Tim
    "Efficient grapheme-phoneme alignment for Japanese"
    Proceedings of ALTW 2005, Sydney, pp. 143-151 (2005).

For a broader outline of the purpose of this software, please see the original

This software is effectively legacy code, but it is still used to align the
EDICT dictionary for the FOKS dictionary interface.

Source code and bug tracking are found at:

If you are interested in using this code and run into problems, feel free to
email me.

Evaluation data

Evaluation data in src/data is licensed separately under the terms of the
Creative Commons Attribution 3.0 Unported licence, with attribution via
citation of the following paper.

    Baldwin, Timothy and Hozumi Tanaka (1999) The Applications of Unsupervised
    Learning to Japanese Grapheme-Phoneme Alignment, In Proceedings of ACL
    Workshop on Unsupervised Learning in Natural Language Processing, College
    Park, USA, pp. 9-16.

The paper can be found in the PDF at:

The full data set is available at: