Grapheme-phoneme aligner

Alignment code

This is the grapheme-phoneme alignment algorithm described in the paper:

Yencken, Lars and Baldwin, Tim "Efficient grapheme-phoneme alignment for Japanese" Proceedings of ALTW 2005, Sydney, pp. 143-151 (2005). (Available:

For a broader outline of the purpose of this software, please see the original paper.
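
As a rough illustration of what grapheme-phoneme alignment means (this is not the algorithm from the paper), the sketch below enumerates every way of splitting a kana reading across the characters of a word. The function name `candidate_alignments` and the example word are assumptions made for illustration and do not come from this code base.

```python
from itertools import combinations


def candidate_alignments(graphemes, reading):
    """Yield every split of `reading` into len(graphemes) non-empty,
    contiguous segments, paired with the graphemes in order.

    Hypothetical helper for illustration only; it does no scoring.
    """
    n = len(graphemes)
    # Choose n - 1 cut points strictly inside the reading.
    for cuts in combinations(range(1, len(reading)), n - 1):
        bounds = (0, *cuts, len(reading))
        segments = [reading[a:b] for a, b in zip(bounds, bounds[1:])]
        yield list(zip(graphemes, segments))


# Example: the word 感謝 with the reading かんしゃ.
for alignment in candidate_alignments("感謝", "かんしゃ"):
    print(" ".join(f"{g}/{p}" for g, p in alignment))
```

For 感謝 read as かんしゃ this yields three candidates, of which 感/かん 謝/しゃ is the correct alignment. An aligner has to choose the most plausible candidate from such a set; how this software scores candidates is described in the paper.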

This software is effectively legacy code. However, it is still used to align the EDICT dictionary for the FOKS dictionary interface.

Source code and bug tracking are found at:

If you are interested in using this code and run into problems, feel free to email me (

Evaluation data

Evaluation data in src/data is licensed separately under the terms of the Creative Commons Attribution 3.0 Unported licence, with attribution via citation of the following paper:

Baldwin, Timothy and Hozumi Tanaka (1999) The Applications of Unsupervised Learning to Japanese Grapheme-Phoneme Alignment, In Proceedings of ACL Workshop on Unsupervised Learning in Natural Language Processing, College Park, USA, pp. 9-16.

The paper can be found in the PDF at:
The full data set is available at: