Unsupervised grapheme-phoneme aligner


This codebase implements a grapheme-phoneme alignment system for Japanese, based on the paper:

Yencken, Lars and Baldwin, Timothy: "Efficient grapheme-phoneme alignment for Japanese", in Proceedings of ALTW 2005, Sydney, Australia, pp. 143-151 (2005)

It currently aligns Japanese input pairs supplied in one of two formats: simple and EDICT. Because the method is unsupervised, it needs either a large input dataset or counts previously accumulated from such a dataset in order to achieve high alignment accuracy.
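
As a rough illustration of what an EDICT-format input pair looks like, the sketch below extracts a (word, reading) pair from a single EDICT dictionary line. It is not taken from this codebase: the function name parse_edict_line and the regular expression are hypothetical, and it assumes the standard EDICT layout of a headword, an optional reading in square brackets, and slash-delimited glosses. How the aligner itself reads its input may differ.

```python
import re

# Assumed EDICT layout: HEADWORD [READING] /gloss 1/gloss 2/
# Kana-only entries omit the bracketed reading.
_EDICT_LINE = re.compile(
    r"^(?P<word>\S+)\s+(?:\[(?P<reading>[^\]]+)\]\s+)?/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line):
    """Return (word, reading) for one EDICT line, or None if it does not parse.

    Hypothetical helper for illustration only; not part of gpalign-py.
    """
    match = _EDICT_LINE.match(line)
    if match is None:
        return None
    word = match.group("word")
    # With no bracketed reading, the headword is already written in kana.
    reading = match.group("reading") or word
    return word, reading


if __name__ == "__main__":
    print(parse_edict_line("漢字 [かんじ] /(n) Chinese characters/kanji/"))
```

Each pair produced this way gives a grapheme string and a phoneme string, which is the kind of input pair an aligner would then segment and match up.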

More soon... until then, please contact me or read the original paper, available at my homepage.