Process for building the pinyin dictionary
==========================================

1. Harvest Japanese words with frequency counts.
   Yields:
   - word, P(word)
   Store: 
   - word, reading and meaning into database
   - word/reading index, scored by P(word)

2. Convert each word containing kanji to one or more possible Chinese representations.
   UPDATE: In practice there is exactly one Chinese representation per word.
   Yields: 
   - hanziString
   Store:
   - hanzi index, scored by P(word)

3. Convert each hanziString to one or more possible pinyin representations.
   Yields:
   - pinyinString and P(pinyinString|hanziString)
   Store:
   - pinyinString index
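
The three steps above can be sketched as a small pipeline. This is only an
illustration, not the project's actual code: the function name
build_pinyin_indexes, the lookup callables to_hanzi and pinyin_readings, the
index layout (key -> list of (word, score)) and all probabilities are
hypothetical. The notes do not say how the pinyinString index is scored; the
sketch assumes P(pinyinString|hanziString)*P(word), by analogy with the kana
process.

```python
def build_pinyin_indexes(word_probs, to_hanzi, pinyin_readings):
    """Build hanzi and pinyin indexes from Japanese words.

    word_probs:      dict mapping Japanese word -> P(word)
    to_hanzi:        callable, word -> its single hanziString
    pinyin_readings: callable, hanziString -> list of
                     (pinyinString, P(pinyinString|hanziString))
    """
    hanzi_index = {}
    pinyin_index = {}
    for word, p_word in word_probs.items():
        # Step 2: one unique Chinese representation per word.
        hanzi = to_hanzi(word)
        hanzi_index.setdefault(hanzi, []).append((word, p_word))
        # Step 3: one or more pinyin readings per hanziString.
        for pinyin, p_reading in pinyin_readings(hanzi):
            # Assumed scoring: P(pinyinString|hanziString) * P(word).
            score = p_reading * p_word
            pinyin_index.setdefault(pinyin, []).append((word, score))
    # Keep the best-scoring word first under each key.
    for index in (hanzi_index, pinyin_index):
        for entries in index.values():
            entries.sort(key=lambda entry: entry[1], reverse=True)
    return hanzi_index, pinyin_index


# Made-up toy data standing in for the harvested word list and converters.
JA_WORD_PROBS = {"銀行": 0.75, "図書館": 0.25}
JA_TO_HANZI = {"銀行": "银行", "図書館": "图书馆"}
HANZI_TO_PINYIN = {
    "银行": [("yinhang", 0.9), ("yinxing", 0.1)],
    "图书馆": [("tushuguan", 1.0)],
}

hanzi_index, pinyin_index = build_pinyin_indexes(
    JA_WORD_PROBS, JA_TO_HANZI.get, HANZI_TO_PINYIN.get
)
```

A lookup such as pinyin_index["yinhang"] then returns the Japanese words that
could be typed with that pinyin string, most probable first.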

Process for building the kana dictionary
========================================

1. Harvest Chinese words with frequency counts.
   Yields:
   - word, P(word)
   Store:
   - word, reading and meaning into database
   - word/reading index, scored by P(word)

2. Convert each Chinese word into Japanese kanji, calculating a probability
   score for each transliteration.
   Yields:
   - kanjiString, P(kanjiString|word)
   Store:
   - kanji index, scored by P(kanjiString|word)*P(word)

3. Convert each kanjiString into possible hiragana readings, with a probability
   for each reading.
   Yields:
   - kanaString, P(kanaString|kanjiString)
   Store:
   - kana index, scored by 
     P(kanaString|kanjiString)*P(kanjiString|word)*P(word)
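
The chained scoring in the kana process can be sketched as follows. Again this
is an illustration under assumptions, not the project's implementation: the
function name build_kana_indexes, the callables kanji_candidates and
kana_readings, the index layout (key -> list of (word, score)) and every
probability in the toy data are hypothetical.

```python
def build_kana_indexes(word_probs, kanji_candidates, kana_readings):
    """Build kanji and kana indexes from Chinese words.

    word_probs:       dict mapping Chinese word -> P(word)
    kanji_candidates: callable, word -> list of
                      (kanjiString, P(kanjiString|word))
    kana_readings:    callable, kanjiString -> list of
                      (kanaString, P(kanaString|kanjiString))
    """
    kanji_index = {}
    kana_index = {}
    for word, p_word in word_probs.items():
        for kanji, p_kanji in kanji_candidates(word):
            # Step 2 score: P(kanjiString|word) * P(word).
            kanji_score = p_kanji * p_word
            kanji_index.setdefault(kanji, []).append((word, kanji_score))
            for kana, p_kana in kana_readings(kanji):
                # Step 3 score:
                # P(kanaString|kanjiString) * P(kanjiString|word) * P(word).
                kana_score = p_kana * kanji_score
                kana_index.setdefault(kana, []).append((word, kana_score))
    # Keep the best-scoring word first under each key.
    for index in (kanji_index, kana_index):
        for entries in index.values():
            entries.sort(key=lambda entry: entry[1], reverse=True)
    return kanji_index, kana_index


# Made-up toy data; the probabilities are illustrative only.
ZH_WORD_PROBS = {"银行": 0.8, "发": 0.2}
ZH_TO_KANJI = {
    "银行": [("銀行", 1.0)],
    "发": [("発", 0.7), ("髪", 0.3)],  # one simplified form, two kanji
}
KANJI_TO_KANA = {
    "銀行": [("ぎんこう", 1.0)],
    "発": [("はつ", 1.0)],
    "髪": [("かみ", 1.0)],
}

kanji_index, kana_index = build_kana_indexes(
    ZH_WORD_PROBS, ZH_TO_KANJI.get, KANJI_TO_KANA.get
)
```

Multiplying the three factors means a rare word, an unlikely transliteration,
or an unlikely reading each push an entry down the ranking for its kana key.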