Pinyi : A Chinese Romanization System | PIN.spell YING.brave : YI.single ZHONG.breed HAN.dynasty ZI.symbol LA.drag DING.stem HUA.ize FANG.quadrilateral AN.incident

Goal

One-to-one mapping between each Chinese character and a code
Based on the Pinyin system
Append an English word to distinguish characters that share the same pronunciation

Algorithm

1. Generate mapping from character to Pinyin

Choose one Pinyin for each character, using data source A and data source B.
If a character is in data source B, choose the first Pinyin.
If a character is not in data source B and has a Pinyin with tone 5, choose the one with tone 5.
Otherwise, choose the first Pinyin in alphabetical order.
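
The selection rules above could be sketched in Python roughly as follows. This is only an illustration, not the original implementation: the function name choose_pinyin, the dictionary layout (character mapped to a list of tone-numbered readings such as "de5"), and the tie-break when several tone-5 readings exist are all assumptions.

```python
def choose_pinyin(char, readings_a, readings_b):
    """Pick one Pinyin reading for `char`.

    readings_a / readings_b map a character to a list of tone-numbered
    readings such as "de5" (this data layout is an assumption).
    """
    # Rule 1: the character is in data source B, so take its first reading.
    if readings_b.get(char):
        return readings_b[char][0]

    candidates = readings_a.get(char, [])

    # Rule 2: prefer a tone-5 reading. If there are several, the
    # alphabetically first one is taken (assumed tie-break).
    tone5 = sorted(p for p in candidates if p.endswith("5"))
    if tone5:
        return tone5[0]

    # Rule 3: otherwise the first reading in alphabetical order.
    return min(candidates) if candidates else None
```

For example, with readings_a = {"的": ["di4", "de5", "di2"]} and an empty readings_b, the call choose_pinyin("的", readings_a, {}) returns "de5".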

2. Generate the English word frequency

Go through all English words in data source A and count the number of occurrences of each English word.
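
A minimal sketch of this counting step, assuming the meaning column of data source A is available as a list of strings. The name count_words, the rule that a word is a run of ASCII letters, and the choice to count words exactly as written (case-sensitive) are assumptions, not part of the original description.

```python
import re
from collections import Counter

# A "word" is taken to be a run of ASCII letters (assumed tokenization).
WORD_RE = re.compile(r"[A-Za-z]+")


def count_words(meanings):
    """Count how often each English word occurs across all meanings."""
    counts = Counter()
    for meaning in meanings:
        counts.update(WORD_RE.findall(meaning))
    return counts
```

A Counter returns 0 for words it has never seen, which is convenient when the counts are looked up later during scoring.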

3. Generate PinYing

Go through each character in the order of data source A:
For each English word in the meaning column of one Chinese character, calculate the following values
Define In-Quote as whether the word appears in a quotation
Define Upper-Case as whether the word is capitalized, such as the name of a person or country
Define Score-Value = order of the word + (length of the English word * word occurrence in data source A)
Sort the words in the order of In-Quote, Upper-Case, Score-Value, and the word string
Pick the first code that has not been used before, where a code is Pinyin without tone + dot + English word
If all words have been used, try extra meaningless letters in place of the English word:
a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;aa;ba;ca;da;ea;fa;ga;ha;ia;ja;ka;la;ma;na;oa;pa;qa;ra;sa;ta;ua;va;wa;xa;ya;ae;be;ce;de;ee;fe;ge;he;ie;je;ke;le;me;ne;oe;pe;qe;re;se;te;ue;ve;we;xe;ye
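
The ranking and code assignment above might look roughly like the sketch below. Several details are assumptions rather than statements about the original program: the sort is ascending (so words outside quotations and lower-case words come first), the word order is counted from 0, quotations are delimited by straight double quotes, the Pinyin part is upper-cased to match the examples in the title line, and None is returned when even the fallback letters are exhausted. The names rank_words, assign_code, and strip_tone are hypothetical.

```python
import re
import string

WORD_RE = re.compile(r"[A-Za-z]+")
QUOTE_RE = re.compile(r'"[^"]*"')  # assumed quotation style

# Fallback suffixes: a..y, then aa..ya, then ae..ye, as listed above.
_LETTERS = string.ascii_lowercase[:25]
FALLBACK = (list(_LETTERS)
            + [c + "a" for c in _LETTERS]
            + [c + "e" for c in _LETTERS])


def strip_tone(pinyin):
    """Drop a trailing tone digit, e.g. "pin1" -> "pin"."""
    return pinyin.rstrip("12345")


def rank_words(meaning, counts):
    """Return the distinct words of `meaning`, best candidate first."""
    quoted = [m.span() for m in QUOTE_RE.finditer(meaning)]
    keys = {}
    for order, m in enumerate(WORD_RE.finditer(meaning)):
        word = m.group()
        if word in keys:
            continue  # keep only the first occurrence of a word
        in_quote = any(s <= m.start() < e for s, e in quoted)
        upper_case = word[0].isupper()
        score = order + len(word) * counts.get(word, 0)
        keys[word] = (in_quote, upper_case, score, word)
    # Ascending sort on (In-Quote, Upper-Case, Score-Value, word string).
    return sorted(keys, key=keys.get)


def assign_code(pinyin, meaning, counts, used):
    """Return the first unused code for a character and record it in `used`."""
    base = strip_tone(pinyin).upper()
    for suffix in rank_words(meaning, counts) + FALLBACK:
        code = f"{base}.{suffix}"
        if code not in used:
            used.add(code)
            return code
    return None  # behaviour when everything is taken is not specified
```

With counts = {"to": 50, "piece": 2, "together": 1, "spell": 1} and an empty used set, assign_code("pin1", "to piece together/to spell", counts, used) yields "PIN.spell", since "spell" has the lowest Score-Value (4 + 5 * 1 = 9).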

Data Sources

A. Frequency and Pinyin of Chinese characters from http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO (Modern Chinese Character Frequency List by Jun Da (jda@mtsu.edu))
B. Popular pronunciation of Chinese characters from http://zein.se/patrick/3000char.html (The most common Chinese characters in order of frequency, by © 2003 – 2009 Patrick Hassel Zein)