Pinyi : A Chinese Romanization System | pizin yi : yi zhozeng han zih la direireng huazei fang ahn

Goal

One-to-one mapping between a Chinese character and a code
Based on Pinyin system
Insert additional letters that sound like "yi" or "ee" after the vowel, to keep the impact on the spelling small
Use Wubi to generate the additional letters
Chinese characters with higher frequency should have fewer additional letters, or none

Algorithm

1. Generate mapping from character to Wubi

Generate a unique Wubi code for each character, using data source A.
For characters that share the same Wubi code, add an extra string, taken in this order:
a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;aa;ba;ca;da;ea;fa;ga;ha;ia;ja;ka;la;ma;na;oa;pa;qa;ra;sa;ta;ua;va;wa;xa;ya
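Step 1 can be sketched as follows. This is a minimal sketch, not the project's actual code: the function names are hypothetical, the input is made-up placeholder data rather than data source A, and it assumes that the first character seen with a given Wubi code keeps the bare code while later ones receive "a", "b", and so on.

```python
from collections import defaultdict

LETTERS = "abcdefghijklmnopqrstuvwxy"  # 25 letters; "z" is not in the list


def extra_strings():
    """Return the 50 extra strings in the listed order: a..y, then aa, ba, ..., ya."""
    return list(LETTERS) + [letter + "a" for letter in LETTERS]


def assign_unique_wubi(entries):
    """Map each character to a unique (wubi_code, extra) pair.

    entries: iterable of (character, wubi_code) pairs in source order.
    Assumption: the first character seen with a code gets no extra string.
    """
    extras = extra_strings()
    seen = defaultdict(int)  # wubi_code -> number of characters already using it
    unique = {}
    for char, code in entries:
        if char in unique:  # assumption: keep only the first code of a character
            continue
        rank = seen[code]
        seen[code] += 1
        # An IndexError here would mean the 50 extra strings are exhausted.
        unique[char] = (code, "" if rank == 0 else extras[rank - 1])
    return unique


# Made-up placeholder data, not taken from data source A.
toy = [("甲", "abc"), ("乙", "abc"), ("丙", "abd"), ("丁", "abc")]
print(assign_unique_wubi(toy))
```

In this toy run the three characters sharing "abc" end up with the extra strings "", "a" and "b", so every (code, extra) pair is distinct.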

2. Generate mapping from character to Pinyin

Choose one Pinyin for each character, using data source B and data source C.
If a character is in data source C, choose the first Pinyin.
If a character is not in data source C and has a Pinyin with tone 5, choose the one with tone 5.
Otherwise, choose the first Pinyin in alphabetical order.
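The three selection rules can be sketched as below. This is an illustration under stated assumptions: `choose_pinyin` and both dictionaries are hypothetical names, readings are assumed to be written with a trailing tone digit (for example "de5"), "the first Pinyin" is taken to mean the first reading listed in data source C, and ties between several tone-5 readings are broken alphabetically. The sample readings are hand-written toy input, not extracts from the data sources.

```python
def choose_pinyin(char, readings_b, popular_c):
    """Pick one Pinyin reading for a character.

    readings_b: dict char -> list of readings from data source B (assumed format).
    popular_c:  dict char -> list of readings from data source C (assumed format).
    """
    # Rule 1: data source C wins; take its first listed reading.
    if char in popular_c:
        return popular_c[char][0]
    readings = readings_b[char]
    # Rule 2: prefer a tone-5 (neutral tone) reading if there is one.
    neutral = [reading for reading in readings if reading.endswith("5")]
    if neutral:
        return min(neutral)  # assumption: alphabetical tie-break
    # Rule 3: otherwise the alphabetically first reading.
    return min(readings)


# Hand-written toy input, for illustration only.
readings_b = {
    "的": ["di4", "de5", "di2"],
    "行": ["xing2", "hang2"],
    "长": ["zhang3", "chang2"],
}
popular_c = {"行": ["xing2", "hang2"]}

for ch in "的行长":
    print(ch, choose_pinyin(ch, readings_b, popular_c))
```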

3. Generate Pinyi

Go through the characters in the order of data source B. For each character:
Define the prefix as the leading consonant and the vowel of its Pinyin.
Define the suffix as the trailing consonant ("n", "ng", "r"), and convert "r" to "l" (to avoid later conflicts).
Try the candidate codes for the character in order, and pick the first one that has not already been used by an earlier character.
The candidates, in order, are:
prefix + suffix : Pinyin without tone
prefix + encoded[wubi[0]] + suffix : Pinyin without tone, with the "yi"-code of one Wubi letter inserted
prefix + encoded[wubi[0]] + encoded[wubi[1]] + suffix : Pinyin without tone, with the "yi"-codes of two Wubi letters inserted
prefix + encoded[wubi[0]] + encoded[wubi[1]] + ... + suffix : Pinyin without tone, with the "yi"-codes of more Wubi letters inserted
prefix + encoded[wubi[0]] + encoded[wubi[1]] + ... + encoded[extra] + suffix : Pinyin without tone, with the "yi"-codes of all Wubi letters and of the extra string inserted
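Encode Table (Wubi letter : "yi"-code)
g : re
f : ri
d : rie
s : rei
a : r
h : je
j : ji
k : jie
l : jei
m : j
t : ze
r : zi
e : zie
w : zei
q : z
y : he
u : hi
i : hie
o : hei
p : h
n : ye
b : yi
v : yie
c : yei
x : y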
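The encode table above has a regular shape: the Wubi letters fall into five groups of five, each group shares one leading consonant (r, j, z, h, y), and the five letters in a group take the endings "e", "i", "ie", "ei" and nothing. A minimal sketch that rebuilds the table from that pattern is shown here; the names `GROUPS`, `ENCODE` and `encode_wubi` are hypothetical and only illustrate the table.

```python
# Each tuple: (five Wubi letters, the consonant their codes start with).
GROUPS = [
    ("gfdsa", "r"),
    ("hjklm", "j"),
    ("trewq", "z"),
    ("yuiop", "h"),
    ("nbvcx", "y"),
]
ENDINGS = ["e", "i", "ie", "ei", ""]  # same order as the letters in each group

# Wubi letter -> "yi"-code, e.g. "g" -> "re", "a" -> "r".
ENCODE = {
    letter: initial + ending
    for letters, initial in GROUPS
    for letter, ending in zip(letters, ENDINGS)
}


def encode_wubi(wubi):
    """Concatenate the "yi"-codes of every letter in a Wubi code."""
    return "".join(ENCODE[letter] for letter in wubi)


print(ENCODE["g"], ENCODE["a"], ENCODE["v"])
print(encode_wubi("tk"))
```

The letter "z" has no entry, which is consistent with the extra strings only using the letters a to y.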
Extra strings are:
a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;aa;ba;ca;da;ea;fa;ga;ha;ia;ja;ka;la;ma;na;oa;pa;qa;ra;sa;ta;ua;va;wa;xa;ya
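Putting step 3 together, the candidate search might look like the sketch below. It is written under several assumptions that the text does not confirm: all function names are hypothetical, Pinyin is given with an optional trailing tone digit, the extra string is encoded letter by letter with the same encode table, and the last candidate is skipped when a character has no extra string. The characters, readings and Wubi codes in the example are made-up placeholders.

```python
GROUPS = [("gfdsa", "r"), ("hjklm", "j"), ("trewq", "z"), ("yuiop", "h"), ("nbvcx", "y")]
ENDINGS = ["e", "i", "ie", "ei", ""]
ENCODE = {c: ini + end for cs, ini in GROUPS for c, end in zip(cs, ENDINGS)}


def split_pinyin(pinyin):
    """Split Pinyin into (prefix, suffix); a trailing "r" becomes "l"."""
    base = pinyin.rstrip("012345")  # drop the tone digit, if any
    for tail in ("ng", "n", "r"):   # test "ng" before "n"
        if len(base) > len(tail) and base.endswith(tail):
            return base[: -len(tail)], "l" if tail == "r" else tail
    return base, ""


def candidates(pinyin, wubi, extra=""):
    """Yield candidate codes from shortest to longest."""
    prefix, suffix = split_pinyin(pinyin)
    yield prefix + suffix
    inserted = ""
    for letter in wubi:
        inserted += ENCODE[letter]
        yield prefix + inserted + suffix
    if extra:  # assumption: the extra string is encoded letter by letter
        inserted += "".join(ENCODE[letter] for letter in extra)
        yield prefix + inserted + suffix


def assign_codes(entries):
    """entries: (char, pinyin, wubi, extra) tuples in frequency order."""
    used, codes = set(), {}
    for char, pinyin, wubi, extra in entries:
        for code in candidates(pinyin, wubi, extra):
            if code not in used:
                used.add(code)
                codes[char] = code
                break
        else:
            raise ValueError(f"no free code for {char!r}")
    return codes


# Made-up placeholder entries, most frequent first.
toy = [("甲", "zhong1", "kh", ""), ("乙", "zhong4", "tk", ""), ("丙", "er2", "fq", "")]
print(assign_codes(toy))
```

Because the characters are visited in frequency order, the more frequent placeholder keeps the plain spelling "zhong" and the later one receives an inserted "ze", which reflects the goal that frequent characters carry fewer additional letters.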

Data Sources

A. Wubi coding table from https://github.com/ellicefix/98wubi-unicode/blob/master/98%E8%B6%85%E9%9B%86-%E5%8D%95%E8%A1%8C%E5%A4%9A%E4%B9%89.txt (GitHub: ellicefix/98wubi-unicode, forked from GitHub: yanhuacuo/98wubi-unicode)
B. Frequency and Pinyin of Chinese characters from http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO (Modern Chinese Character Frequency List by Jun Da (jda@mtsu.edu))
C. Popular pronunciation of Chinese characters from http://zein.se/patrick/3000char.html (The most common Chinese characters in order of frequency, © 2003 – 2009 Patrick Hassel Zein)