Go through all English words in data source A and count the number of occurrences of each English word.
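A minimal Python sketch of this counting step is below. It assumes data source A has been loaded as a list of (character, meaning) pairs, and that a word is any run of ASCII letters compared case-sensitively. Both the data layout and the tokenization rule are assumptions, not part of the original description.

```python
import re
from collections import Counter

# Assumption: a word is a run of ASCII letters; punctuation, quotes and
# digits separate words, and case is preserved ("China" != "china").
WORD_RE = re.compile(r"[A-Za-z]+")


def count_word_occurrences(entries):
    """Count how often each English word appears across all meanings in A."""
    counts = Counter()
    for _char, meaning in entries:
        counts.update(WORD_RE.findall(meaning))
    return counts


# Hypothetical sample rows in the assumed (character, meaning) layout.
counts = count_word_occurrences([("作", "do; make; work"), ("做", "do; make")])
```

Here counts["do"] is 2, because the word appears in both sample meanings.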
3. Generate Pinyin codes
Go through each character in the order of data source A:
For each English word in the meaning column of the character, calculate the following values:
Define In-Quote as whether the word appears inside a quotation
Define Upper-Case as whether the word is capitalized, as in the name of a person or a country
Define Score-Value = order of the word + (length of the English word * number of occurrences of the word in data source A)
Sort the words by In-Quote, then Upper-Case, then Score-Value, then the word string itself
Pick the first code that has not been used yet, where a code is the Pinyin without tone + dot + English word
If the codes for all words have already been used, try the following extra meaningless letters instead:
a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;aa;ba;ca;da;ea;fa;ga;ha;ia;ja;ka;la;ma;na;oa;pa;qa;ra;sa;ta;ua;va;wa;xa;ya;ae;be;ce;de;ee;fe;ge;he;ie;je;ke;le;me;ne;oe;pe;qe;re;se;te;ue;ve;we;xe;ye
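The procedure above can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation, and it rests on several assumptions not stated in the original: data source A is a list of (character, meaning) pairs; pinyin_of is a hypothetical dict standing in for data source B; tones are written as tone marks or digits; Upper-Case means the word starts with a capital letter; In-Quote means the word sits between straight double quotes; all four sort keys are ascending (False before True, lower scores first); word order is counted from 0; and a fallback letter takes the place of the English word after the dot.

```python
import re
import unicodedata
from collections import Counter

# Fallback letters, copied verbatim from the list above (75 entries, no "z").
EXTRA_LETTERS = (
    "a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;"
    "aa;ba;ca;da;ea;fa;ga;ha;ia;ja;ka;la;ma;na;oa;pa;qa;ra;sa;ta;ua;va;wa;xa;ya;"
    "ae;be;ce;de;ee;fe;ge;he;ie;je;ke;le;me;ne;oe;pe;qe;re;se;te;ue;ve;we;xe;ye"
).split(";")

# Assumption: a word is a run of ASCII letters, as in the counting step.
TOKEN_RE = re.compile(r"[A-Za-z]+")


def strip_tone(pinyin):
    """Remove tone marks (e.g. hǎo) or tone digits (e.g. hao3) and lowercase."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    # Assumption of this sketch: dropping combining marks also turns ü into u.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\d", "", base).lower()


def candidate_words(meaning, counts):
    """Sort the meaning's words by (In-Quote, Upper-Case, Score-Value, word)."""
    keyed = []
    for order, match in enumerate(TOKEN_RE.finditer(meaning)):
        word = match.group()
        # In-Quote: an odd number of straight double quotes precede the word.
        in_quote = meaning.count('"', 0, match.start()) % 2 == 1
        # Upper-Case: the word starts with a capital letter.
        upper_case = word[0].isupper()
        # Score-Value: order of the word + length * occurrences in source A.
        score = order + len(word) * counts[word]
        keyed.append((in_quote, upper_case, score, word))
    keyed.sort()  # ascending: False before True, lower scores first
    return [word for _, _, _, word in keyed]


def generate_codes(entries, pinyin_of, counts):
    """Assign each character the first unused code "pinyin.word"."""
    used = set()
    codes = {}
    for char, meaning in entries:  # characters in the order of source A
        base = strip_tone(pinyin_of[char])
        for suffix in candidate_words(meaning, counts) + EXTRA_LETTERS:
            code = f"{base}.{suffix}"
            if code not in used:
                used.add(code)
                codes[char] = code
                break
        # If every candidate is taken, the character is left without a code.
    return codes


# Hypothetical sample data: two characters sharing the toneless Pinyin "zuo".
entries = [("作", "do; make; work"), ("做", "do; make")]
pinyin_of = {"作": "zuò", "做": "zuò"}
counts = Counter(w for _, m in entries for w in TOKEN_RE.findall(m))
codes = generate_codes(entries, pinyin_of, counts)
```

With this sample data, 作 gets zuo.do, and 做, whose best-ranked word is already taken, moves on to its next word and gets zuo.make. Counting the order from 0 rather than 1 shifts every score of a character by the same amount, so the ranking is unaffected.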
Data Sources
B. Popular pronunciation of Chinese characters, from
http://zein.se/patrick/3000char.html
(The most common Chinese characters in order of frequency, © 2003 – 2009 Patrick Hassel Zein)