PinHua - A Chinese Low-Conflict Romanization System
Goal
One-to-many mapping between a Chinese character and a code
Low conflict rate
Based on the Pinyin system
Add two extra letters, derived from the first four strokes, to distinguish characters that share the same pronunciation
Algorithm
1. Generate mapping from character to Pinyin
Choose one Pinyin for each character, using data source A and data source B.
If a character is in data source B, choose the first Pinyin.
If a character is not in data source B and has a Pinyin with tone 5, choose the one with tone 5.
Otherwise, choose the first Pinyin in alphabetical order.
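The three selection rules can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the function name `choose_pinyin`, the dictionary layout of the two data sources, and the convention that the tone is the trailing digit of each reading (e.g. "de5") are all assumptions made for illustration.

```python
def choose_pinyin(char, source_a, source_b):
    """Pick one Pinyin reading for `char`.

    Assumed (hypothetical) layouts:
      source_a: dict, character -> list of all readings, in any order
      source_b: dict, character -> list of readings, first one preferred
    Readings are assumed to be lowercase strings ending in a tone digit.
    """
    # Rule 1: a character listed in data source B takes its first reading.
    if char in source_b:
        return source_b[char][0]

    readings = source_a[char]

    # Rule 2: otherwise prefer a reading with tone 5.
    tone5 = sorted(r for r in readings if r.endswith("5"))
    if tone5:
        return tone5[0]

    # Rule 3: otherwise take the first reading in alphabetical order.
    return min(readings)


# Small made-up sample, only to exercise the three rules.
sample_a = {
    "的": ["di4", "de5", "di2"],
    "了": ["liao3", "le5"],
    "行": ["xing2", "hang2"],
}
sample_b = {"的": ["de5", "di4"]}

for ch in "的了行":
    print(ch, choose_pinyin(ch, sample_a, sample_b))
```

If several tone-5 readings exist, the sketch takes the alphabetically first one; the rules above do not say how to break that tie, so this is a guess.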
2. Generate stroke suffix
Pick the four leading strokes of each character, using data source C
Pad with zeros ("0") if there are fewer than four strokes
Convert the four strokes to two letters, one letter per pair of strokes, using the table below
3. Generate PinHua
Combine the Pinyin and the two-letter suffix to generate the PinHua code
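Steps 2 and 3 can be sketched in Python as follows. The names `pad_strokes` and `make_pinhua` are hypothetical, the stroke sequence is assumed to arrive as a string of stroke symbols, and the Pinyin string is passed through unchanged because the text does not say whether the tone is kept. The lookup is passed in as a parameter; the example uses only a two-entry excerpt of the conversion table.

```python
def pad_strokes(strokes, width=4):
    """Keep the first `width` strokes and right-pad with "0" if too short."""
    return strokes[:width].ljust(width, "0")


def make_pinhua(pinyin, strokes, pair_to_letter):
    """Append the two-letter stroke suffix to a Pinyin string.

    `pair_to_letter` maps a two-stroke string to one letter. The four
    padded strokes are split into two pairs, each giving one letter.
    """
    padded = pad_strokes(strokes)
    suffix = pair_to_letter[padded[:2]] + pair_to_letter[padded[2:]]
    return pinyin + suffix


# Excerpt of the conversion table, enough for a two-stroke character.
excerpt = {"-|": "f", "00": "z"}

# A character with strokes "-" then "|" and the reading "shi":
# padded strokes are "-|00", which gives the suffix "fz".
print(make_pinhua("shi", "-|", excerpt))  # prints "shifz"
```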
Stroke-to-Letter Conversion Table
Stroke pair | Letter |
-- | g |
-| | f |
-/ | d |
-\ | s |
-~ | a |
|- | h |
|| | j |
|/ | k |
|\ | l |
|~ | m |
/- | t |
/| | r |
// | e |
/\ | w |
/~ | q |
\- | y |
\| | u |
\/ | i |
\\ | o |
\~ | p |
~- | n |
~| | b |
~/ | v |
~\ | c |
~~ | x |
-0 | g |
|0 | h |
/0 | t |
\0 | y |
~0 | n |
00 | z |
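The table is regular enough to build programmatically: the first stroke selects a row of five letters and the second stroke selects a position in that row. A minimal Python sketch, where the names `STROKES`, `ROWS` and `PAIR_TO_LETTER` are illustrative only:

```python
# Stroke symbols in the order used by the table.
STROKES = "-|/\\~"

# One row of letters per first stroke; position = second stroke.
ROWS = ["gfdsa", "hjklm", "trewq", "yuiop", "nbvcx"]

# The 25 pairs of two real strokes.
PAIR_TO_LETTER = {
    first + second: ROWS[i][j]
    for i, first in enumerate(STROKES)
    for j, second in enumerate(STROKES)
}

# A stroke followed by padding "0" reuses the first letter of its row,
# so "-0" and "--" both map to "g".
for i, first in enumerate(STROKES):
    PAIR_TO_LETTER[first + "0"] = ROWS[i][0]

# Two padding zeros map to "z".
PAIR_TO_LETTER["00"] = "z"

print(len(PAIR_TO_LETTER))   # 31 entries, matching the table
print(PAIR_TO_LETTER["/\\"])  # "w"
```

Because padding is only added on the right, a pair such as "0-" never occurs and needs no entry.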
Data Sources
B. Popular pronunciations of Chinese characters, from
http://zein.se/patrick/3000char.html
(The most common Chinese characters in order of frequency, © 2003–2009 Patrick Hassel Zein)