PinHua - A Chinese Low-Conflict Romanization System

Goal

One-to-many mapping between a code and Chinese characters: each character gets exactly one code, but a code may be shared by several characters
Low conflict rate
Based on the Pinyin system
Add two extra letters, derived from the first two strokes of the character, to reduce the conflict rate

Algorithm

1. Generate mapping from character to Pinyin

Choose one Pinyin for each character, using data sources A and B
If a character is in data source B, choose the first Pinyin
If a character is not in data source B and has a Pinyin with tone 5 (the neutral tone), choose the reading with tone 5
Otherwise, choose the first Pinyin in alphabetical order
Use "yu" to represent "ü"
Add "e" to Pinyin syllables without a vowel: "ng" -> "eng", "hng" -> "heng"
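The selection and normalization rules of step 1 can be sketched in Python. This is a minimal sketch, not the author's implementation. It assumes each reading is a tone-numbered string such as "liao3" (tone digit 1 to 5 at the end), that the readings from data source A arrive as a list, and that the readings from data source B are None when the character is not listed there. The names `split_tone`, `choose_pinyin` and `normalize_syllable` are hypothetical, and handling "u:" and "v" as stand-ins for "ü" is an assumption about the input files.

```python
from __future__ import annotations

VOWELS = set("aeiou")


def split_tone(reading: str) -> tuple[str, int]:
    """Split a tone-numbered reading such as 'liao3' into ('liao', 3).

    Assumption: every reading ends in a tone digit 1-5 (5 = neutral tone).
    """
    return reading[:-1], int(reading[-1])


def choose_pinyin(a_readings: list[str], b_readings: list[str] | None = None) -> str:
    """Pick one reading per character following the step 1 rules."""
    if b_readings:
        # Character is listed in data source B: take the first reading.
        return b_readings[0]
    neutral = [r for r in a_readings if split_tone(r)[1] == 5]
    if neutral:
        # Prefer a tone-5 reading (the first one, if there are several).
        return neutral[0]
    # Otherwise take the alphabetically first reading.
    return sorted(a_readings)[0]


def normalize_syllable(syllable: str) -> str:
    """Spell ü as 'yu' and give vowel-less syllables an 'e'."""
    s = syllable.lower()
    # Assumed ASCII stand-ins for ü ('u:' and 'v') are replaced as well.
    for umlaut in ("ü", "u:", "v"):
        s = s.replace(umlaut, "yu")
    if not any(c in VOWELS for c in s):
        # 'ng' -> 'eng', 'hng' -> 'heng': the 'e' goes after an initial 'h'.
        head = "h" if s.startswith("h") else ""
        s = head + "e" + s[len(head):]
    return s


# Hypothetical character not in data source B, with readings liao3 and le5.
syllable, tone = split_tone(choose_pinyin(["liao3", "le5"]))
print(normalize_syllable(syllable), tone)  # le 5
```

Placing the "e" after an initial "h" reproduces both documented examples; applying the same rule to other vowel-less syllables (such as "m" or "hm") is an extrapolation.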

2. Generate stroke letters

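Take the first two strokes of each character from data source C
If a character has fewer than two strokes, pad with the horizontal stroke "-"
Convert the two strokes to letters using the Stroke-to-Letter Conversion Table below; the letter for each stroke depends on the first vowel of the character's Pinyin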
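Taking and padding the leading strokes can be sketched as follows. The source does not specify how data source C encodes strokes; this sketch assumes it uses the digits 1 to 5 in the conventional order (horizontal, vertical, left-falling, dot/right-falling, turning) and that these correspond to the symbols -, |, /, \ and ~ used in this document. `DIGIT_TO_STROKE` and `leading_strokes` are hypothetical names.

```python
# Assumed digit encoding of data source C -> stroke symbols used in this document.
DIGIT_TO_STROKE = {"1": "-", "2": "|", "3": "/", "4": "\\", "5": "~"}


def leading_strokes(stroke_code: str, n: int = 2) -> str:
    """Return the first n strokes as symbols, padded with horizontal strokes '-'."""
    symbols = "".join(DIGIT_TO_STROKE[d] for d in stroke_code[:n])
    return symbols.ljust(n, "-")


print(leading_strokes("2511"))  # |~
print(leading_strokes("1"))     # --
```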

3. Generate PinHua

Insert the two stroke letters after the first letter of the Pinyin
Append the tone suffix from the Tone-to-Letter Conversion Table below
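Step 3 is plain string assembly. A minimal sketch, assuming the syllable has already been normalized as in step 1 and that the two stroke letters and the tone suffix have already been looked up in the conversion tables below; `assemble_pinhua` is a hypothetical name.

```python
def assemble_pinhua(syllable: str, stroke_letters: str, tone_suffix: str) -> str:
    """Insert the stroke letters after the first Pinyin letter, then append the tone suffix."""
    if len(stroke_letters) != 2:
        raise ValueError("expected exactly two stroke letters")
    return syllable[0] + stroke_letters + syllable[1:] + tone_suffix


# Hypothetical syllable zhong1 whose first two strokes are | and ~.
# With first vowel "o", the stroke table gives "u" and "v"; tone 1 adds nothing.
print(assemble_pinhua("zhong", "uv", ""))  # zuvhong
# Hypothetical syllable an4 whose first two strokes are - and \.
print(assemble_pinhua("an", "oh", "s"))    # aohns
```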

Stroke-to-Letter Conversion Table

Stroke   First vowel     Letter
-        i, u            e
-        a, e, o         o
|        i, u            i
|        a, e, o         u
/        i, u            y
/        a, e, o         w
\        a, e, i, o, u   h
~        i, u            r
~        a, e, o         v
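The table can be encoded as a lookup keyed by stroke, with one letter per vowel group. This sketch assumes that "First vowel" means the first of a, e, i, o, u in the Pinyin syllable after the step 1 normalization (so y and w do not count as vowels). `STROKE_LETTERS`, `first_vowel` and `stroke_letters` are hypothetical names.

```python
# stroke -> (letter when the first vowel is i or u, letter when it is a, e or o)
STROKE_LETTERS = {
    "-": ("e", "o"),
    "|": ("i", "u"),
    "/": ("y", "w"),
    "\\": ("h", "h"),  # same letter for every vowel
    "~": ("r", "v"),
}


def first_vowel(syllable: str) -> str:
    """Return the first of a, e, i, o, u in a normalized syllable."""
    for c in syllable:
        if c in "aeiou":
            return c
    raise ValueError(f"no vowel in {syllable!r}")


def stroke_letters(strokes: str, syllable: str) -> str:
    """Convert the leading stroke symbols into letters for this syllable."""
    column = 0 if first_vowel(syllable) in "iu" else 1
    return "".join(STROKE_LETTERS[s][column] for s in strokes)


print(stroke_letters("|~", "zhong"))  # uv
print(stroke_letters("|~", "lyu"))    # ir
```

Because step 1 adds an "e" to vowel-less syllables, `first_vowel` should only raise on input that skipped that normalization.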

Tone-to-Letter Conversion Table

Tone   Mark   Letter
1      -      (none)
2      /      z
3      v      v
4      \      s
5      .      x
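The tone table reduces to a small dictionary. This sketch assumes the middle symbol mirrors the Pinyin tone mark (ā, á, ǎ, à, and a dot for the neutral tone) and that tone 1 contributes no suffix letter. `TONES` and `tone_suffix` are hypothetical names.

```python
# tone number -> (tone mark symbol, suffix letter); tone 1 has no suffix.
TONES = {
    1: ("-", ""),
    2: ("/", "z"),
    3: ("v", "v"),
    4: ("\\", "s"),
    5: (".", "x"),
}


def tone_suffix(tone: int) -> str:
    """Return the letter appended to a PinHua code for the given tone (1-5)."""
    if tone not in TONES:
        raise ValueError(f"tone must be 1-5, got {tone}")
    return TONES[tone][1]


print(repr(tone_suffix(1)))  # ''
print(tone_suffix(4))        # s
```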

Data Sources

A. Frequency and Pinyin of Chinese characters, from http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=MO (Modern Chinese Character Frequency List, by Jun Da, jda@mtsu.edu)
B. Popular pronunciation of Chinese characters, from http://zein.se/patrick/3000char.html (The most common Chinese characters in order of frequency, © 2003–2009 Patrick Hassel Zein)
C. Stroke order of Chinese characters, from https://github.com/YQ-YSY/stroke-seq_MB/blob/master/text/%E5%8D%95%E5%AD%97_%E7%AC%94%E9%A1%BA%E7%A0%81_29685%E4%B8%AA.txt (part of https://github.com/YQ-YSY/stroke-seq_MB)