Implementing SMP on a UTF-16 OS

Marcel Schneider charupdate at
Fri Aug 14 06:14:55 CDT 2015

Insofar as the issue remained pressing, it is now resolved to some extent. Only five high surrogates cover the 2,413 SMP characters most likely to be wanted on a universal Latin layout, so five key positions (for example at Shift+Kana) are enough to ensure input efficiency, together with streamlined Compose sequences for the low surrogates.

This results from examining the NamesList in a spreadsheet with surrogates generated by Excel formulas. The five mentioned leading surrogates are:

U+D800 (12 Roman symbols U+10190 sqq);
U+D835 (996 mathematical letters U+1D400 sqq);
U+D83C (421 mathematical letters, symbols, and emojis U+1F100 sqq);
U+D83D (821 emoticons and stars U+1F400 sqq);
U+D83E (163 arrows U+1F800 sqq).
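For anyone wishing to check the leading surrogates without a spreadsheet, here is a minimal Python sketch of the standard UTF-16 split. The function name to_surrogates and the list RANGE_STARTS are made up for illustration; the code points are simply the first one of each range listed above.

```python
def to_surrogates(cp):
    """Split a supplementary code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    # Top 10 bits select the high surrogate, bottom 10 bits the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

# Hypothetical name: first code point of each range named in the list above.
RANGE_STARTS = [0x10190, 0x1D400, 0x1F100, 0x1F400, 0x1F800]

for cp in RANGE_STARTS:
    high, low = to_surrogates(cp)
    print(f"U+{cp:05X} -> U+{high:04X} U+{low:04X}")
```

Since each high surrogate spans 1,024 consecutive code points, every character in one of these ranges that falls within the same 1,024-aligned block shares its leading code unit.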

However, this workaround is far from optimal: in addition to the Compose sequences, the user must learn which leading surrogate to type first. For example, U+1F16A RAISED MC SIGN and U+1F16B RAISED MD SIGN (which, being for use in Canada, are supposed to be on every universal Latin layout of any locale) should be input with Compose, m, c and Compose, m, d, respectively. Instead, the user must now type Shift+Kana+S, Compose, m, c or Shift+Kana+S, Compose, m, d. That is merely less bad than not having them at all. (Depending on the locale, one might wish to map them to (Shift+) Ctrl+Alt+C and Ctrl+Alt+D.)
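To make the two-step input concrete, the following Python sketch lists the UTF-16 code units of the two signs. It relies only on Python's built-in utf-16-be codec; the helper name utf16_units is an assumption for illustration. Presumably the first unit is what the Shift+Kana key has to deliver and the second is what the Compose sequence has to yield.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    # Each code unit is one big-endian unsigned 16-bit value.
    return list(struct.unpack(f">{len(data) // 2}H", data))

RAISED_MC_SIGN = "\U0001F16A"
RAISED_MD_SIGN = "\U0001F16B"

for name, ch in (("MC", RAISED_MC_SIGN), ("MD", RAISED_MD_SIGN)):
    units = " ".join(f"U+{u:04X}" for u in utf16_units(ch))
    print(f"RAISED {name} SIGN -> {units}")
```

Both signs share the leading unit U+D83C and differ only in the trailing one (U+DD6A versus U+DD6B), which is why a single extra key position covers both.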

Iʼm still hoping that there will be a way either to make DEADTRANS yield two code units, or to define and use a DEADTRANSEXT function.

Best regards,

Marcel Schneider
