Unicode organization is still anti-Serbian and anti-Macedonian

Philippe Verdy verdy_p at wanadoo.fr
Sun Feb 16 11:44:55 CST 2014

2014-02-15 19:25 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Fri, 14 Feb 2014 02:37:19 -0800
> Крушевљанин <Perka at muchomail.com> wrote:
>
> Should these combinations be well known?  They're not listed in the
> CLDR exemplar characters for Serbian.
> As for input, I would suggest that the solution for the simpler
> keyboarding techniques is to enter them as base character and then dead
> key.
"Dead keys" don't work this way. Their name indicates that these keys
have no visible effect (they seem dead) until another key is pressed AFTER
them.

So you press the dead key for the diacritic, then the key for the base
letter, to produce EITHER:
- a single precomposed character (where it exists); OR
- a canonically equivalent decomposed combining sequence representing the
letter with its diacritic(s) (preferably in NFC form).
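
A minimal sketch of that rule in Python, assuming a hypothetical two-entry
dead-key table (DEAD_KEYS and compose_dead_key are illustrative names, not
part of any real keyboard driver). NFC normalization picks the precomposed
character when one exists and otherwise leaves the combining sequence:

```python
import unicodedata

# Hypothetical dead-key table: key label -> combining mark it stands for.
DEAD_KEYS = {
    "diaeresis": "\u0308",  # COMBINING DIAERESIS
    "acute": "\u0301",      # COMBINING ACUTE ACCENT
}


def compose_dead_key(dead_key: str, base: str) -> str:
    """Return the text produced by pressing dead_key, then base.

    The mark is appended after the base letter and the pair is normalized
    to NFC, so a precomposed character is used whenever Unicode has one.
    """
    return unicodedata.normalize("NFC", base + DEAD_KEYS[dead_key])


# Latin a + diaeresis has a precomposed form (U+00E4).
latin = compose_dead_key("diaeresis", "a")
# Cyrillic small letter qa (U+051B) has none, so two code points remain.
cyrillic_qa = compose_dead_key("diaeresis", "\u051b")
```

Here latin is one code point and cyrillic_qa stays a base letter followed
by U+0308, yet both are in NFC.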

Dead keys may be chained in advanced keyboard drivers that support complex
input states for handling multiple diacritics typed before a base letter,
but simple keyboard drivers (such as those generated by the MS Keyboard
Layout Creator) do not handle these complex states. Still, nothing
prohibits building such a keyboard driver.
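
As a rough illustration of what such a driver would do, here is a small
Python state machine, assuming a hypothetical key-to-mark table (DEAD and
process_keys are made-up names). It collects every dead key pressed, then
attaches the pending marks to the next base letter in the order typed:

```python
import unicodedata

# Hypothetical mapping from dead-key keystrokes to combining marks.
DEAD = {
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    '"': "\u0308",  # COMBINING DIAERESIS
}


def process_keys(keys):
    """Turn a keystroke list into text, allowing chained dead keys."""
    out = []
    pending = []  # marks typed so far, waiting for a base letter
    for key in keys:
        if key in DEAD:
            pending.append(DEAD[key])
        else:
            # Base letter: emit it with all pending marks, in NFC.
            out.append(unicodedata.normalize("NFC", key + "".join(pending)))
            pending.clear()
    # Dead keys left pending at the end of input are dropped in this sketch.
    return "".join(out)
```

For example, process_keys(["^", "'", "a"]) yields the single character
U+1EA5 (a with circumflex and acute).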

There's another input method where you press a key for the diacritic
after a base letter: this key is treated in isolation and immediately
generates the combining diacritic, independently of the characters typed
before. But such an input method does not guarantee the NFC form, and can
produce broken sequences (in some cases the diacritic may be invisible in
the generated text).
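
Both problems can be seen in a short Python sketch (type_postfix, is_nfc
and has_orphan_mark are hypothetical helper names used only here). The
"driver" appends each keystroke as-is, so the output is neither checked
for NFC nor protected against a mark that has no base letter before it:

```python
import unicodedata

COMBINING_ACUTE = "\u0301"


def type_postfix(keys):
    """Append every keystroke's character as-is, with no look-behind."""
    return "".join(keys)


def is_nfc(text):
    """True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def has_orphan_mark(text):
    """True if text starts with a combining mark that has no base letter."""
    return bool(text) and unicodedata.combining(text[0]) != 0


typed = type_postfix(["e", COMBINING_ACUTE])     # e + U+0301, not NFC
broken = type_postfix([COMBINING_ACUTE, "e"])    # mark typed before any base
```

Here is_nfc(typed) is False even though the text renders like U+00E9, and
has_orphan_mark(broken) is True.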

For simple alphabetic scripts (like Latin, Greek, Cyrillic), the dead key
input method is generally preferred. The other one is used to enter isolated
combining diacritics which are almost never used in association with other
letters (and notably not in combining sequences equivalent to an existing
precomposed letter).

If you think about the combining diaeresis, as it is already used very
frequently in association with Latin and Cyrillic letters using a dead key
method, it should also be used as a dead key even for less frequent base
letters such as the Cyrillic letter Q. All that is needed is to use an
updated driver adding the mapping for diacritic dead key+letter, in which
it will output the NFC combining sequence if there's no precomposed NFC


Unfortunately, the drivers generated by the MS Keyboard Layout Creator
(MSKLC), when they do not find any explicitly predefined mapping for
diacritic dead key+base letter, will generate the mapping for <diacritic
dead key+SPACE>, followed by the base letter, meaning that you won't get
the text <base letter, combining diacritic>, but <spacing modifier letter
for the diacritic, base letter>!
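
The difference can be modeled in a few lines of Python. This is only a
sketch of the behaviour described above, not MSKLC's actual code; the
table contents and the names simple_driver and extended_driver are
assumptions made for the illustration:

```python
import unicodedata

# Hypothetical dead-key table for the diaeresis: base key -> output.
# The SPACE entry holds the spacing form of the diacritic (U+00A8).
DIAERESIS_TABLE = {" ": "\u00a8", "a": "\u00e4", "o": "\u00f6"}
COMBINING_DIAERESIS = "\u0308"


def simple_driver(table, base):
    """Model of the fallback described above for unmapped pairs."""
    if base in table:
        return table[base]
    # Unmapped: emit the <dead key + SPACE> output, then the base letter.
    return table[" "] + base


def extended_driver(table, base, combining):
    """Alternative: fall back to the NFC combining sequence instead."""
    if base in table:
        return table[base]
    return unicodedata.normalize("NFC", base + combining)
```

With these tables, simple_driver(DIAERESIS_TABLE, "q") gives U+00A8
followed by q, while extended_driver(DIAERESIS_TABLE, "q",
COMBINING_DIAERESIS) gives q followed by U+0308.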

The second limitation of MSKLC is that it cannot chain dead keys: each
input state must be mapped to a single state represented by a single
character, which is the spacing modifier letter that would be output if you
pressed the SPACE bar after the diacritic. It incorrectly assumes that
combinations that are not mapped explicitly will always be followed by
a space bar keystroke to produce a spacing modifier letter, as if all
unmapped sequences were impossible and did not exist in the real world.

The other limitation is that each state in this input state table can only
be represented by a single character in the BMP (though it may be a PUA
character of the BMP, even if MSKLC warns that this character may not be
supported by fonts on the native OS or in the Console using the local
legacy OEM or "ANSI" codepage, an 8-bit code page which may be either SBCS
or DBCS).

Drivers built by MSKLC do not allow mapping a dead key outside the root
state table, so after pressing a dead key (possibly in combination with
state modifier keys like Shift, Ctrl, Alt, and with the current state of
CapsLock/ShiftLock), you can only press a single base character (also
possibly in combination with state modifier keys).

Due to these limitations of MSKLC, generating advanced keymaps that
support extended sets of combining sequences requires complex key
combinations with state modifiers (for the dead key and for the base
letter), which are awkward to type, when the same text would be simpler
and faster to enter if sequences of dead keys were supported.

Dead keys are not very complex; in fact they are quite friendly and have
the advantage of normalizing the input to NFC directly, without needing any
additional support from the external text editor (modifying the text buffer
on the fly). They are natural to users even if the input order of
keystrokes is reversed compared to the Unicode encoding of the generated
text (something that most users will never see, as they have no idea about
how the text will finally be encoded and used in their applications).