Possible to add new precomposed characters for local language in Togo?

Marcel Schneider charupdate at orange.fr
Tue Feb 16 08:01:13 CST 2016


On 2/15/2016 3:32 PM, Mats Blakstad wrote:
[…]
> I now wonder, generally, is it best to add new precomposed characters to Unicode? Should there be a unicode symbol for each combination used? What is best practise? I ask because I see some unicodes are precomposed characters, I'm not sure why they are useful, but if they are maybe we also should add these?
[…]

On Mon, 15 Feb 2016 20:46:28 -0800, Asmus Freytag (t)  answered :
[…]
> However, precomposing these is simply out. Unicode locked that door and threw away the key (short answer). The long answer will come along shortly.

Existing precomposed characters were proposed before the deadline, i.e. in the past millennium, and encoded for backwards compatibility. As a result, the scripts of many Latin-writing countries, including Vietnam, can be represented both in NFD *and* in NFC, but this is purely fortuitous. Since the Unicode encoding scheme is based on _combining diacritics_, part of any implementation consists in supporting them at every stage of data processing, including input.
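
To make the NFD/NFC point concrete, here is a minimal Python sketch using the standard unicodedata module. The two sample letters are my own choice for illustration (a Vietnamese e with circumflex and acute, and an open e with tilde); they are not taken from the original question.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Vietnamese e with circumflex and acute: a precomposed form exists (U+1EBF),
# so NFC collapses the three-code-point sequence into a single code point.
vietnamese_nfd = "e\u0302\u0301"
vietnamese_nfc = unicodedata.normalize("NFC", vietnamese_nfd)

# Open e (U+025B) with combining tilde (U+0303): no precomposed code point
# exists, so NFC leaves the combining sequence unchanged.
open_e_tilde = "\u025B\u0303"
open_e_tilde_nfc = unicodedata.normalize("NFC", open_e_tilde)

print(describe(vietnamese_nfd), "->", describe(vietnamese_nfc))
print(describe(open_e_tilde), "->", describe(open_e_tilde_nfc))
```

Both strings are equally valid Unicode text; only the first one happens to have a one-code-point NFC form.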

The big oopsie that you stumbled upon is that Windows keyboard layout drivers, unlike those on Linux, cannot generate more than a single UTF-16 code unit from a dead key. Supposedly this is due to a gap in keyboard standardization. When ISO/IEC 9995 was published in 1994, after a decade of work and when Unicode had already been thriving for a couple of years, the standard provided nothing to cater for Unicode implementation. A bit later, the Windows keyboard APIs were frozen for backwards compatibility.
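
A small Python sketch shows why a limit of one UTF-16 code unit rules out combining sequences. The helper name and the sample strings are hypothetical, chosen only for illustration; the limit itself is assumed to be as described above.

```python
def utf16_code_units(text: str) -> int:
    """Count the UTF-16 code units needed to store text."""
    # Each UTF-16 code unit occupies two bytes; "utf-16-le" adds no BOM.
    return len(text.encode("utf-16-le")) // 2


# Illustrative samples, not taken from any particular keyboard layout.
samples = {
    "e acute, precomposed (U+00E9)": "\u00E9",
    "open e + combining tilde (U+025B U+0303)": "\u025B\u0303",
    "open o + combining grave (U+0254 U+0300)": "\u0254\u0300",
}

for label, text in samples.items():
    units = utf16_code_units(text)
    verdict = "fits" if units == 1 else "does not fit"
    print(f"{label}: {units} code unit(s), {verdict} in a one-unit dead key output")
```

Any base letter followed by a combining mark takes at least two code units, so under that limit a dead key can only ever yield precomposed letters.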

Indeed there _is_ a problem. But there are solutions.

On Tue, 16 Feb 2016 09:00:26 +0100, Philippe Verdy  answered :
[…]
> Keyboard layouts MUST generate the combining sequence.
[…]

Indeed Unicode states that «it is straightforward to adapt such a system» of dead keys to output combining sequences as well, and that was the idea when ISO/IEC 9995-11 was added last year. This most recent part of the standard specifies the algorithm of an IME that uses the NormalizeString function or the String.Normalize method provided by the OS. You may wish to look up the long description on French Wikipedia [1].
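
The general idea of a dead key resolved through normalization can be sketched in Python as follows. This is not the algorithm of ISO/IEC 9995-11 itself, only an illustration under my own assumptions: unicodedata.normalize stands in for the OS normalization call, and the dead key names and the DEAD_KEYS table are hypothetical.

```python
import unicodedata

# Hypothetical dead key table: the key names are illustrative only and do
# not come from any standard or existing layout.
DEAD_KEYS = {
    "dead_grave": "\u0300",
    "dead_acute": "\u0301",
    "dead_circumflex": "\u0302",
    "dead_tilde": "\u0303",
}


def resolve_dead_key(dead_key: str, base: str) -> str:
    """Combine a dead key with the base typed after it.

    The combining mark is appended to the base and the result is put into
    NFC, so a precomposed letter comes out where one exists and a combining
    sequence comes out otherwise.
    """
    mark = DEAD_KEYS[dead_key]
    return unicodedata.normalize("NFC", base + mark)


print(resolve_dead_key("dead_acute", "e"))       # precomposed e acute
print(resolve_dead_key("dead_tilde", "\u025B"))  # open e + combining tilde
```

With this approach the layout needs no table of precomposed letters: normalization decides whether the result is one code point or a sequence.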

On Windows, however, there is no need for a *new*, ISO/IEC-conformant IME, as Keyman keyboard layouts are already able to generate whatever sequence is required from whatever input is specified, with dead keys or visible on screen. If you have checked the Pan Africa (Deadkeys) layout, which is suitable for Togo and many other African countries, as well as the official SIL Pan Africa keyboard, and they don’t match your requirements (because diacritics are entered _after_ the base letter, even to get existing precomposed letters output), you may wish to create a layout that outputs combining sequences entered by dead keys, using Keyman Developer.

Experience shows, however, that training on dead key layouts as used for French can be extended to the use of combining diacritics entered after the base letter, given an appropriate keyboard layout driver. Since these combining characters are actually the most useful form of most diacritics, it is recommended that they be generated when the space bar is hit after a dead key, if dead keys are present. More obviously, all needed diacritics are allocated to key positions, so that they can be added to any letter by means of a single keystroke. One example is the keyboard layout for Bamanankan and French on the /Mali Pense/ site that Don Osbornʼs /Beyond Niamey/ blog links to [2]. In any case, entering diacritics _after_ the base letter is the most up-to-date way to input composed characters, because it is very intuitive and because it realizes the spirit of the character representation scheme of Unicode.
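
The input behaviour described above can be sketched as a tiny keystroke processor in Python. Everything here is a hypothetical illustration: the key names, the two tables and the type_keys function are assumptions of mine and do not describe Keyman, the Mali Pense layout or any real driver.

```python
# Hypothetical key names; a real layout would bind these to key positions.
COMBINING_KEYS = {"<tilde>": "\u0303", "<grave>": "\u0300", "<acute>": "\u0301"}
DEAD_KEYS = {"<dead_tilde>": "\u0303", "<dead_grave>": "\u0300", "<dead_acute>": "\u0301"}


def type_keys(keys: list[str]) -> str:
    """Turn a list of keystrokes into text made of combining sequences."""
    out = []
    pending = ""  # combining marks held back by dead keys
    for key in keys:
        if key in DEAD_KEYS:
            pending += DEAD_KEYS[key]  # wait for the next keystroke
            continue
        # A combining key typed after the base letter emits its mark directly.
        text = COMBINING_KEYS.get(key, key)
        if pending:
            # Space after a dead key emits the bare combining mark, which
            # then attaches to the letter typed before the dead key.
            out.append(pending if key == " " else text + pending)
            pending = ""
        else:
            out.append(text)
    # A dead key left pending at the end of input is simply dropped here.
    return "".join(out)


print(type_keys(["\u025B", "<tilde>"]))      # base letter, then diacritic
print(type_keys(["a", "<dead_tilde>", " "])) # dead key + space after a base
print(type_keys(["<dead_tilde>", "\u025B"])) # classic dead key order
```

All three inputs end in a base letter followed by U+0303, which is the point: users trained on dead keys and users who type the diacritic afterwards get the same combining sequence.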

I hope that helps too.

Best regards,

Marcel

[1] https://fr.wikipedia.org/wiki/ISO/CEI_9995#ISO.2FCEI_9995-11_-_Les_touches_mortes

[2] Don Osborn. Beyond Niamey: Writing Bambara right. (2014, November 25). Retrieved October 22, 2015, from http://niamey.blogspot.fr/2014/11/writing-bambara-right.html


