Possible to add new precomposed characters for local language in Togo?

Marcel Schneider charupdate at orange.fr
Sun Nov 6 12:33:39 CST 2016


To round out this thread before Microsoftʼs response arrives, Iʼd like to quote 
in extenso the relevant part of the Standard. Although it basically matches the 
current state of the discussion, quoting it here seems useful because it 
highlights that when end users are accustomed to dead keys, as they are in the 
francophone regions of Africa, urging them to swap the order of base characters 
and diacritics is *not* straightforward. Keyboard layouts without dead keys, with 
combining diacritics on live keys, are thus to be promoted in the anglophone 
regions of Africa, but not in the francophone ones, where layouts with 
string-generating dead keys seem to be mandatory:

TUS 9.0, §5.12 (Implementation Guidelines: Strategies for Handling Nonspacing Marks), p. 222:
|
| Keyboard Input
|
| A common implementation for the input of combining character sequences is the use of
| dead keys. These keys match the mechanics used by typewriters to generate such sequences
| through overtyping the base character after the nonspacing mark. In computer implementations,
| keyboards enter a special state when a dead key is pressed for the accent and emit a
| precomposed character only when one of a limited number of “legal” base characters is
| entered. It is straightforward to adapt such a system to emit combining character sequences
| or precomposed characters as needed.
|
| Typists, especially in the Latin script, are trained on systems that work using dead keys.
| However, many scripts in the Unicode Standard (including the Latin script) may be implemented
| according to the handwriting sequence, in which users type the base character first,
| followed by the accents or other nonspacing marks (see Figure 5-4).
|
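
To make the mechanism described in the quote concrete, here is a minimal sketch 
of a dead-key state machine in C. Every name in it (DeadKeyState, press_dead_key, 
press_base_key, k_table) is hypothetical and invented for illustration; it is not 
the Windows keyboard API. It assumes the dead key arms one combining mark, emits a 
precomposed character when the small table knows one, and otherwise falls back to 
base character followed by combining mark:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the Windows keyboard API. */
typedef struct {
    uint16_t pending_mark;   /* combining mark armed by a dead key, 0 if none */
} DeadKeyState;

/* Tiny illustrative composition table: (mark, base) -> precomposed character. */
typedef struct { uint16_t mark, base, composed; } Composition;
static const Composition k_table[] = {
    { 0x0301, 0x0065, 0x00E9 },  /* COMBINING ACUTE + e -> U+00E9 */
    { 0x0303, 0x006E, 0x00F1 },  /* COMBINING TILDE + n -> U+00F1 */
};

/* Pressing a dead key emits nothing; it only arms the state. */
static void press_dead_key(DeadKeyState *st, uint16_t combining_mark)
{
    st->pending_mark = combining_mark;
}

/* Pressing a base key writes 1 or 2 UTF-16 code units to out and returns
   how many were written. The dead-key state is always cleared. */
static size_t press_base_key(DeadKeyState *st, uint16_t base, uint16_t out[2])
{
    uint16_t mark = st->pending_mark;
    st->pending_mark = 0;

    if (mark == 0) {                 /* no dead key armed: plain character */
        out[0] = base;
        return 1;
    }
    for (size_t i = 0; i < sizeof k_table / sizeof k_table[0]; i++) {
        if (k_table[i].mark == mark && k_table[i].base == base) {
            out[0] = k_table[i].composed;   /* precomposed character exists */
            return 1;
        }
    }
    out[0] = base;                   /* fallback: combining character sequence, */
    out[1] = mark;                   /* base first, then the nonspacing mark    */
    return 2;
}
```

With this sketch, acute followed by e yields the single unit U+00E9, while acute 
followed by U+0190 (which has no precomposed form) yields the two units 
U+0190 U+0301. The user still types the accent first, which is the point made above.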

In another part, TUS mentions the downside (outdated legacy keyboard protocols):

TUS 9.0, §2.7 (General Structure: Unicode strings), p. 43:
|
| […] While an ideal protocol would allow keyboard events to contain complete strings,
| many allow only a single UTF-16 code unit per event. […]
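
Why a single UTF-16 code unit per event is a real limit can be shown with a few 
lines of C. The function name utf16_encode is my own (hypothetical); the arithmetic 
is the standard UTF-16 encoding. Any character outside the BMP needs two code 
units, so it cannot travel in one such event:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16.
   Returns the number of code units written (1 or 2), or 0 if cp is not a
   scalar value (a surrogate code point or above U+10FFFF). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {              /* BMP: one code unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                   /* supplementary planes: surrogate pair */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

For example, U+0190 fits in one code unit, whereas U+10330 GOTHIC LETTER AHSA 
becomes the pair 0xD800 0xDF30.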


BTW there is an obvious error in my last e-mail (quoted below):
> […] that the developers were asked to ape the *then new* ISO/IEC 9995-3 group 
> selector by implementing it as a dead key, as a *remnant* group selector.
This is not about a dead key, but about a modifier key.

Then there is an omission, as I did not mention that I pressed the related modifier:
> So I added 0x80 to the attribute of a key, expecting that this would 
> make it sensitive to the CapsLock toggle key VK_CAPITAL, because this 
> would match the ISO/IEC 9995 intent of having a secondary group that is 
> subject to CapsLock. But it did not work.
Should read:
“expecting that this would make it sensitive to CapsLock *on the 0x80 shift state*.”
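
For readers less familiar with the shift-state bit mask, here is a small sketch of 
what “the 0x80 shift state” means in terms of bits. The #define values are those of 
the kbd.h excerpt quoted further down; the two helper functions are hypothetical 
and only illustrate the arithmetic. They say nothing about how Windows actually 
treats CapsLock on that shift state:

```c
#include <assert.h>
#include <stdint.h>

/* Shift-state bits, values as in the kbd.h excerpt quoted in this thread. */
#define KBDBASE       0
#define KBDSHIFT      1
#define KBDCTRL       2
#define KBDALT        4
#define KBDKANA       8
#define KBDROYA       0x10
#define KBDLOYA       0x20
#define KBDGRPSELTAP  0x80

/* Hypothetical helper (not in kbd.h): true if every bit of `wanted`
   is set in the shift-state mask `state`. */
static int shift_state_has(uint8_t state, uint8_t wanted)
{
    return (state & wanted) == wanted;
}

/* Hypothetical helper: the shift state reached when the 0x80 modifier is
   held in addition to the modifiers already present in `state`. */
static uint8_t with_0x80_modifier(uint8_t state)
{
    return (uint8_t)(state | KBDGRPSELTAP);
}
```

So Shift together with the 0x80 modifier gives the mask 0x81, and a test for the 
0x80 bit fails on a plain Shift+Ctrl state.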

Lastly, subscribers who had trouble downloading the folder from dispoclavier.com 
are welcome to e-mail me off-list so that I can send the sources and/or drivers 
without the script that is available at charupdate.info#drivers (the translation 
is still to be completed), although Iʼm aware that developers using KbdUTool have 
typically already scripted the automation of the process.

Marcel
 
On 06/11/16 08:28, I wrote:
> On Sun, 6 Nov 2016 05:40:59 +0100, Philippe Verdy wrote:
> 
> > Another use case: being able to type Bopomofo along with Cyrillic or 
> > Kanas...; and new extensions will be needed for the 2012 German layout and 
> > other layouts made according to the ISO standard (you cannot do all that 
> > you want with just a few modifier bits, with Windows implementing only a Kana 
> > modifier key and limiting the number of modifiers supported even below the 
> > capacity of the WORD ModificationNumber)! 
> 
> This does not match my experience. Iʼm actually using modifiers 0x10, 0x20, 
> 0x40 and 0x80 too, and kbd.h has even names for most of them: [kbd.h(51)]
> 
> /*
> * Keyboard Shift State defines. These correspond to the bit mask defined
> * by the VkKeyScan() API.
> */
> #define KBDBASE 0
> #define KBDSHIFT 1
> #define KBDCTRL 2
> #define KBDALT 4
> // three symbols KANA, ROYA, LOYA are for FE
> #define KBDKANA 8
> #define KBDROYA 0x10
> #define KBDLOYA 0x20
> #define KBDGRPSELTAP 0x80
> 
> 0x40 proves to be usable too. What I cannot understand, and others 
> are puzzled too, is the name KBDGRPSELTAP. It sounds as if it were an 
> acronym of “GRouP SELecTor APing” or the like, hence my suspicion that 
> the developers were asked to ape the *then new* ISO/IEC 9995-3 group 
> selector by implementing it as a dead key, as a *remnant* group selector.
> 
> Thatʼs about the name only. Much more annoying is that Iʼve been unable 
> to get any result from the application of the related attribute: [kbd.h(364)]
> 
> #define CAPLOK 0x01
> #define SGCAPS 0x02
> #define CAPLOKALTGR 0x04
> // KANALOK is for FE
> #define KANALOK 0x08
> #define GRPSELTAP 0x80
> 
> And there is even NO COMMENT, as only the first two are mentioned in the 
> preceding comment: [kbd.h(46)]
> 
> * Special values for Attributes:
> * CAPLOK - The CAPS-LOCK key affects this key like SHIFT
> * SGCAPS - CapsLock uppercases the unshifted char (Swiss-German)
> 
> So I added 0x80 to the attribute of a key, expecting that this would 
> make it sensitive to the CapsLock toggle key VK_CAPITAL, because this 
> would match the ISO/IEC 9995 intent of having a secondary group that is 
> subject to CapsLock. But it did not work.
> 
> Thank you for the instructions below. I hope that the programmers on 
> this List know exactly how it must be translated into C so that it will 
> compile and the API can read the compiled binaries, and that 
> Microsoft will make and ship the kernel-level update you mention below 
> with one of the very next Windows Updates, so that all users whose 
> Windows version is still maintained will be able to use keyboard layouts 
> that can input WCHAR strings through dead keys.
> 
> Best regards,
> 
> Marcel
> 
> On Sun, 6 Nov 2016 05:37:12 +0100, Philippe Verdy wrote:
> 
> > Note: such extension is absolutely necessary for scripts not encoded in 
> > the BMP (e.g. Gothic or Deseret, or larger scripts that will absolutely 
> > need mechanisms like dead keys if they want to have a usable keyboard 
> > layout !) 
> > 
> > 2016-11-06 5:32 GMT+01:00 Philippe Verdy : 
> > 
> >> 
> >> 
> >> 2016-11-06 4:11 GMT+01:00 Marcel Schneider : 
> >> 
> >>> On Fri, 04 Nov 2016 15:30:48 -0700, Doug Ewell wrote: 
> >>> 
> >>> — And with LATIN CAPITAL LETTER OPEN E? Why not this way (as has been 
> >>> suggested): 
> >>> /*TILDE&AIGU */ DEADTRANS( 0x0190 ,0x1e4d ,{0x0190,0x0303,0x0301} ,DKF_0 ), // *LATIN CAPITAL LETTER OPEN E WITH TILDE AND ACUTE 
> >>> 
> >> 
> >> This snippet cannot work as is, because the DEADTRANS() macro 
> >> generates an 8-byte structure that has only a single WCHAR for storing the 
> >> result of the mapping of a (VKEY+modifier number): 
> >> 
> >> typedef struct _DEADKEY { 
> >> DWORD dwBoth; 
> >> WCHAR wchComposed; 
> >> USHORT uFlags; 
> >> } DEADKEY, *PDEADKEY; 
> >> 
> >> So it will need to map a WCH_LGTR instead, and then use a "ligature" 
> >> table to store the string containing the 3 code units you want. 
> >> 
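
The size constraint can be checked with a short sketch. The stand-in typedefs 
below are my assumption (DWORD as 32 bits, WCHAR and USHORT as 16 bits, as on 
Windows) so that it compiles without the Windows headers. The helper make_deadkey 
is hypothetical, and the packing of dwBoth (base character in the low word, dead 
character in the high word) is what I understand the MAKELONG-based DEADTRANS() 
macro to do; please treat it as an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumed widths, see above). */
typedef uint32_t DWORD;
typedef uint16_t WCHAR;
typedef uint16_t USHORT;

typedef struct _DEADKEY {
    DWORD  dwBoth;        /* base character and dead character, packed */
    WCHAR  wchComposed;   /* room for exactly ONE UTF-16 code unit     */
    USHORT uFlags;
} DEADKEY;

/* The whole entry is 8 bytes: there is no room for a 3-unit string. */
static_assert(sizeof(DEADKEY) == 8, "DEADKEY is expected to be 8 bytes");

/* Hypothetical helper mirroring what DEADTRANS(base, dead, composed, flags)
   is assumed to produce: base in the low word, dead char in the high word. */
static DEADKEY make_deadkey(WCHAR base, WCHAR dead, WCHAR composed, USHORT flags)
{
    DEADKEY dk;
    dk.dwBoth      = (DWORD)base | ((DWORD)dead << 16);
    dk.wchComposed = composed;
    dk.uFlags      = flags;
    return dk;
}
```

An initializer such as {0x0190,0x0303,0x0301} therefore has nowhere to go in 
wchComposed, which holds one WCHAR only.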
> >> Then there's an unused BYTE in the DEADTRANS structure for the flags, 
> >> that can be used (specifically for entries mapped to WCH_LGTR) to pass 
> >> flags to the LIGATURE(n) table, where there's also a free BYTE in the 
> >> indexing key that allows passing an identifier needed for the lookup in the 
> >> LIGATURE(n) table. Alternatively, instead of mapping WCH_LGTR (a PUA), you 
> >> could as well map another PUA there in 0xE001..0xE0FF for passing a byte for 
> >> the deadkey state into the lookup of ligatures: 
> >> 
> >> #define TYPEDEF_LIGATURE(i) \ 
> >> typedef struct _LIGATURE ## i { \ 
> >> BYTE VirtualKey; \ 
> >> WORD ModificationNumber; \ 
> >> WCHAR wch[i]; \ 
> >> } LIGATURE ## i, *PLIGATURE ## i; 
> >> 
> >> which can safely be changed to: 
> >> 
> >> #define TYPEDEF_LIGATURE(i) \ 
> >> typedef struct _LIGATURE ## i { \ 
> >> BYTE VirtualKey, DeadKeyState; \ 
> >> WORD ModificationNumber; \ 
> >> WCHAR wch[i]; \ 
> >> } LIGATURE ## i, *PLIGATURE ## i; 
> >> 
> >> (In the current definition, the extra byte is implicit for the 
> >> alignment but not declared explicitly. It is implicitly filled with zeroes 
> >> by C compilers when declaring the structure, but in my opinion this extra 
> >> byte should have been declared explicitly.) 
> >> 
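
The layout claim can be verified at compile time. This sketch assumes natural 
alignment (no #pragma pack in effect) and uses stand-in typedefs for BYTE, WORD 
and WCHAR; the two struct names are hypothetical and exist only to compare the 
current layout with the proposed one for a 3-unit ligature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Windows types (assumed widths). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

/* Current layout: one padding byte is inserted after VirtualKey. */
typedef struct {
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3_CURRENT;      /* hypothetical name */

/* Proposed layout: the padding byte is declared as DeadKeyState. */
typedef struct {
    BYTE  VirtualKey, DeadKeyState;
    WORD  ModificationNumber;
    WCHAR wch[3];
} LIGATURE3_PROPOSED;     /* hypothetical name */

/* Same size and same member offsets, under natural alignment. */
static_assert(sizeof(LIGATURE3_CURRENT) == sizeof(LIGATURE3_PROPOSED),
              "declaring the padding byte must not change the size");
static_assert(offsetof(LIGATURE3_CURRENT, ModificationNumber) ==
              offsetof(LIGATURE3_PROPOSED, ModificationNumber),
              "ModificationNumber must stay at the same offset");
static_assert(offsetof(LIGATURE3_PROPOSED, DeadKeyState) == 1,
              "DeadKeyState occupies the former padding byte");
```

If these assertions hold for the compiler and settings used to build the layout 
DLL, existing tables stay binary compatible. Whether the OS then reads the extra 
byte is a separate question.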
> >> But now it's up to the OS to support it. Maybe it already works, if the 
> >> lookup in the LIGATURE(n) table already scans for values of a DWORD 
> >> including this free padding byte. However, some kernel-level code 
> >> needs to change so that it checks the PUA values mapped in DEADKEY 
> >> structures and extracts a DeadKeyState from them. 
> >> 
> >> The alternative is to map the combination of two deadkeys to a bit in the 
> >> modifier number (this can be instructed by the uFlags, which will set the 
> >> modifier bit number specified in the mapped PUA). In all cases there's 
> >> still space for extension there. 
> >> 
> >> The last alternative is to extend the KBDTABLES structure to append new 
> >> members for a table of extended DEADKEYs, and a separate table of LIGATUREs 
> >> for DEADKEYs (KBDTABLES does not specify its own size, but it has a 
> >> fLocaleFlags field just before the table of ligatures, which can indicate 
> >> the presence of these extensions). 
> >>
> 
>


