Possible to add new precomposed characters for local language in Togo?

Philippe Verdy verdy_p at wanadoo.fr
Sat Nov 5 15:52:17 CDT 2016


2016-11-05 17:51 GMT+01:00 Marcel Schneider <charupdate at orange.fr>:

> Sorry not to have found time sooner to look closely at the stuff
> that is claimed to support code unit sequences through dead keys.
> Itʼs all about live keys, none about dead keys.
> Yet another case of talking past each other.
>
> IMHO that happened because one simple question was not answered prior to
> sharing links to sources: How will the API know what line of aLigature
> (the ligature table) to look up, if the 0xf002 alias WCH_LGTR is not found
> in aVkToWch<n> (the allocation table)?
>
> Indeed, column 1 of the ligature table contains the virtual key, and
> column 2 contains the modification number, that refers to the column of
> the allocation table where each 0xf002 or WCH_LGTR is mapped to a key and
> shift state:
>
> static ALLOC_SECTION_LDATA VK_TO_WCHARS38 aVkToWch38[] = {
> // Modification_# >>>|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|
> {'Q'/*T1E C01*/,0x01,'q','Q','#',0x2126,0x00f7,LGTR,0x0331,NONE,NONE,NONE,NONE,NONE,0x0634,'\\',0x0447,0x0427,0x0447,0x0427,'&','%',0x03c2,0x2211,'&','%',0x05e7,'*','&','%',LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,LGTR,NONE,NONE},
> //
> {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> };
>
> static ALLOC_SECTION_LDATA LIGATURE16 aLigature[] = {
> // |Virtual_Key|SC|ISO_#|Modif#|Char0|Char1|Char2|Char3|Char4|Char5|Char6|Char7|Char8|Char9|Char10|Char11|Char12|Char13|Char14|Char15|
> {'Q'/*T1E C01*/,5,' ',0x2191,'q','_','n',0x2019,'e','x','i','s','t','e','_','p','a','s'}, // ^q doesn't exist
> {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> };
>

Your structures do not seem to be correctly formatted (or it is just random
data):
>>> typedef struct _LIGATURE ## i { \
>>>         BYTE VirtualKey; \
>>>         WORD ModificationNumber; \
>>>         WCHAR wch[i]; \
>>> } LIGATURE ## i, *PLIGATURE ## i;
Here you set:
 VirtualKey='Q' /*T1E C01*/,
 ModificationNumber=5,
 wch[0]=' ', wch[1]=0x2191 (↑), wch[2]='q', ... wch[15]='s'
for defining a very long "ligature" (the wrong term used in kbd.h, where it
should just be a "string", even if those strings have a fixed length and
are null-padded).

What it says is that VK_Q in modification number 5 (as defined in the
MODIFIERS modifier_bits table, which maps each combination of the modifier
bits assigned in the VK_TO_BIT modifier_keys table to a modification number)
should generate your string, which can contain up to 16 WCHARs but no null
chars. It is not possible to include a NULL char in a LIGATURE table, but a
keyboard never needs to do that anyway: NULL chars are never mapped in a
ligature table, only individually in a VK mapping table, as a single WCHAR
code unit.

Note also that the definition in <kbd.h> of the SDK:
typedef struct _KBDTABLES {
  PMODIFIERS pCharModifiers;
  PVK_TO_WCHAR_TABLE pVkToWcharTable;
  PDEADKEY pDeadKey;
  VSC_LPWSTR *pKeyNames;
  VSC_LPWSTR *pKeyNamesExt;
  LPWSTR *pKeyNamesDead;
  USHORT *pusVSCtoVK;
  BYTE bMaxVSCtoVK;
  PVSC_VK pVSCtoVK_E0;
  PVSC_VK pVSCtoVK_E1;
  DWORD fLocaleFlags;
  BYTE nLgMaxd;
  BYTE cbLgEntry;
  PLIGATURE1 pLigature;
} KBDTABLES, *PKBDTABLES;

may be misleading for its last three members:
- nLgMaxd indicates the maximum length of the null-padded strings in a
pLigature table entry, whose entry size is stored in cbLgEntry: this size
acts as versioning information for the ligature table format, and most
probably it is there so that keyboard drivers compiled on another
architecture will still be usable even if the size of a WCHAR changes.
- but of course the type of an entry is not a LIGATURE1, but at least a
LIGATURE2 (LIGATURE1 has no use in any table, given that 1-WCHAR strings
will be stored directly in one of the VK_TO_WCHAR_TABLE
<https://doxygen.reactos.org/d1/df3/struct__VK__TO__WCHAR__TABLE.html> tables).
LIGATURE1 is just there to allow pointer typecasts in C/C++,
independently of the LIGATURE(n) table format you need.
- Windows provably works with LIGATURE2, LIGATURE3, LIGATURE4 and LIGATURE5
(I've never tested whether it works for longer strings, or whether it really
works with a LIGATURE1 table format).

The LIGATURE(n) format also has internal padding between members, notably
between "BYTE VirtualKey;" and "WORD ModificationNumber;": there's a hidden
alignment BYTE between them, which could be considered as additional flags
for the effective LIGATURE(n) format (C/C++ compilers are supposed to fill
these padding bytes with zeroes in statically initialized tables). Given that
WORD and WCHAR have the same 16-bit size, the whole structure is an array of
16-bit blocks. The header (VirtualKey, the padding BYTE and
ModificationNumber) takes two WORDs, i.e. one DWORD, and is followed by n
WCHARs. With the usual 2-byte alignment, a LIGATURE(n) entry therefore takes
4 + 2n bytes (6 for a LIGATURE1, 8 for a LIGATURE2, 10 for a LIGATURE3), with
no extra padding WCHAR at the end. That's why "cbLgEntry" is there: it
already tells how many WCHARs an entry can hold, which makes "nLgMaxd"
largely redundant. In any case, a NULL WCHAR in wch[] is ignored and acts as
an early terminator.
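The padding and entry sizes discussed above can be checked by instantiating the quoted LIGATURE(n) macro with Windows-like fixed-width typedefs. The layout comments assume mainstream compilers with default alignment (no #pragma pack). LG_MAXD_FROM_CB is a hypothetical helper that shows the string capacity implied by cbLgEntry alone.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Assumed Windows-like typedefs: WCHAR is 16-bit on Windows, whereas
 * wchar_t is 32-bit on many other platforms. */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

/* Same shape as the LIGATURE(n) macro quoted above. */
#define TYPEDEF_LIGATURE(i)              \
    typedef struct _LIGATURE##i {        \
        BYTE  VirtualKey;                \
        WORD  ModificationNumber;        \
        WCHAR wch[i];                    \
    } LIGATURE##i, *PLIGATURE##i;

TYPEDEF_LIGATURE(1)
TYPEDEF_LIGATURE(2)
TYPEDEF_LIGATURE(3)
TYPEDEF_LIGATURE(5)
TYPEDEF_LIGATURE(16)

/* Expected layout with default alignment:
 *   offset 0: VirtualKey
 *   offset 1: padding BYTE
 *   offset 2: ModificationNumber
 *   offset 4: wch[0] ... wch[n-1]
 * hence sizeof(LIGATUREn) == 4 + 2*n (6, 8, 10, 14, 36 bytes). */

/* Hypothetical helper: the number of WCHARs an entry of cb bytes can hold. */
#define LG_MAXD_FROM_CB(cb) \
    (((cb) - offsetof(LIGATURE1, wch)) / sizeof(WCHAR))
```

For example, LG_MAXD_FROM_CB(sizeof(LIGATURE16)) evaluates to 16.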

Now comes the question of how ligatures are matched: they are looked up
in the LIGATURE(n) tables by looking only at the first two members,
VirtualKey and ModificationNumber (ignoring the padding BYTE?), but most
probably by grouping them as a single DWORD (on a little-endian machine, the
LOWORD contains the VKEY and the padding byte, the HIWORD contains the
modification number). The lookup is apparently linear (there's apparently
no requirement for this table to be sorted to allow a binary search, and
anyway these LIGATURE tables are generally short).
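A linear lookup of the kind described above can be sketched with the quoted aLigature entry as data. This is an illustration under stated assumptions, not the Windows implementation. It compares the two key fields individually instead of as one DWORD, and it assumes that the table ends at the first entry whose VirtualKey is 0, as the zero row in the quoted table suggests. find_ligature is a hypothetical name.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>

/* Assumed Windows-like typedefs (WCHAR is 16-bit on Windows). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

typedef struct _LIGATURE16 {
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[16];
} LIGATURE16;

/* The entry from the quoted table: VK 'Q' in modification 5 yields
 * " \u2191q_n\u2019existe_pas" (16 WCHARs, so no terminator). */
static const LIGATURE16 aLigature[] = {
    { 'Q', 5, { ' ', 0x2191, 'q', '_', 'n', 0x2019, 'e', 'x',
                'i', 's', 't', 'e', '_', 'p', 'a', 's' } },
    { 0, 0, { 0 } }  /* terminator: VirtualKey == 0 */
};

/* Linear scan; returns NULL if the [VK, modification number] pair
 * has no entry in the table. */
static const LIGATURE16 *find_ligature(const LIGATURE16 *table,
                                       BYTE vk, WORD modNum)
{
    for (; table->VirtualKey != 0; table++) {
        if (table->VirtualKey == vk && table->ModificationNumber == modNum)
            return table;
    }
    return NULL;
}
```

With these tables, find_ligature(aLigature, 'Q', 5) finds the entry, while the same key in any other modification number returns NULL.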

If a [KEY,modifiers] pair is not found in the ligature table (even though
the VK_TO_WCHAR_TABLE says it should be there, by assigning a WCH_LGTR value
to the entry for that VKEY in that modification column), the behavior should
probably be the same as if the entry in the VK_TO_WCHAR_TABLE contained
WCH_NONE (i.e. key not mapped). But in my opinion the table data has a bug:
it should contain WCH_NONE instead of WCH_LGTR. I think that the keyboard
compiler tool should detect this error (it should also detect the use of an
unneeded LIGATURE1 instead of mapping directly in a VK_TO_WCHAR_TABLE or in
a DEADKEY table).
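The checks suggested above for a keyboard compiler tool, and the WCH_NONE fallback, can be sketched as follows. This is an illustration under stated assumptions. WCH_NONE and WCH_LGTR use their <kbd.h> values. The 4-column row type mirrors VK_TO_WCHARS4. The functions, the N_MODS constant and the sample data are hypothetical.

```c
#include <stdint.h>

/* Assumed Windows-like typedefs (WCHAR is 16-bit on Windows). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

#define WCH_NONE 0xF000  /* values as defined in <kbd.h> */
#define WCH_LGTR 0xF002
#define N_MODS   4       /* hypothetical: rows with 4 modification columns */

typedef struct {         /* shape of VK_TO_WCHARS4 */
    BYTE  VirtualKey;
    BYTE  Attributes;
    WCHAR wch[N_MODS];
} VK_TO_WCHARS4;

typedef struct {         /* shape of LIGATURE2 */
    BYTE  VirtualKey;
    WORD  ModificationNumber;
    WCHAR wch[2];
} LIGATURE2;

static int has_ligature(const LIGATURE2 *lig, BYTE vk, WORD mod)
{
    for (; lig->VirtualKey != 0; lig++)
        if (lig->VirtualKey == vk && lig->ModificationNumber == mod)
            return 1;
    return 0;
}

/* Counts WCH_LGTR cells that have no matching ligature entry (table bug). */
static int count_dangling_lgtr(const VK_TO_WCHARS4 *rows, const LIGATURE2 *lig)
{
    int errors = 0;
    for (; rows->VirtualKey != 0; rows++)
        for (WORD mod = 0; mod < N_MODS; mod++)
            if (rows->wch[mod] == WCH_LGTR &&
                !has_ligature(lig, rows->VirtualKey, mod))
                errors++;
    return errors;
}

/* Counts ligature entries holding a single WCHAR, which could have been
 * mapped directly in the VK_TO_WCHARS table instead. */
static int count_single_char_ligatures(const LIGATURE2 *lig)
{
    int n = 0;
    for (; lig->VirtualKey != 0; lig++)
        if (lig->wch[0] != 0 && lig->wch[1] == 0)
            n++;
    return n;
}

/* Run-time fallback: a dangling WCH_LGTR behaves like WCH_NONE. */
static WCHAR effective_cell(const VK_TO_WCHARS4 *row, WORD mod,
                            const LIGATURE2 *lig)
{
    WCHAR c = row->wch[mod];
    if (c == WCH_LGTR && !has_ligature(lig, row->VirtualKey, mod))
        return WCH_NONE;
    return c;
}

/* Hypothetical sample data containing one of each problem. */
static const VK_TO_WCHARS4 sampleRows[] = {
    { 'A', 0, { 'a', 'A', WCH_LGTR, WCH_LGTR } }, /* mod 3: no entry */
    { 0, 0, { 0, 0, 0, 0 } }
};
static const LIGATURE2 sampleLig[] = {
    { 'A', 2, { 'a', 'e' } },
    { 'B', 1, { 'b', 0 } },   /* single WCHAR: unneeded ligature */
    { 0, 0, { 0, 0 } }
};
```

Here count_dangling_lgtr reports the WCH_LGTR in modification 3 of 'A', and count_single_char_ligatures reports the one-character entry for 'B'.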

---- Speculation follows about possible extensions for dead keys mapped to
"ligatures", and arbitrary-length ligatures in general mapped from
DEADKEY(n) and VK_WCHAR_TABLE(n) tables ---

Note also the presence of a "flags" field in entries of a DEADKEY table:
could a similar BYTE be used in LIGATURE table entries (the padding between
BYTE VirtualKey and WORD ModificationNumber) when the "comp" member of a
DEADKEY entry contains WCH_LGTR, for example to store an identifier of the
dead key state for lookup in LIGATURE(n) tables? This lookup would still
work by grouping <VirtualKey, DeadKeyState, modifiers> in a single DWORD
instead of comparing them individually.
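The speculative entry format above, with the padding BYTE reused for a dead key state and the three key fields matched as one 32-bit value, could look like the following sketch. Nothing here exists in <kbd.h>. LIGATURE_DK4, make_key, find_dk_ligature, the convention "0 = no dead key pending" and the sample data are all hypothetical.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>

/* Assumed Windows-like typedefs (WCHAR is 16-bit on Windows). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;

/* Hypothetical entry: the former padding BYTE holds a dead key state. */
typedef struct {
    BYTE  VirtualKey;
    BYTE  DeadKeyState;          /* hypothetical: 0 = no dead key pending */
    WORD  ModificationNumber;
    WCHAR wch[4];
} LIGATURE_DK4;

/* Packs <VirtualKey, DeadKeyState, ModificationNumber> into one 32-bit key.
 * On little-endian targets this equals the entry's first 4 bytes read as
 * a DWORD. */
static uint32_t make_key(BYTE vk, BYTE deadKeyState, WORD modNum)
{
    return (uint32_t)vk | ((uint32_t)deadKeyState << 8)
                        | ((uint32_t)modNum << 16);
}

/* Linear scan doing a single 32-bit comparison per entry. */
static const LIGATURE_DK4 *find_dk_ligature(const LIGATURE_DK4 *table,
                                            BYTE vk, BYTE deadKeyState,
                                            WORD modNum)
{
    const uint32_t key = make_key(vk, deadKeyState, modNum);
    for (; table->VirtualKey != 0; table++)
        if (make_key(table->VirtualKey, table->DeadKeyState,
                     table->ModificationNumber) == key)
            return table;
    return NULL;
}

/* Hypothetical sample data. */
static const LIGATURE_DK4 sampleDk[] = {
    { 'E', 0, 1, { 'e', 0x0301, 0, 0 } },  /* no dead key pending */
    { 'E', 3, 1, { 'x', 'y', 0, 0 } },     /* after dead key state 3 */
    { 0, 0, 0, { 0 } }                     /* end of table */
};
```

The same key then yields different strings depending on the pending dead key state. Building the key from the fields, rather than reading the raw bytes, keeps the comparison independent of byte order.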

Also the "nLgMaxd" member of KBDTABLES has no real use if it just contains
2, 3, 4 or 5. Setting its value to 0 could instead indicate that a
LIGATURE(0) entry no longer contains a null-padded string "WCHAR wch[]",
but a pointer to a real string, "PWSTR pwch;". "cbLgEntry" is still used:
on a 32-bit architecture it contains 8 (2 BYTEs + 1 WORD for the composite
key, 1 DWORD for the target pointer), and on a 64-bit architecture it
contains 16 (2 BYTEs + 1 WORD for the composite key, 1 DWORD of alignment
padding, 1 QWORD for the 64-bit pointer). The alternative would be to store
even shorter pointers, using a single DWORD offset into a table of
null-terminated strings stored just after the end of the LIGATURE(0) lookup
table, these offsets being relative to the start of the LIGATURE(0) table
(whose pointer just has to be typecast to a WORD[] array).
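The offset-based LIGATURE(0) alternative above can be sketched as follows. The format is speculative and nothing here exists in <kbd.h>. LIGATURE0, its Flags member, lig0_lookup and the sample table are hypothetical. Offsets are counted in 16-bit units from the start of the table, as in the "WORD[] array" view. The sketch also assumes that entries end at the first entry whose VirtualKey is 0.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>

/* Assumed Windows-like typedefs (WCHAR is 16-bit on Windows). */
typedef uint8_t  BYTE;
typedef uint16_t WORD;
typedef uint16_t WCHAR;
typedef uint32_t DWORD;

/* Hypothetical LIGATURE(0) entry: 8 bytes on mainstream ABIs. */
typedef struct {
    BYTE  VirtualKey;
    BYTE  Flags;               /* the former padding BYTE */
    WORD  ModificationNumber;
    DWORD Offset;              /* in WCHARs from the start of the table */
} LIGATURE0;

/* Lookup entries immediately followed by their null-terminated strings.
 * The 3 entries take 3 * 8 = 24 bytes = 12 WCHARs, so strings start at 12. */
static const struct {
    LIGATURE0 entries[3];
    WCHAR     strings[7];
} sampleTable = {
    {
        { 'A', 0, 1, 12 },     /* -> "ae"  at WCHAR offset 12 */
        { 'B', 0, 2, 15 },     /* -> "xyz" at WCHAR offset 15 */
        { 0, 0, 0, 0 }         /* end of lookup table */
    },
    { 'a', 'e', 0, 'x', 'y', 'z', 0 }
};

/* Returns the string mapped to [vk, modNum], or NULL if there is none. */
static const WCHAR *lig0_lookup(const void *table, BYTE vk, WORD modNum)
{
    const BYTE *base = (const BYTE *)table;
    for (const LIGATURE0 *e = (const LIGATURE0 *)table;
         e->VirtualKey != 0; e++)
        if (e->VirtualKey == vk && e->ModificationNumber == modNum)
            return (const WCHAR *)(base + e->Offset * sizeof(WCHAR));
    return NULL;
}
```

Compared with the pointer variant, each entry stays 8 bytes on both 32-bit and 64-bit architectures, and the whole table remains position-independent.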


More information about the Unicode mailing list