Possible to add new precomposed characters for local language in Togo?

Philippe Verdy verdy_p at wanadoo.fr
Thu Nov 3 18:53:57 CDT 2016


2016-11-04 0:24 GMT+01:00 Doug Ewell <doug at ewellic.org>:

> Philippe Verdy wrote:
>
> >> Mats is talking about the fact that a dead key combination (of any
> >> length) under Windows can generate only a single UTF-16 code unit.
> >
> > That's wrong. Windows can perfectly generate multiple code units (in
> > fact it does it for non BMP characters, including in MSKLC!) from its
> > KLC tables using the default system driver.
>
> From a dead key combination? Can you provide an example?
>
> > Only the GUI editor MSKLC cannot use this possibility and it does not
> > understand chained tables (note: you can perfectly assign another
> > table index instead of a character to the combination of a dead key
> > state and another dead key, so that you can type another key which
> > will be mapped in the combined state; the combined state can then
> > accept the space bar to force the output of the NFC form for
> > SPACE+diacritic1+diacritic2, which should be, if possible, a
> > spacing-diacritic1 followed by a combining-diacritic2, or the reverse
> > if both diacritics have a non-zero combining class but the second one
> > has a lower combining class than the first one).
>
> Even if true -- and I doubt that the Windows keyboard engine knows
> anything about Unicode combining classes -- it doesn't solve Mats's
> problem. He doesn't want to generate the two diacritical marks in
> isolation. He could do that without dead keys.
>

Windows does not have to know that: the order will be the one you have used
in your keymap tables.

> If a user types a dead key, followed by a character not listed in the
> dead key table, Windows gives up and outputs the characters associated
> with the two keys. That's not at all the same thing as what Mats wants.
>

Windows does not do that magically: for characters missing from a table, it
falls back by default to the entry assigned to the space bar, which must be
mapped in every keymap to generate a sequence for the "isolated" dead keys.
It then resets the state to the initial one and tries to find a mapping for
that character in the table for the initial state.
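The fallback described above can be modelled roughly as follows. This is only a sketch in Python of the behaviour as I describe it, not of the actual Windows driver code: the table layout, the key names (DEAD_ACUTE, SPACE) and the press/type_keys helpers are all hypothetical.

```python
INITIAL = "Initial"

# Hypothetical tables: an entry is ("state", name) to switch tables,
# or ("text", string) to output that string.
TABLES = {
    INITIAL: {
        "DEAD_ACUTE": ("state", "Acute"),
        "SPACE": ("text", " "),
        "e": ("text", "e"),
        "x": ("text", "x"),
    },
    "Acute": {
        "SPACE": ("text", "\u00B4"),  # isolated (spacing) acute accent
        "e": ("text", "\u00E9"),      # e with acute
    },
}


def press(state, key):
    """Return (new_state, output) for one key press."""
    table = TABLES[state]
    if key in table:
        kind, value = table[key]
        if kind == "state":
            return value, ""
        return INITIAL, value
    if state == INITIAL:
        return INITIAL, ""  # unmapped key in the initial state: no output
    # Key missing from a dead-key table: emit the space-bar entry,
    # reset to the initial state, then look the key up again there.
    _, isolated = table["SPACE"]
    new_state, rest = press(INITIAL, key)
    return new_state, isolated + rest


def type_keys(keys):
    """Feed a list of key names through the tables and return the text."""
    state, out = INITIAL, []
    for key in keys:
        state, text = press(state, key)
        out.append(text)
    return "".join(out)
```

With these tables, typing the dead key followed by "x" gives the isolated accent followed by "x", while the dead key followed by "e" gives the single precomposed letter.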

>
> What Mats wants is to enter <dead key>, <dead key>, <base letter> and
> have the keyboard generate <letter with two diacritical marks>. That is
> the sequence of 3 output code units that the Windows architecture -- not
> just MSKLC -- does not support. If you disagree, please provide an
> example.


I had perfectly understood that! And my response addressed exactly this need:

Pseudo-code:

Table[InitialState] [<deadkey1>,<modifiers1>] = StateDeadKey1
Table[StateDeadKey1] [<deadkey2>,<modifiers2>] = StateDeadKey1And2
Table[StateDeadKey1And2] [<base letter>,<modifiers3>] =
    NFC(<base letter; deadkey1; deadkey2>)
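
To make the chaining concrete, here is a minimal sketch of the same three tables in Python. It only illustrates the state machine in the pseudo-code; it is not the real KLC table format, and the state names, key names (DEAD_TILDE, DEAD_ACUTE) and the type_keys helper are assumptions made for the example. Only unicodedata.normalize is a real library call.

```python
import unicodedata

INITIAL = "Initial"

# Hypothetical chained tables. An entry is ("state", name), meaning
# "switch to that table", or ("text", string), meaning "output this".
TABLES = {
    INITIAL: {
        "DEAD_TILDE": ("state", "Tilde"),
        "\u025B": ("text", "\u025B"),  # LATIN SMALL LETTER OPEN E
    },
    "Tilde": {
        "DEAD_ACUTE": ("state", "TildeAcute"),
        "\u025B": ("text", unicodedata.normalize("NFC", "\u025B\u0303")),
    },
    "TildeAcute": {
        "\u025B": ("text",
                   unicodedata.normalize("NFC", "\u025B\u0303\u0301")),
    },
}


def type_keys(keys):
    """Feed key names through the chained tables; return the text produced."""
    state, out = INITIAL, []
    for key in keys:
        # In this sketch an unmapped key simply produces nothing.
        kind, value = TABLES[state].get(key, ("text", ""))
        if kind == "state":
            state = value
        else:
            out.append(value)
            state = INITIAL
    return "".join(out)


result = type_keys(["DEAD_TILDE", "DEAD_ACUTE", "\u025B"])
```

Here result is the open e followed by the combining tilde and the combining acute: three BMP characters, hence three UTF-16 code units, because Unicode has no precomposed form for this letter.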

Each table entry can contain either a special value for a table index
(representing the new current state), or a sequence of UTF-16 code units,
or a null entry for unmapped keys. The number of code units depends on the
table format, whose header indicates how many code units are stored and how
many modifiers are mapped or masked. The maximum number of UTF-16 code
units depends on the OS version, as newer versions support more formats: I
think it is now up to 6 code units, and in past versions it was 4. There is
also an extra format where table entries are in fact positions in a string
table, where strings have variable lengths; the string table just follows
the tables of keymaps. There's actually no code at all in most keyboard
drivers that don't need a special UI.
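
For reference, counting the UTF-16 code units a given output string would occupy in such an entry is easy to check. The sketch below is in Python; utf16_units and fits_entry are hypothetical helper names, and the limit passed to fits_entry is whatever the table format allows (the figures of 4 and 6 above are from memory, not from documentation).

```python
def utf16_units(text):
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def fits_entry(text, max_units):
    """True if text fits in a table entry holding max_units code units."""
    return utf16_units(text) <= max_units


# A non-BMP character needs a surrogate pair: two code units.
deseret = "\U00010400"
# Open e + combining tilde + combining acute: three code units.
open_e_marks = "\u025B\u0303\u0301"
```

A letter with two combining marks therefore fits in a 4-unit entry, and a non-BMP character alone already needs 2 units.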

Newer drivers for Windows, however, contain additional data with a geometric
layout for touch screens. Some drivers will contain code (notably for CJK
keyboards that need a UI for their IME, for typing emojis, or for using
assistive technologies based on linguistic dictionary lookups, such as "T9"
input methods on smartphones/tablets/remote controls).