Implementing SMP on a UTF-16 OS

Richard Wordingham richard.wordingham at ntlworld.com
Tue Aug 11 16:06:51 CDT 2015


On Tue, 11 Aug 2015 21:27:27 +0200 (CEST)
Marcel Schneider <charupdate at orange.fr> wrote:

> I've tried to just remove the parentheses and leave the string as
> it is. This compiled, but the keyboard test showed that in the
> keyboard driver DLL, UTF-16 strings with SMP characters aren't
> handled as such. Each surrogate code unit is treated as a single
> character, even when it's followed by a trailing one. Only the code
> unit corresponding to the shift state (modification number) is
> taken, no matter whether it's only a surrogate and the other half
> comes next.

This is exactly what one should expect.  The data is an array of
UTF-16 code units rather than a UTF-16 string.  Moreover, it was
probably written as UCS-2.  I believe it is the application that has
the job of stitching the surrogate pairs together.
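The stitching itself is mechanical.  A minimal sketch in C of what an
application has to do with the raw code units it is handed (the
helper name is my invention, not any Windows API):

#include <stddef.h>
#include <stdint.h>

/* Combine one or two UTF-16 code units into a code point and report
 * how many units were consumed.  Illustrative only. */
uint32_t stitch_utf16(const uint16_t *units, size_t len, size_t *consumed)
{
    if (len >= 2
        && units[0] >= 0xD800 && units[0] <= 0xDBFF  /* high surrogate */
        && units[1] >= 0xDC00 && units[1] <= 0xDFFF) /* low surrogate  */
    {
        *consumed = 2;
        return 0x10000
             + ((uint32_t)(units[0] - 0xD800) << 10)
             +  (uint32_t)(units[1] - 0xDC00);
    }
    *consumed = 1;
    return units[0];  /* BMP character, or an unpaired surrogate */
}

A driver table that merely stores the two units in adjacent slots is
doing its job; it is the consumer that must run something like the
above over them.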

> Is this the reason why a Unicode character cannot be represented
> alternatively as a 32 bit integer on Windows?

They are, from time to time.  There's a Windows message, WM_UNICHAR,
that delivers a supplementary character rather than a UTF-16 code
unit, and fairly obviously such characters have to be handled as a
whole when performing font lookups.  I've a suspicion that this
message hit an interoperability problem.  A program that can handle
pairs of surrogates but predates the message will not work with the
more recent message.  Therefore using the message type is deferred
until applications can handle it.  Therefore applications don't need
to handle it, and don't.  Therefore the message type doesn't get
used.
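For what it's worth, the opt-in dance is visible in the API: the
sender first posts WM_UNICHAR with wParam set to UNICODE_NOCHAR, and
only continues with real WM_UNICHAR messages if the window answers
TRUE.  A rough sketch of a window procedure that accepts both
delivery styles (InsertCodePoint is a hypothetical editor hook of
mine):

#include <windows.h>

extern void InsertCodePoint(UINT32 cp);  /* hypothetical */

static WCHAR g_pendingHigh = 0;  /* high surrogate awaiting its partner */

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    switch (msg) {
    case WM_UNICHAR:
        if (wParam == UNICODE_NOCHAR)
            return TRUE;                  /* "yes, send me these" */
        InsertCodePoint((UINT32)wParam);  /* whole code point, no stitching */
        return 0;
    case WM_CHAR:  /* legacy path: one UTF-16 code unit at a time */
        if (wParam >= 0xD800 && wParam <= 0xDBFF) {
            g_pendingHigh = (WCHAR)wParam;  /* stash, wait for the rest */
        } else if (wParam >= 0xDC00 && wParam <= 0xDFFF && g_pendingHigh) {
            InsertCodePoint(0x10000
                            + ((UINT32)(g_pendingHigh - 0xD800) << 10)
                            + (UINT32)(wParam - 0xDC00));
            g_pendingHigh = 0;
        } else {
            InsertCodePoint((UINT32)wParam);
        }
        return 0;
    }
    return DefWindowProcW(hwnd, msg, wParam, lParam);
}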

> Being UTF-16, the OS
> could handle a complete surrogate pair in one single 32-bit integer.
> Couldn't this be performed at driver level by modifying a program
> and updating this when the driver is installed?

You are really talking about a parallel set of routines.  I suspect
the answer is that Microsoft don't want to work on extending a
primitive keyboarding system when TSF is available.

You want to use dead keys.  Why?  Is it not that they are the only
mechanism you have experience of?

Better systems can be built, in which one sees what one is doing.  Is
it not much better to type 'e' and then a circumflex, and see the 'e'
and then the 'e' with a circumflex?  Dead keys are an imitation of a
limitation of typewriter technology.  If I were typing cuneiform, I'd
much rather type 'bi<COMMIT>' and see the growing sequence 'b', 'bi',
'<CUNEIFORM SIGN BI>' as I typed.  (What you have for a <COMMIT> key
is your choice.)  TSF lets one do this. A simple extension of the
keyboard definition DLLs generated by MSKLC does not.  What you should
be pressing for is a usable tutorial on how to do this in TSF.
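TSF itself is a hefty COM framework, so take this only as the
principle in miniature; the table and the names are invented for
illustration, not TSF's API:

#include <stdint.h>
#include <string.h>

/* Typed sequence -> target sign.  U+12049 is CUNEIFORM SIGN BI. */
static const struct { const char *seq; uint32_t cp; } composition[] = {
    { "bi", 0x12049 },
    /* ... one entry per sign ... */
};

/* Called on <COMMIT>: if the visible buffer matches a sequence, the
 * caller replaces the letters 'b','i' on screen with the cuneiform
 * sign; if not, the letters stay exactly as typed.  Either way the
 * user sees the state after every keystroke, which is the point. */
int commit(const char *buffer, uint32_t *out)
{
    size_t i;
    for (i = 0; i < sizeof composition / sizeof composition[0]; i++)
        if (strcmp(buffer, composition[i].seq) == 0) {
            *out = composition[i].cp;
            return 1;
        }
    return 0;
}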

> If yes, we must modify the interface so that keyboard driver DLLs
> are really read as UTF-16. And/or we must find another compiler. 
> 
> Must the Windows driver be compiled by a Microsoft compiler?

The compiler is not the issue.  The point is that the 16-bit code
exists, and programs that use the 16-bit API exist.  Language upgrades
may make supplementary characters easier to use in programs, but that
is all.  They don't change existing binary interfaces.
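For concreteness, this is roughly the shape of the per-key data in
the layout DLLs (paraphrased from kbd.h in the driver kit):

#include <windows.h>

/* One entry per virtual key; one 16-bit WCHAR per shift state.  A
 * supplementary character needs two units, so it cannot fit in a
 * slot; as I understand it, MSKLC smuggles such characters through
 * the ligature table, which is simply a longer run of WCHARs. */
typedef struct _VK_TO_WCHARS1 {
    BYTE  VirtualKey;   /* the key */
    BYTE  Attributes;   /* e.g. the caps-lock flag */
    WCHAR wch[1];       /* one code unit per shift state */
} VK_TO_WCHARS1;

Nothing in a newer compiler changes that layout; any fix has to
happen on the consuming side.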

Richard.


