Why Work at Encoding Level?

Richard Wordingham richard.wordingham at ntlworld.com
Wed Oct 21 16:50:10 CDT 2015

On Wed, 21 Oct 2015 19:50:32 +0100
Daniel Bünzli <daniel.buenzli at erratique.ch> wrote:

> On Wednesday, 21 October 2015 at 19:13, Richard Wordingham wrote:
> > That sounds good, but would you please talk me through how you
> > apply it in the TSF method InsertTextAtSelection. Remember that the
> > user may have switched input method several times.  
> Sorry don't know these acronyms or methods. Interaction with the
> input method should always eventually yield a stream of scalar
> values; if it's badly designed, you should try to abstract it so that
> it provides the right mechanism for you.

The simpler-looking input methods provide and then delete text. For
example, I have a Keyman for Linux input editor based on XSAMPA in
which I can successively input e_H\ to get the successive text displays
e e_ é e˦. (The last includes a spacing tone mark.)  When I strike the
backslash, the input editor knows whether my application has
<e, U+0301> or U+00E9 LATIN SMALL LETTER E WITH ACUTE in its backing
store.  There is a callback for this very purpose, but the input editor
also has fallback logic, which is needed when it uses the X protocols.  It
uses the GTK+ interface with GTK+ applications, and sends the commands
"delete one character before the cursor" in each case and "insert ˦"
or "insert e˦" accordingly.  Now, the GTK+ commands function in terms
of scalar values, which should be nice.  However, notice that text and
positions go in both directions across the interface.
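
To make the exchange concrete, here is a minimal sketch in Python of the
delete-then-insert step when both sides count in scalar values. The
function name apply_ime_edit is hypothetical and is not part of GTK+ or
Keyman. I also assume that the two cases differ only in the inserted
text: just the tone letter when the store holds <e, U+0301>, and "e"
plus the tone letter when it holds U+00E9.

```python
TONE = "\u02E6"  # MODIFIER LETTER HIGH TONE BAR, the spacing tone mark


def apply_ime_edit(store, cursor, delete_before, insert):
    """Delete `delete_before` scalar values before `cursor`, then insert.

    Python str indices count code points, so they stand in here for the
    scalar-value positions the interface is described as using.
    Returns the new store and the new cursor position.
    """
    start = cursor - delete_before
    if start < 0:
        raise ValueError("cannot delete before the start of the store")
    new_store = store[:start] + insert + store[cursor:]
    return new_store, start + len(insert)


# Case 1: the application stored the decomposed sequence <e, U+0301>.
# Deleting one scalar value removes only the combining acute.
decomposed = "e\u0301"
result_a, cursor_a = apply_ime_edit(decomposed, len(decomposed), 1, TONE)

# Case 2: the application stored precomposed U+00E9.
# Deleting one scalar value removes the whole letter, so "e" is re-inserted.
precomposed = "\u00E9"
result_b, cursor_b = apply_ime_edit(precomposed, len(precomposed), 1, "e" + TONE)
```

Both cases end with the same store, <e, U+02E6>, but only because the
input editor chose its insert text after learning what the application
actually held.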

The Text Services Framework on Windows works similarly, but its
commands seem to be expressed in terms of absolute UTF-16 positions.
Abstraction may move the problem, but it doesn't eliminate it.  The
best one can hope for is a reusable abstraction.
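
As an illustration of what such a reusable abstraction has to do at
minimum, here is a sketch in Python of converting between scalar-value
indices and UTF-16 code unit offsets. The function names are mine and
do not come from TSF or any other API. The sketch assumes the text
contains no lone surrogates.

```python
def scalar_index_to_utf16_offset(text, index):
    """Count the UTF-16 code units occupied by the first `index` scalar values."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


def utf16_offset_to_scalar_index(text, offset):
    """Map a UTF-16 code unit offset back to a scalar-value index.

    Raises ValueError if the offset falls inside a surrogate pair or
    lies beyond the end of the text.
    """
    units = 0
    for i, ch in enumerate(text):
        if units == offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
        if units > offset:
            raise ValueError("offset falls inside a surrogate pair")
    if units == offset:
        return len(text)
    raise ValueError("offset is past the end of the text")


# U+10400 needs two UTF-16 code units; "e" and U+0301 need one each.
sample = "\U00010400e\u0301"
offset_after_first = scalar_index_to_utf16_offset(sample, 1)
offset_at_end = scalar_index_to_utf16_offset(sample, 3)
```

The conversion is linear in the length of the text unless an index is
kept alongside it, and an offset received from the other side may not
land on a scalar-value boundary at all, so the problem has moved rather
than gone away.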

