Misspelling or Miscoding?

Richard Wordingham richard.wordingham at ntlworld.com
Thu Jan 19 02:45:08 CST 2017

On Wed, 18 Jan 2017 23:24:21 -0800
Asmus Freytag <asmusf at ix.netcom.com> wrote:

> The sequence of character codes isn't necessarily determined by the 
> typist's choice of keystrokes.

Wow!  ESP for input?

> For example, autocorrection and similar support can result in a 
> substitution of character codes. For scripts with this issue, it
> would be useful if such mechanisms were more widespread; effectively 
> normalizing to a preferred input order.

That's not the problem I have in mind.  Dotted circles can help, but
for Northern Thai in the Lanna script, USE (the Universal Shaping
Engine) has accidentally (I hope) banned 17% of the vocabulary and
demanded that a further 37% be misspelt.  It will be much the same for
Tai Khuen.  Once USE is fixed, the remaining problem is that the
encodings of */hi:m/ and /mi:/ may differ yet render identically; it
so happens that words like the former are rare.  Are you aware of
predictive input causing havoc with intellectual content?
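
To make the homograph point concrete, here is a minimal Python sketch.
The two code point sequences are an assumption made for illustration
(the same four Tai Tham characters, with the vowel sign placed after or
before the subjoined MA); they are not quoted from any word list, and
the helper names are hypothetical.  It checks that the strings share
their characters but still differ after Unicode normalization, so
normalization alone would not merge them.

```python
import unicodedata

# Assumed spellings, for illustration only: the same four Tai Tham
# characters, differing only in where the vowel sign sits.
MI = "\u1A49\u1A60\u1A3E\u1A66"   # HIGH HA, SAKOT, MA, VOWEL SIGN II
HIM = "\u1A49\u1A66\u1A60\u1A3E"  # HIGH HA, VOWEL SIGN II, SAKOT, MA


def same_characters(a, b):
    """True if both strings use the same code points, ignoring order."""
    return sorted(a) == sorted(b)


def same_encoding(a, b, form="NFC"):
    """True if both strings are identical after normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print("same characters:", same_characters(MI, HIM))
    print("same after NFC: ", same_encoding(MI, HIM))
```

Whether a given font and shaping engine really draw the two
identically is a property of the rendering stack, which this sketch
does not test.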

> Arguing over whether this is called mistyping or miscoding or 
> misspelling is perhaps less helpful than trying to get the word out
> that some scripts could strongly benefit from that additional
> software layer.

Enabling that may require some tools to update to Unicode 5.1.
(Hunspell, I'm looking at you.)

If a spell-checker can help at all, one useful feature would be some
way of showing the difference between distinctly encoded homographs.
(I fear it may not be quite the right tool - different suggestion logic
is needed.)  Coloured fonts may help once support for them has spread,
but we're probably still looking at bespoke tools to switch such hints
on and off.  In the past I've used transliteration fonts to check what
I've actually typed.
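
As a rough stand-in for a transliteration font, a few lines of Python
can spell out what was actually typed.  This is only a sketch of one
possible bespoke check, not an existing tool; the function names
spell_out and first_difference are hypothetical.

```python
import unicodedata


def spell_out(text):
    """Return one 'U+XXXX NAME' line per code point in text."""
    lines = []
    for ch in text:
        # Fall back to a placeholder for code points the local
        # Unicode database does not name.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines


def first_difference(a, b):
    """Index of the first position where a and b differ, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other, or they are equal.
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    import sys
    # Usage: pass the suspect word as the first argument.
    for line in spell_out(sys.argv[1] if len(sys.argv) > 1 else ""):
        print(line)
```

Run over two words that look the same on screen, first_difference
points at the place where the stored sequences part company.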

One problem with getting the message out is choosing the right words.
That's why I came here for advice on the terminology for such issues.
