Misspelling or Miscoding?

Richard Wordingham richard.wordingham at ntlworld.com
Wed Jan 18 19:12:50 CST 2017


On Wed, 18 Jan 2017 13:35:55 -0700
"Doug Ewell" <doug at ewellic.org> wrote:

> Richard Wordingham wrote:
> 
> > I think it is not a 'typographical error' if it renders as it
> > should!  
> 
> What if it renders correctly on some systems but not on others?

> I do see your point, though. Writing systems that permit different
> spellings of the same glyph (cluster), only one of which is 'correct'
> even after normalization, can be tricky like this. I think this would
> still be a matter of 'misspelling' rather than 'miscoding' because a
> typist should not have to be concerned with character codes per se.

As you've put it, it sounds like the way things were with a simple Thai
typewriter.  A vowel below, a vowel above and a tone mark could be
typed in any order, as though they had three different non-zero
combining classes.  Thais were trained to type into computers by input
routines that accepted the marks only in the correct order - this was
before the days of canonical combining classes.

In the case of greatest concern to me, there can be two different
orders, but only one is appropriate for a given word.  In most cases,
only one word of that appearance exists, and one can usually guess which
one does exist. (That is why the system works despite the occasional
ambiguity.)  It's not unlike how Thai would work had phonetic order
been successfully insisted upon, except that there is no evidence that
sorting should be by appearance, whereas in Thai as it was encoded
before Unicode (and is now, after normalisation), encoding and sorting
are based purely on appearance.  (Well, officially - in practice, Thais
appear to sort by doing syllable-by-syllable comparisons.)

In the case that concerns me, the two encodings occasionally differ in
the range of renderings they allow, which is another reason that two
different encodings for the same appearance must be tolerated.

Richard.

