Misspelling or Miscoding?

Richard Wordingham richard.wordingham at ntlworld.com
Thu Jan 19 19:04:06 CST 2017


On Thu, 19 Jan 2017 14:25:14 -0800
Asmus Freytag <asmusf at ix.netcom.com> wrote:

> Now I'm thinking your focus was more on cases like the two Khmer 
> subjoined consonant sequences:
> U+17D2 U+178A     ្ដ         KHMER CONSONANT SIGN COENG DA
> U+17D2 U+178F     ្ត         KHMER CONSONANT SIGN COENG TA
> that apparently have identical appearance, even though one is a 'd'
> and the other a 't'. (That's the only example that I'm personally
> familiar with).

> Unless some fonts ever make a distinction, this seems to be a case
> where "miscoding" might be an appropriate term. As far as the user is 
> concerned, the issue only arises because of the encoding scheme used.
> (A hypothetical different scheme that had one of these precomposed
> with a name containing something like DA OR TA would have not
> surfaced an invisible distinction).
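
The distinction described above can be made visible by listing code
points instead of relying on glyphs. The following is a minimal Python
sketch; the helper names `describe` and `same_after_normalization` are
illustrative assumptions and are not taken from the thread. It only
shows that the two sequences differ in their second code point and that
no Unicode normalization form unifies them.

```python
import unicodedata

# The two sequences quoted above: COENG followed by DA or by TA.
COENG_DA = "\u17D2\u178A"
COENG_TA = "\u17D2\u178F"


def describe(seq):
    """List each code point of seq with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def same_after_normalization(a, b):
    """Return True if any standard normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


if __name__ == "__main__":
    for seq in (COENG_DA, COENG_TA):
        print(describe(seq))
    print(same_after_normalization(COENG_DA, COENG_TA))
```

Since normalization leaves both sequences unchanged, any tool that
wants to treat them as alike has to apply its own folding.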

Such a font might be KHOM2004, mentioned by Michel Antelme in his paper
aefek.free.fr/iso_album/antelme_bis.pdf.  On p. 25 he makes the point
that a distinct COENG DA was still on its last legs in Cambodia in the
1920s; it's still distinct in the Khom variety of the script.  This
situation makes a good case for the Tibetan model, in which subjoined
consonants are encoded as separate characters.  We might end up
making the Khmer script a mixed system like Tai Tham by adding a
character KHMER CONSONANT SIGN ARCHAIC COENG DA.

There seem to be some Arabic script analogues, where only one or two
forms differ between a pair of letters.

This is not the situation I was interested in, but it's clearly related.

> Are your examples likewise legitimate duplications or merely the case 
> that one could type something else and have it look the same
> (accidentally).

They're mostly legitimate duplications, though some may stretch
phonological credulity.  For example, in Tai Tham, <NA, SAKOT, HIGH TA,
SIGN I> is part of a common Pali verb inflection and <NA, SIGN I, SAKOT,
HIGH TA> is a valid Northern Thai word (apparently not a Pali loan,
despite its spelling), but <MA, SAKOT, HIGH TA, SIGN I> would probably
be a miscoding of <MA, SIGN I, SAKOT, HIGH TA> (an attested final
syllable) if the language were Northern Thai.  I suppose it's just
conceivable that the former might be the name of a fruit, but I'm not
aware of the syllabic nasal being written that way.

A spell checker would pick up most such errors, though getting the
underlying problem explained to the user might be difficult.
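
A minimal sketch of such a check follows, in Python. Everything in it
is an assumption made for illustration: the three-entry `LEXICON`
stands in for a real word list, and grouping entries by their sorted
code points (`order_key`) is only a crude proxy for "may look the same
when rendered". A word missing from the lexicon is reported together
with any listed word that uses the same characters in a different
order, which gives the user something concrete to compare against.

```python
from collections import defaultdict

# Tai Tham characters used in the examples above.
NA, HIGH_TA, MA = "\u1A36", "\u1A32", "\u1A3E"
SAKOT, SIGN_I = "\u1A60", "\u1A65"

# Hypothetical lexicon: only the sequences the text treats as valid.
LEXICON = {
    NA + SAKOT + HIGH_TA + SIGN_I,   # Pali verb inflection
    NA + SIGN_I + SAKOT + HIGH_TA,   # Northern Thai word
    MA + SIGN_I + SAKOT + HIGH_TA,   # attested final syllable
}


def order_key(word):
    """Key that ignores the order of code points (an assumed heuristic)."""
    return "".join(sorted(word))


# Index lexicon entries by their order-insensitive key.
BY_KEY = defaultdict(set)
for entry in LEXICON:
    BY_KEY[order_key(entry)].add(entry)


def check(word):
    """Return (is_known, suggestions).

    suggestions lists lexicon entries made of the same code points in a
    different order; it is empty when the word itself is known.
    """
    if word in LEXICON:
        return True, []
    return False, sorted(BY_KEY.get(order_key(word), ()))


if __name__ == "__main__":
    suspect = MA + SAKOT + HIGH_TA + SIGN_I
    known, suggestions = check(suspect)
    print(known, [[f"U+{ord(c):04X}" for c in s] for s in suggestions])
```

Printing the suggestion as code points rather than as rendered text is
deliberate here, since the rendered forms may be indistinguishable.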

> The Khmer example would seem fairly resistant to automated correction
> if it is a free choice. If, instead, the immediately preceding
> consonant comes from two disjoined sets, for example if TA COENG TA
> was possible, but not TA COENG DA, then there's scope for spell check.

The choice is supposed to be based on the phonetics, so a spell check
could be used, but not a grammar rule.  However, I can imagine someone
writing in accordance with a rule restricting each form to certain
bases.

Richard.


