Misspelling or Miscoding?
Richard Wordingham
richard.wordingham at ntlworld.com
Thu Jan 19 19:04:06 CST 2017
On Thu, 19 Jan 2017 14:25:14 -0800
Asmus Freytag <asmusf at ix.netcom.com> wrote:
> Now I'm thinking your focus was more on cases like the two Khmer
> subjoined consonant sequences:
> U+17D2 U+178A ្ដ KHMER CONSONANT SIGN COENG DA
> U+17D2 U+178F ្ត KHMER CONSONANT SIGN COENG TA
> that apparently have identical appearance, even though one is a 'd'
> and the other a 't'. (That's the only example that I'm personally
> familiar with).
> Unless some fonts ever make a distinction, this seems to be a case
> where "miscoding" might be an appropriate term. As far as the user is
> concerned, the issue only arises because of the encoding scheme used.
> (A hypothetical different scheme that had one of these precomposed
> with a name containing something like DA OR TA would not have
> surfaced an invisible distinction).
Such a font might be KHOM2004 mentioned by Michel Antelme in his paper
aefek.free.fr/iso_album/antelme_bis.pdf. On p. 25 he makes the point
that a distinct COENG DA was still on its last legs in Cambodia in the
1920s; it's still distinct in the Khom variety of the script. This
situation makes a good case for the Tibetan model. We might end up
making the Khmer script a mixed system like Tai Tham by adding a
character KHMER CONSONANT SIGN ARCHAIC COENG DA.
There seem to be some Arabic script analogues, where only one or two
forms differ between a pair of letters.
This is not the situation I was interested in, but it's clearly related.
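
A minimal sketch in Python of why the Khmer pair is invisible to
text processing as well as to the eye: the two sequences differ in one
code point and, as far as the standard `unicodedata` module reports, no
normalization form unifies them. The helper names `describe` and
`same_after_normalization` are made up for this illustration.

```python
import unicodedata

# The two subjoined-consonant sequences quoted above.
COENG_DA = "\u17D2\u178A"  # KHMER SIGN COENG + KHMER LETTER DA
COENG_TA = "\u17D2\u178F"  # KHMER SIGN COENG + KHMER LETTER TA


def describe(seq):
    """Return 'U+XXXX NAME' for each code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


def same_after_normalization(a, b):
    """True if any of the four normalization forms makes a and b equal."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    print(describe(COENG_DA))
    print(describe(COENG_TA))
    # The sequences stay distinct under every normalization form.
    print(same_after_normalization(COENG_DA, COENG_TA))
```

Since normalization does not help, telling the two apart needs
knowledge of the language, such as a word list.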
> Are your examples likewise legitimate duplications or merely the case
> that one could type something else and have it look the same
> (accidentally).
They're mostly legitimate duplications, though some may stretch
phonological credulity. For example, in Tai Tham, <NA, SAKOT, HIGH TA,
SIGN I> is part of a common Pali verb inflection and <NA, SIGN I, SAKOT,
HIGH TA> is a valid Northern Thai word (apparently not a Pali loan,
despite its spelling), but <MA, SAKOT, HIGH TA, SIGN I> would probably
be a miscoding of <MA, SIGN I, SAKOT, HIGH TA> (an attested final
syllable) if the language were Northern Thai. I suppose
it's just conceivable that the former might be the name of a fruit, but
I'm not aware of the syllabic nasal being written that way.
A spell checker would pick up most such errors, though getting the
underlying problem explained to the user might be difficult.
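
As a rough sketch of how a spell checker might catch the Tai Tham
case, the Python below looks a word up in a word list and, if it is
missing, tries the alternative ordering of SIGN I around <SAKOT,
consonant>. The word list is a toy stand-in holding only the three
sequences described above as valid; `swapped_order` and `suggest` are
hypothetical names, and a real checker would need a real lexicon and
more vowels than SIGN I.

```python
# Tai Tham characters used in the examples above.
NA = "\u1A36"       # TAI THAM LETTER NA
MA = "\u1A3E"       # TAI THAM LETTER MA
HIGH_TA = "\u1A32"  # TAI THAM LETTER HIGH TA
SAKOT = "\u1A60"    # TAI THAM SIGN SAKOT
SIGN_I = "\u1A65"   # TAI THAM VOWEL SIGN I

# Toy lexicon (assumption): only the spellings treated as valid above.
LEXICON = {
    NA + SAKOT + HIGH_TA + SIGN_I,  # Pali verb inflection
    NA + SIGN_I + SAKOT + HIGH_TA,  # Northern Thai word
    MA + SIGN_I + SAKOT + HIGH_TA,  # attested final syllable
}


def swapped_order(word):
    """Return spellings of word with one <SAKOT, C, SIGN I> rewritten
    as <SIGN I, SAKOT, C>, or the reverse."""
    variants = set()
    for i in range(len(word) - 2):
        a, b, c = word[i], word[i + 1], word[i + 2]
        if a == SAKOT and c == SIGN_I:
            variants.add(word[:i] + SIGN_I + SAKOT + b + word[i + 3:])
        elif a == SIGN_I and b == SAKOT:
            variants.add(word[:i] + SAKOT + c + SIGN_I + word[i + 3:])
    return variants


def suggest(word, lexicon):
    """Return reordered spellings found in lexicon, or [] if word is
    itself listed or no reordering is listed."""
    if word in lexicon:
        return []
    return sorted(v for v in swapped_order(word) if v in lexicon)


if __name__ == "__main__":
    suspect = MA + SAKOT + HIGH_TA + SIGN_I
    for fix in suggest(suspect, LEXICON):
        print([f"U+{ord(ch):04X}" for ch in fix])
```

This finds the likely intended spelling, but it does nothing to explain
to the user why two identical-looking strings differ.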
> The Khmer example would seem fairly resistant to automated correction
> if it is a free choice. If, instead, the immediately preceding
> consonant comes from two disjoint sets, for example if TA COENG TA
> was possible, but not TA COENG DA, then there's scope for spell check.
It's supposed to be based on the phonetics, so a spell check could be
used, but not a grammar rule. However, I can imagine someone writing
in accordance with a rule restricting them to certain bases.
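
One way a word-list check for the Khmer pair could be sketched in
Python: fold COENG TA onto COENG DA to get a lookup key, so that
look-alike spellings land in the same bucket, then compare the typed
word with the listed spellings in that bucket. The functions
`skeleton`, `build_index` and `check` are hypothetical, and the single
lexicon entry is a placeholder taken from the TA COENG TA example
above, not a claim about real Khmer vocabulary.

```python
COENG = "\u17D2"  # KHMER SIGN COENG
DA = "\u178A"     # KHMER LETTER DA
TA = "\u178F"     # KHMER LETTER TA


def skeleton(word):
    """Fold COENG TA onto COENG DA so look-alike spellings share a key."""
    return word.replace(COENG + TA, COENG + DA)


def build_index(lexicon):
    """Map each skeleton to the set of listed spellings that produce it."""
    index = {}
    for entry in lexicon:
        index.setdefault(skeleton(entry), set()).add(entry)
    return index


def check(word, index):
    """Return (status, candidates): 'ok' if word is listed, 'look-alike'
    if only a visually identical spelling is listed, else 'unknown'."""
    candidates = index.get(skeleton(word), set())
    if word in candidates:
        return "ok", []
    if candidates:
        return "look-alike", sorted(candidates)
    return "unknown", []


if __name__ == "__main__":
    # Placeholder lexicon (assumption): TA COENG TA listed, TA COENG DA not.
    index = build_index({TA + COENG + TA})
    print(check(TA + COENG + TA, index)[0])
    print(check(TA + COENG + DA, index)[0])
```

If both spellings of a word were listed, the check would accept either,
which matches the point that the choice rests on phonetics rather than
on a mechanical rule.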
Richard.