Misspelling or Miscoding?
Mark Davis ☕️
mark at macchiato.com
Thu Jan 19 01:52:05 CST 2017
We don't have any set terminology for what you're talking about.
We've often just used 'misspelling' in a broad sense, which can include
visually confusable or identical glyphs. For example, spelling 'of' with an
omicron would be one, as well as a word in a complex script with swapped
marks. Cases of the former occur surprisingly often in web pages,
probably because people switch keyboards in mid-stride. They are on
(say) a Greek keyboard, hit omicron and then the Greek character in the 'f'
position, notice it is wrong, and backspace, but only over the character
that 'looks' wrong, and then type 'f'.
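
As a rough illustration of how such a confusable 'misspelling' might be flagged, here is a minimal Python sketch. The helper names (scripts_in, looks_mixed) are made up for this example, and the script label is only approximated from the first word of each character's Unicode name, since the standard unicodedata module does not expose the Script property. A real checker would want proper script data and a confusables table.

```python
import unicodedata

def scripts_in(word):
    """Return rough script labels for the letters in `word`.

    Assumption: the first word of the Unicode character name
    (e.g. 'GREEK', 'LATIN') stands in for the Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }

def looks_mixed(word):
    """True if the letters of `word` appear to come from more than one script."""
    return len(scripts_in(word)) > 1

suspect = "\u03bff"  # GREEK SMALL LETTER OMICRON followed by LATIN SMALL LETTER F
plain = "of"         # both letters Latin

print(looks_mixed(suspect), sorted(scripts_in(suspect)))
print(looks_mixed(plain), sorted(scripts_in(plain)))
```

Both strings look like 'of' in most fonts; only the first mixes Greek and Latin letters.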
The problem with using the term "miscoding" is that it is overloaded. It
can also refer to errors at the character encoding level: for example,
interpreting a string of UTF-8 bytes as Latin-1. The sequence
<omicron, f> is a perfectly valid Unicode string and not, in that sense,
miscoded.
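
To make the encoding-level sense of "miscoding" concrete, here is a small Python sketch (the sample word is chosen just for illustration). It decodes UTF-8 bytes as Latin-1, which garbles the text, and then shows that the omicron-plus-f string encodes and decodes cleanly, so nothing is wrong with it at that level.

```python
# Encoding-level miscoding: UTF-8 bytes read back as Latin-1.
text = "caf\u00e9"                 # 'cafe' with e-acute
raw = text.encode("utf-8")         # the e-acute becomes the two bytes C3 A9
garbled = raw.decode("latin-1")    # each byte is taken as one Latin-1 character
print(garbled)                     # the e-acute shows up as two characters

# The confusable spelling is a different kind of problem: the string is
# well-formed and survives an encode/decode round trip unchanged.
omicron_f = "\u03bff"              # GREEK SMALL LETTER OMICRON + 'f'
round_trip = omicron_f.encode("utf-8").decode("utf-8")
print(round_trip == omicron_f)
```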
On Thu, Jan 19, 2017 at 2:12 AM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:
> On Wed, 18 Jan 2017 13:35:55 -0700
> "Doug Ewell" <doug at ewellic.org> wrote:
> > Richard Wordingham wrote:
> > > I think it is not a 'typographical error' if it renders as it
> > > should!
> > What if it renders correctly on some systems but not on others?
> > I do see your point, though. Writing systems that permit different
> > spellings of the same glyph (cluster), only one of which is 'correct'
> > even after normalization, can be tricky like this. I think this would
> > still be a matter of 'misspelling' rather than 'miscoding' because a
> > typist should not have to be concerned with character codes per se.
> As you've put it, it sounds like the way things were with a simple Thai
> typewriter. A vowel below, a vowel above and a tone mark could be
> typed in any order, as though they had three different non-zero
> combining classes. Thais were trained to type into computers by input
> routines only accepting the marks in the correct order - this was
> before the days of canonical combining classes.
> In the case of greatest concern to me, there can be two different
> orders, but only one is appropriate for a given word. In most cases,
> only one word of that appearance exists, and one can usually guess which
> one does exist. (That is why the system works despite the occasional
> ambiguity.) It's not unlike how Thai would work had phonetic order
> been successfully insisted upon, except that there is no evidence that
> sorting should be by appearance, whereas in Thai as it was encoded
> before Unicode (and is now, after normalisation), encoding and sorting
> are based purely on appearance. (Well, officially - in practice, Thais
> appear to sort by doing syllable-by-syllable comparisons.)
> In this case of concern, the range of renderings is occasionally
> different, which is another reason that two different encodings for the
> same appearance must be tolerated.
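
The quoted points about mark order and canonical combining classes can be sketched in Python with the standard unicodedata module. This is only an illustration with modern Thai characters chosen for the example, not the script Richard has in mind. A vowel below and a tone mark have different non-zero combining classes, so normalization makes the two typing orders identical; a vowel above has class 0, so the two orders stay distinct even after normalization.

```python
import unicodedata

KO_KAI = "\u0e01"  # THAI CHARACTER KO KAI (base consonant)
SARA_U = "\u0e38"  # THAI CHARACTER SARA U (vowel below)
SARA_I = "\u0e34"  # THAI CHARACTER SARA I (vowel above)
MAI_EK = "\u0e48"  # THAI CHARACTER MAI EK (tone mark)

def nfc(s):
    return unicodedata.normalize("NFC", s)

# Canonical combining classes of the three marks.
classes = {unicodedata.name(c): unicodedata.combining(c)
           for c in (SARA_U, SARA_I, MAI_EK)}
print(classes)

# Vowel below + tone mark: either typing order normalizes to the same string.
below_same = nfc(KO_KAI + SARA_U + MAI_EK) == nfc(KO_KAI + MAI_EK + SARA_U)

# Vowel above + tone mark: the orders remain different after normalization,
# so only one of them matches the conventional spelling.
above_same = nfc(KO_KAI + SARA_I + MAI_EK) == nfc(KO_KAI + MAI_EK + SARA_I)

print(below_same, above_same)
```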