A last missing link for interoperable representation

Martin J. Dürst via Unicode unicode at unicode.org
Sun Jan 13 21:00:36 CST 2019

On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
> On 2019-01-12, Richard Wordingham via Unicode <unicode at unicode.org> wrote:
>> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)

>> And what happens when you capitalise a word for emphasis or to begin a
>> sentence?  Is it no longer the same word?
> Indeed. As has been observed up-thread, the casing idea is a dumb one!
> We are, however, stuck with it because of legacy encoding transported
> into Unicode. We aren't stuck with encoding fonts into Unicode.

No, the casing idea isn't actually a dumb one. As Asmus has shown, one 
of the best ways to understand what Unicode does with respect to text 
variants is that style works on spans of characters (words, ...) and is 
rich text, but things that work on single characters are handled in 
plain text. Upper case is definitely, for the most part, a 
single-character phenomenon (the recent Georgian MTAVRULI additions 
being the exception).
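
To make the single-character point concrete, here is a minimal Python sketch (an illustration only; the helper name case_report is made up for this example). It shows that capitalizing a word changes exactly one code point in the plain text, with no markup involved.

```python
import unicodedata


def case_report(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


lower = "unicode"
capitalized = lower.capitalize()  # only the first character is changed

# Positions where the two plain-text strings differ.
diff = [i for i, (a, b) in enumerate(zip(lower, capitalized)) if a != b]

print(diff)
print(case_report(lower[:1]))
print(case_report(capitalized[:1]))
```

The difference between the two spellings is carried entirely by one character code, which is why it survives in plain text.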

UPPER CASE can be used on whole spans of text, but that's not the main 
use case. And if UPPER CASE is used for emphasis, one way to do it (and 
the best way if this is actually a styling issue) is to use rich text 
and mark it up according to semantics, and then use some styling 
directive (e.g. CSS text-transform: uppercase) to get the desired look.
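
The rich-text route can be sketched in a few lines of Python. This is a hypothetical model, not any real styling engine: Span and render are names invented for the example, and str.upper() only approximates what CSS text-transform does (CSS also applies language-specific tailoring). The point is that the stored text keeps its normal spelling and the upper-case look is chosen at rendering time.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A run of text plus a semantic flag (hypothetical rich-text model)."""
    text: str
    emphasis: bool = False


def render(spans, emphasis_style="uppercase"):
    """Apply the chosen look to emphasized spans; other text is untouched."""
    out = []
    for span in spans:
        if span.emphasis and emphasis_style == "uppercase":
            out.append(span.text.upper())
        elif span.emphasis and emphasis_style == "asterisks":
            out.append(f"*{span.text}*")
        else:
            out.append(span.text)
    return "".join(out)


doc = [Span("This is "), Span("really", emphasis=True), Span(" important.")]

print(render(doc))                # emphasis shown as upper case
print(render(doc, "asterisks"))   # same document, different style
print(doc[1].text)                # the stored text is still lower case
```

Switching the style changes the appearance without touching the underlying characters, which is what separates styling from orthographic capitalization.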

Another criterion is orthography. Schoolchildren learn when to 
capitalize a word and when not. Teachers check and correct it all the 
time. Grammar books and books for second language learners discuss 
capitalization, because it's part of orthography, the rules differ by 
language, and not getting it right will make the writer look bad.

But even most adults won't know the rules for what to italicize that 
have been brought up in this thread. Even if they have read books that 
use italic and bold in ways that have been brought up in this thread, 
most readers won't be able to tell you what the rules are. That's left 
to copy editors and similar specialist jobs.

There was a time when computers (and printers in particular) were 
single-case. There was some discussion about having to abolish case 
distinctions to adapt to computers, but fortunately, that wasn't necessary.

Regards,   Martin.
