A last missing link for interoperable representation
Martin J. Dürst via Unicode
unicode at unicode.org
Sun Jan 13 21:00:36 CST 2019
On 2019/01/14 01:46, Julian Bradfield via Unicode wrote:
> On 2019-01-12, Richard Wordingham via Unicode <unicode at unicode.org> wrote:
>> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)
>> And what happens when you capitalise a word for emphasis or to begin a
>> sentence? Is it no longer the same word?
> Indeed. As has been observed up-thread, the casing idea is a dumb one!
> We are, however, stuck with it because of legacy encoding transported
> into Unicode. We aren't stuck with encoding fonts into Unicode.
No, the casing idea isn't actually a dumb one. As Asmus has shown, one
of the best ways to understand what Unicode does with respect to text
variants is that style works on spans of characters (words, ...), and is
rich text, but things that work on single characters are handled in
plain text. Upper-case is definitely for the most part a single-character
phenomenon (the recent Georgian MTAVRULI additions being the exception).
UPPER CASE can be used on whole spans of text, but that's not the main
use case. And if UPPER CASE is used for emphasis, one way to do it (and
the best way if this is actually a styling issue) is to use rich text
and mark it up according to semantics, and then use some styling
directive (e.g. CSS text-transform: uppercase) to get the desired look.
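
To make the distinction concrete, here is a minimal sketch in Python. The function names capitalize_first and emphasize are made up for illustration and the <em> markup is just one possible choice; this is not how any particular system does it. The first function changes a single character in the plain text itself; the second leaves the characters alone and puts the upper-casing into a styling directive on a span.

```python
import html


def capitalize_first(word: str) -> str:
    """Plain-text, character-level change: case-map only the first character."""
    return word[:1].upper() + word[1:]


def emphasize(text: str) -> str:
    """Rich-text, span-level change: the stored characters stay as typed.

    The emphasis is marked up semantically (<em>), and the upper-case look
    is left to a CSS styling directive (text-transform: uppercase).
    """
    return f'<em style="text-transform: uppercase">{html.escape(text)}</em>'


if __name__ == "__main__":
    print(capitalize_first("sentence"))
    print(emphasize("really important"))
```

With the second approach the underlying text is still lower-case, so searching, sorting, and restyling keep working on the original characters.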
Another criterion is orthography. Schoolchildren learn when to
capitalize a word and when not. Teachers check and correct it all the
time. Grammar books and books for second language learners discuss
capitalization, because it's part of orthography, the rules differ by
language, and not getting it right will make the writer look bad.
But even most adults won't know the rules for what to italicize that
have been brought up in this thread. Even if they have read books that
use italic and bold in ways that have been brought up in this thread,
most readers won't be able to tell you what the rules are. That's left
to copy editors and similar specialist jobs.
There was a time when computers (and printers in particular) were
single-case. There was some discussion about having to abolish case
distinctions to adapt to computers, but fortunately, that wasn't necessary.