A last missing link for interoperable representation

David Starner via Unicode unicode at unicode.org
Sun Jan 13 23:06:04 CST 2019


On Sun, Jan 13, 2019 at 7:03 PM Martin J. Dürst via Unicode
<unicode at unicode.org> wrote:
> No, the casing idea isn't actually a dumb one. As Asmus has shown, one
> of the best ways to understand what Unicode does with respect to text
> variants is that style works on spans of characters (words,...), and is
> rich text, but things that work on single characters are handled in
> plain text. Upper-case is definitely for the most part a single-character
> phenomenon (the recent Georgian MTAVRULI additions being the exception).

I would disagree; upper case is normally used either as all caps or
as title case, and title case applies to a word, not a character.

I don't argue that Unicode is wrong for handling casing the way it
does, but it does massively complicate the processing of any Latin
text; virtually all searches should be case-insensitive, for example.
At least in English, computerized casing will always be problematic.
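
A minimal Python sketch of both points, assuming the standard str
methods are representative of "computerized casing"; the helper name
caseless_contains is hypothetical and only for illustration:

```python
# Illustrative sketch only; caseless_contains is a made-up helper name.

def caseless_contains(haystack: str, needle: str) -> bool:
    """Case-insensitive substring search using Unicode case folding."""
    return needle.casefold() in haystack.casefold()

# Ordinary Latin text: the search has to ignore case to be useful.
found_simple = caseless_contains("Unicode Mailing List", "mailing")

# casefold() also catches pairs that lower() misses, such as German sharp s.
found_sharp_s = caseless_contains("Die STRASSE ist lang", "straße")
lower_misses = "straße".lower() in "Die STRASSE ist lang".lower()

# Naive title-casing works per letter run, which goes wrong in English.
naive_title = "it's o'neil's book".title()

print(found_simple, found_sharp_s, lower_misses)
print(naive_title)
```

Here str.title() capitalizes the letter after every apostrophe, so
getting English title case right needs word-level rules rather than a
per-character mapping.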

> UPPER CASE can be used on whole spans of text, but that's not the main
> use case. And if UPPER CASE is used for emphasis, one way to do it (and
> the best way if this is actually a styling issue) is to use rich text
> and mark it up according to semantics, and then use some styling
> directive (e.g. CSS text-transform: uppercase) to get the desired look.

That's an example of how having multiple systems makes things more
complex and less consistent. If something can be written as all upper
case with the caps lock key, it will be. If a generated HTML file can
have uppercase added with a Python or SQL function, it probably will
be. Using CSS text-transform may be best practice, but simpler plain
text solutions will be used in a lot of cases, and nothing can be
clearly inferred from whether text-transform is used or not.
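
As a rough sketch of the two routes for a generated HTML file, assuming
a Python generator; both function names are hypothetical:

```python
import html

def emphasize_plain(text: str) -> str:
    """Upper-case the characters themselves; the original casing is lost."""
    return "<em>" + html.escape(text.upper()) + "</em>"

def emphasize_styled(text: str) -> str:
    """Keep the text as written and leave the upper-case look to CSS."""
    return ('<em style="text-transform: uppercase">'
            + html.escape(text) + "</em>")

# Two different source words collapse to the same plain-text output...
same_plain = emphasize_plain("Polish") == emphasize_plain("polish")

# ...while the styled route keeps them distinct in the markup.
same_styled = emphasize_styled("Polish") == emphasize_styled("polish")

print(emphasize_plain("Polish"))
print(emphasize_styled("Polish"))
print(same_plain, same_styled)
```

Both outputs render the same way, so a reader of the resulting page
cannot tell which route the generator took.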

-- 
Kie ekzistas vivo, ekzistas espero.


