Re: “plain text styling”…

Asmus Freytag asmusf at ix.netcom.com
Tue Jan 10 00:07:22 CST 2023


On 1/9/2023 9:09 PM, Doug Ewell via Unicode wrote:
>> 3. Emoji vs text presentation.
To me, that's more clearly pseudo-encoding than some of the other things
now possible with emoji, because the wrong presentation is nearly always
really wrong, so there's no common fallback.

Add to that the fact that the introduction of the wrong default made
existing applications and texts suddenly fail, and you have one of the
worst blunders in Unicode's encoding history.
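
For readers who haven't run into the mechanism: presentation is pinned
by appending a variation selector to the base character. A minimal
sketch in Python follows; the helper name with_presentation is made up
for illustration, and U+2764 is just one convenient example of a
character that predates emoji and has both presentations.

```python
import unicodedata

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Hypothetical helper: append the selector that pins the presentation."""
    return base + (VS16 if emoji else VS15)


heart = "\u2764"  # HEAVY BLACK HEART
as_text = with_presentation(heart, emoji=False)
as_emoji = with_presentation(heart, emoji=True)

# Print the code points and names behind each of the three strings.
for s in (heart, as_text, as_emoji):
    print(", ".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s))
```

The bare character (no selector) is the case where the default decides,
which is where existing text was affected.
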
>>
>> 4. "Extreme" ligaturing involving emoji ZWJ sequences, regional tags
>> becoming flags, and other pseudo-encoding.
> I would actually consider things like bold, italics, and color to be less of an affront to “plain text” than an emoji presentation form or a sequence that adds up to “woman firefighter with medium-dark skin tone.” Granted ECMA-48 can be used for effects that are less plain-texty than bold, italics, and color.
>
In some ways most of the emoji sequences are really more akin to making 
new characters by adding diacritic marks, or making new shapes in 
context, the way shapes fuse in Indic conjuncts.

A skintone in some sense has more similarity to a diacritic on a vowel; 
just because it's not a mark, but a shade, doesn't erase the similarity. 
The whole visual design space for emoji is different. While color is 
simply an attribute on text, skintone hews closer to a semantic 
component in the way it works.
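
To make the comparison concrete, here is a small Python sketch that
puts an accented vowel next to a skintone sequence. The helper name
describe and the choice of WAVING HAND SIGN as the base are mine, purely
for illustration.

```python
import unicodedata

accented = "a\u0301"            # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT
toned = "\U0001F44B\U0001F3FD"  # WAVING HAND SIGN + EMOJI MODIFIER FITZPATRICK TYPE-4


def describe(s: str):
    """Hypothetical helper: (code point, general category, combining class)."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.combining(c))
        for c in s
    ]


print(describe(accented))  # second element is a nonspacing mark (Mn)
print(describe(toned))     # second element is a modifier symbol (Sk), not a mark
```

In both cases a base is followed by a second code point that alters
it; only the formal classification of that second code point differs.
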

The same goes for other colors as well: a "black cat" and a generic
kitty occupy distinct, if overlapping, semantic spaces, and they do so
on the level of an individual symbol.

The concept of semantic ligatures, like the female astronaut, is
interesting. It's a departure from purely graphical constructs like
stacks, conjuncts and ligatures. But while most Latin ligatures are
optional, many conjuncts are not, and using a fallback will alter
meaning, again on the individual grapheme level.
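
As a rough illustration of what such a sequence looks like underneath,
and of what a fallback amounts to, here is a Python sketch. The function
name fallback_parts is hypothetical, and splitting on the joiner is only
an approximation of what a renderer without the joined glyph shows.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

woman = "\U0001F469"     # WOMAN
rocket = "\U0001F680"    # ROCKET
cat = "\U0001F408"       # CAT
black_square = "\u2B1B"  # BLACK LARGE SQUARE

woman_astronaut = woman + ZWJ + rocket
black_cat = cat + ZWJ + black_square


def fallback_parts(seq: str) -> list[str]:
    """Hypothetical helper: the components left when the joined glyph is missing."""
    return seq.split(ZWJ)


print(fallback_parts(woman_astronaut))  # a woman and a rocket, side by side
print(fallback_parts(black_cat))        # a cat and a black square
```

The fallback is readable to someone who knows the scheme, but "woman,
rocket" is not the same symbol as "astronaut".
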

Formatting / styling, to me, is distinguished by being something that's
conceptually always applied to a run of text, and usually not to runs of
length one. The main exception to that was mathematical notation, and we
opted to make a principled exception, precisely because semantic mapping
to highly specific shapes for an individual symbol is not, or should not
be, the task of "styling".

Flag sequences and the like are true examples of pseudo-encoding: they
introduce a scheme that maps arbitrary code point sequences to a symbol
in a way that depends on definitions maintained outside the Unicode
Standard. It's the clearest case of injecting another character set (or
a Lego system for representing one) into the Standard that I've seen.
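
The mechanics are simple enough to sketch in a few lines of Python. The
function name flag_sequence is made up for illustration; the only fixed
fact used is that the regional indicator symbols run from U+1F1E6 (A) to
U+1F1FF (Z).

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_sequence(region_code: str) -> str:
    """Hypothetical helper: map a two-letter region code to regional indicators."""
    if len(region_code) != 2 or not (region_code.isascii() and region_code.isalpha()):
        raise ValueError(f"expected two ASCII letters, got {region_code!r}")
    code = region_code.upper()
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code)


print([f"U+{ord(c):04X}" for c in flag_sequence("DE")])
# The mapping is purely mechanical: "ZZ" produces a well-formed pair too.
print([f"U+{ord(c):04X}" for c in flag_sequence("ZZ")])
```

Nothing in the sketch knows which pairs correspond to an actual flag;
that knowledge sits in a registry outside the code, just as it sits
outside the Standard.
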

We could have done the same with three-letter codes for currency 
symbols, but we didn't, and that marks the difference.

A./