A last missing link for interoperable representation

Tex via Unicode unicode at unicode.org
Fri Jan 11 04:43:55 CST 2019


Martin,

James is making the case that there is demand, a real user need, and that the proof is that users resort to inconsistent tactics to simulate a solution to their problem.

The response that:
"Almost by definition, styled text isn't plain text, even if it's simulated by something else."
is a bit like Humpty Dumpty saying that words mean what he wants them to mean.

Most of the emoji aren't plain text, and Unicode has them in abundance. Ruby text is also not plain text. The reason for their inclusion was the user need for consistency and interoperability. The original emoji had inconsistent encodings and were a problem for interchange as well as for search and rendering. Their existence and popularity became a problem of their own, requiring further styling (e.g. coloring) and a greatly expanded enumeration (foods, animals, etc.). Let's be honest and admit that the actual demand for some of these latter objects in plain text is marginal, and certainly less than the prevalence of italics.

The response that:
"the simulation is highly limited, as the voicing examples and the fact that the math alphanumerics only cover basic Latin have shown."
is, unless I misunderstand your meaning, the argument that because we encoded only these, the use case is limited to these.

In a different message you say:
"Also, in contrast to the issue discussed in the current thread, there's no consistent or widely deployed solution for such CJK variants in rich text scenarios such as HTML."
I don't see how a rich text solution has any bearing on plain text. We could take the point to mean that if there was no need in HTML to solve the problem, then there wasn't demand justifying the need in Unicode. :-)
I understand that your actual intent was to say there was a need for CJK variants and there was no other solution. However, the fact that there is a rich text solution for italics isn't helpful to plain text users.
HTML had bidirectional isolates, and after the fact Unicode encoded them as well.

The fact that there isn't a consistent way to represent stress or the other uses of italics (or obliques, bold, etc.) does make certain searches across large numbers of plain texts problematic. In the same way that it is sometimes important to distinguish capitalized text when searching (polish vs. Polish), it would be helpful to do the same for italicized text. For example, if I am searching for the movie title "Contact" vs. all the places where texts reference a personal "Contact", distinguishing italicized titles would help. And to the extent that users are inserting non-standardized punctuation or other characters for "styling", it makes reliable searching difficult. As James mentioned, it would help with interoperability as well.
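
To make the search problem concrete, here is a minimal Python sketch. It assumes a user simulates italics with the Mathematical Italic letters (one of the tactics in use today); the helper name to_math_italic is hypothetical and only for illustration. A naive search misses the simulated title, and NFKC normalization finds it only by throwing away the very distinction the user wanted.

```python
import unicodedata

def to_math_italic(text: str) -> str:
    """Hypothetical helper: map ASCII letters to Mathematical Italic letters."""
    out = []
    for ch in text:
        if ch == "h":
            # Italic small h lives at U+210E (PLANCK CONSTANT), not in the math block.
            out.append("\u210E")
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D434 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D44E + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

title = to_math_italic("Contact")          # simulated italic movie title
document = f"We watched {title} and then I emailed my contact."

# A plain substring search does not see the simulated title at all.
found_raw = "Contact" in document

# NFKC folds the math letters back to ASCII, so the search succeeds,
# but the italic/non-italic distinction is gone after folding.
folded = unicodedata.normalize("NFKC", document)
found_folded = "Contact" in folded
```

Neither outcome gives what the searcher wants, which is to match "Contact" and still know whether it was italicized.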

In the '90s it made sense to resist styling plain text. In the 2020s, with more than 100k characters and numerous pictures and character adornments, it seems anachronistic to be arguing against a handful of control characters that would standardize a common text requirement. Most rendering systems would handle it easily, and any plain text editor or other software that supports a combining strikethrough character would easily adapt to a combining italic or a combining bold character.
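
As a rough sketch of how little machinery the existing combining strikethrough needs, the Python below uses U+0336 COMBINING LONG STROKE OVERLAY. The function names strike and unstrike are hypothetical, and the code assumes simple input without other combining marks. A combining italic or bold character does not exist today; the point is only that software handling this pattern could handle another mark of the same kind.

```python
STROKE = "\u0336"  # COMBINING LONG STROKE OVERLAY

def strike(text: str) -> str:
    """Append the combining stroke after every character (simple input assumed)."""
    return "".join(ch + STROKE for ch in text)

def unstrike(text: str) -> str:
    """Recover the underlying text for searching by dropping the stroke."""
    return text.replace(STROKE, "")

styled = strike("Contact")
plain = unstrike(styled)
```

Because the base letters are unchanged, a search can strip the mark to match the word and still inspect the original string to see whether it was styled.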

tex




