A last missing link for interoperable representation

Tex via Unicode unicode at unicode.org
Wed Jan 9 03:37:53 CST 2019


   James Kass wrote:

If a text is published in all italics, that’s style/font choice.  If a text is published using italics and roman contrastively and consistently, and everybody else is doing it pretty much the same way, that’s a convention. 


   Asmus Freytag responded:

But not all conventions are deemed worthy of plaintext encoding.

What are the criteria for “worth”?

Way back when, when plain text was very very plain, arguments about not including text styling seemed reasonable. But with the inclusion of numerous emoji as James mentioned, it seems odd to be protesting a few characters that would enhance “plain text” considerably. Plain text editors today support bold, italic, and other styles as a fundamental requirement for usability. More text editors support styling than support bidi or interlinear annotation.

If there were support for the handful of text features used by most plain text editors (bold, italic, strikethrough, underline, superscript, subscript, etc.), perhaps under more generalized names such as emphasis, stress, deleted…, then many of the redundant (bold, italic, …) characters in Unicode would not have been needed. HTML seemed to do very well with very few styling elements. HTML is of course rich text, but I am just demonstrating that a very small number of control characters would bring plain text into the modern state of text editing. Editors that don’t have the capability for bolding, underlining, etc. could ignore these controls or convert them to another convention.
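To make the "ignore or convert" idea concrete, here is a minimal Python sketch. Unicode defines no such style controls; the four code points below are taken from the Private Use Area purely as stand-ins, and the names EMPHASIS_ON, STRESS_ON, and so on, as well as the asterisk fallback convention, are assumptions made for this illustration only.

```python
# Hypothetical style controls. Unicode has no such characters; Private Use
# Area code points are used here only as placeholders for the idea.
EMPHASIS_ON, EMPHASIS_OFF = "\uE000", "\uE001"
STRESS_ON, STRESS_OFF = "\uE002", "\uE003"

# An assumed ASCII fallback convention for editors that cannot render styles.
FALLBACK = {
    EMPHASIS_ON: "*", EMPHASIS_OFF: "*",
    STRESS_ON: "**", STRESS_OFF: "**",
}


def strip_style_controls(text: str) -> str:
    """Drop the controls, as an editor without styling support might."""
    return "".join(ch for ch in text if ch not in FALLBACK)


def to_fallback_convention(text: str) -> str:
    """Replace each control with a visible marker from another convention."""
    return "".join(FALLBACK.get(ch, ch) for ch in text)


sample = (
    f"This is {EMPHASIS_ON}important{EMPHASIS_OFF} "
    f"and {STRESS_ON}urgent{STRESS_OFF}."
)
print(strip_style_controls(sample))    # controls ignored
print(to_fallback_convention(sample))  # controls converted to asterisks
```

Either path leaves the underlying letters untouched, which is the interoperability point: the text stays searchable and comparable whether or not the receiving editor understands the controls.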

As James requested, it would also provide interoperability.

Arguments about all of the conventions that Unicode does not support don’t seem compelling to me, as it seems increasingly random what is accepted and what isn’t, or at least the rationales seem inconsistent.

A case in point is the addition of the “SS” character which made implementation complex with little benefit.

Interlinear annotation is perhaps another example.

I don’t want to enter into a debate about why these deserved inclusion. I am only saying they seem less useful than some other cases which seem deserving. 

**And right now, Dr. Strangelove style, my right hand is restraining my other hand from typing on the keyboard, to avoid saying anything about emoji.**

Ken distinguished numerous variations of stress, which of course have their place, representations and uses. But perhaps for plain text we only need a way to indicate “stress here”, and can leave it to the text editor to choose some form of rendering. For more distinctions the user needs to use rich text. Surely there is an 80/20 rule that motivates a solution rather than letting the one percent prevent a capability that 99% would enjoy.

(Yes I mixed metaphors. I feel an Occupy Unicode movement coming on. :-) )

I don’t see how adding a few text style controls would be a burden to most implementers. Given ideographic variation sequences, skin tones, hair styles, and the other requirements for proper Unicode support, arguing against a few text styling capabilities seems very last century. (Or at least 1990s…) And it might save having to add a few more bold, italic, superscript, and other compatibility characters…
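For readers unfamiliar with the existing styled compatibility characters, the following Python sketch shows how they behave today. It uses only the standard unicodedata module; the sample strings are my own choice. Characters such as MATHEMATICAL BOLD CAPITAL H and SUPERSCRIPT TWO are separate code points that fold to ordinary letters and digits under NFKC normalization, so the styling is lost whenever text is compatibility-normalized.

```python
import unicodedata

# "Hi" written with Mathematical Alphanumeric Symbols (bold H, bold i).
bold_hi = "\U0001D407\U0001D422"
# "x" followed by SUPERSCRIPT TWO.
x_squared = "x\u00B2"

# Each styled letter is its own encoded character with its own name.
print(unicodedata.name(bold_hi[0]))

# NFKC folds the styled forms to the plain characters, discarding the style.
plain_hi = unicodedata.normalize("NFKC", bold_hi)
plain_x2 = unicodedata.normalize("NFKC", x_squared)
print(plain_hi, plain_x2)
```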





