Re: “plain text styling”…

Kent Karlsson kent.b.karlsson at bahnhof.se
Sun Jan 8 16:08:49 CST 2023



> On 8 Jan 2023, at 18:34, Sławomir Osipiuk via Unicode <unicode at corp.unicode.org> wrote:
> 
> On Sunday, 08 January 2023, 09:15:21 (-05:00), Kent Karlsson via Unicode wrote:
>> 
>> The point is that the “protocol” is at plain text level. That is why ECMA-48 styling can work for applications like terminal emulators, where higher-level protocols, like HTML, are out of the question.
> 
> This does not make sense. Both are formats that need to be interpreted by the display software or they just look like junk within the visible text.

Yes…

> HTML and ECMA-48 are no different in principle.

On this point they are wildly different. One can be used in contexts such as terminal emulators, and is indeed intended for such use. The other cannot be used in such contexts.
And the precise reason is that one is a plain text protocol and the other is a higher-level protocol. One cannot make an HTML(-like) based terminal emulator, since the controls in HTML consist purely of printable characters (which in turn requires that certain characters *must* be represented via character escapes, like &lt; for <, or otherwise risk being taken as part of a control).
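To make the escaping point concrete, here is a minimal Python sketch. The helper names (`ecma48_bold`, `html_bold`, `strip_sgr`) are made up for this illustration, and the regular expression only covers simple SGR sequences (CSI, digits with `;` or `:` separators, final byte `m`), not all of ECMA-48. It only shows that printable text passes through ECMA-48 styling untouched, while HTML-style markup has to escape the printable characters it uses as syntax.

```python
import html
import re

ESC = "\x1b"
# Simplified assumption: an SGR sequence is ESC [ <digits ; :> m.
SGR = re.compile(r"\x1b\[[0-9;:]*m")

def ecma48_bold(text: str) -> str:
    # SGR 1 = bold; SGR 22 = normal intensity (ends bold) per ECMA-48.
    # The payload is inserted as-is: no printable character is reserved.
    return f"{ESC}[1m{text}{ESC}[22m"

def html_bold(text: str) -> str:
    # '<', '>' and '&' are markup syntax, so the payload must be escaped.
    return f"<b>{html.escape(text, quote=False)}</b>"

def strip_sgr(styled: str) -> str:
    # Removing the control sequences gives back the original text.
    return SGR.sub("", styled)

payload = "if a < b && b > c"
print(repr(ecma48_bold(payload)))
print(html_bold(payload))
print(strip_sgr(ecma48_bold(payload)) == payload)
```

The HTML variant only round-trips if the reader also undoes the escapes; the ECMA-48 variant round-trips by dropping the control sequences.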

Now, using ECMA-48 styling controls for styling text (that may be stored in a file) is not vitally dependent on that. For that use, it is just a question of reusing an already existing mechanism for specifying styling. That mechanism need not be locked in to terminal emulators only. (Though some of the proposed additions may also be useful for terminal emulators, and indeed some already are; I “grabbed” some suggestions from additions already implemented in some terminal emulators, with the intent of not compromising those implementations.)

> You can write a terminal emulator that respects basic HTML styling.

Nope. Violates the plain text principle of terminal emulators. (Besides, HTML has a nesting structure, but that is a different obstacle for your suggestion here.)

> The only reason it hasn't been done is because there is no demand, and that is because of historical reasons (including that many terminal scripting languages have syntax that would conflict with HTML).
> 
>>> From a simple (basic) text editor perspective that knows nothing about styling, what is the difference between displaying these two examples related to same intended result ?
>>> <b>bold</b>
>>> versus
>>> \x1b[1mbold\x1b[2m
>> 
>> The first one is a higher-level protocol (interpreting substrings consisting purely of “printable characters” as controls; counting SP, HT and LF as “printable”), the second is a text-level protocol.
> 
> No. Whether "<" or \x1b is a special syntax introducer makes no real difference.

Except that it does. See above.

> You need something to recognize it and interpret it.

Yes, but that is not “it”.

> Both standards are about interpreting substrings, with opening and closing characters and formatting information between them. There is nothing inherently special about having the characters be below \x20, certainly not any more than, for example, using the tag characters.

There very much is a difference between control characters and printable characters (including SP, LF, CR, HT), in that the latter are “normal text” to be printed, while control characters are, well, control characters, not to be printed. True, the distinction is somewhat muddled by the fact that SP/CR/LF/HT/VT/NEL aren’t all that “pure control”, while characters like SHY actually are control characters but are not formally counted as such. Plus there are the various control characters introduced by Unicode (like bidi controls; note that HTML has its own separate way of doing bidi controls, using printable characters, not the Unicode bidi controls). So I agree that it is not straightforward, but there really is a difference.
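One way to look at both the distinction and the muddle is the Unicode General_Category of the characters mentioned. The following is only a sketch using Python's standard `unicodedata` module; `SAMPLES` and `general_category` are names chosen for this example.

```python
import unicodedata

# Characters mentioned in this thread, keyed by their usual abbreviations.
SAMPLES = {
    "ESC": "\x1b",    # introducer of ECMA-48 control sequences
    "LF": "\n",
    "NEL": "\x85",
    "SP": " ",
    "<": "<",         # introducer of HTML tags
    "SHY": "\u00ad",  # soft hyphen
    "RLE": "\u202b",  # one of the Unicode bidi controls
}

def general_category(ch: str) -> str:
    # Two-letter Unicode General_Category, e.g. 'Cc' (control) or 'Cf' (format).
    return unicodedata.category(ch)

for name, ch in SAMPLES.items():
    print(f"{name:>3}  U+{ord(ch):04X}  {general_category(ch)}")
```

ESC, LF and NEL come out as Cc (control), SHY and RLE as Cf (format, so not formally "control"), SP as Zs (space separator), and < as Sm (math symbol), an ordinary printable character.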

Kind regards
/Kent K


