Re: “plain text styling”…

Kent Karlsson kent.b.karlsson at bahnhof.se
Sun Jan 8 08:15:21 CST 2023



> On 7 Jan 2023, at 12:37, Cristian Secară via Unicode <unicode at corp.unicode.org> wrote:
> 
> On Thu, 5 Jan 2023 01:53:40 +0100, Kent Karlsson via Unicode wrote:
> 
>> More or less regularly there are (informal) requests on this list for
>> encoding (new) control codes or control code sequences for text
>> styling (like bold, italics, text colour, …) also for ”plain text”.
> 
> This seems to overlook that a "plain text" subjected to such torment can no longer be called "plain".
> 
> Or, how do you differentiate this plain text from the other plain text ?
> "I am sending this e-mail in strict plain text"
> "I am sending this e-mail in a somewhat plain text"
> "I am sending this e-mail in a complicated plain text"
> "I am sending this e-mail in a code-controlled plain text"

The point is that the ”protocol” is at plain text level. That is why ECMA-48 styling can work for applications like terminal emulators, where higher-level protocols, like HTML, are out of the question.
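
As an illustration of styling carried at the plain text level, here is a minimal Python sketch that builds ECMA-48 SGR (Select Graphic Rendition) control sequences and embeds them directly in a string. The helper names `sgr` and `bold` are made up for this example; only the byte-level shape (ESC, "[", parameters, "m") and the parameter values 1 (bold) and 22 (normal intensity) come from ECMA-48.

```python
ESC = "\x1b"
CSI = ESC + "["  # 7-bit representation of the Control Sequence Introducer


def sgr(*params):
    """Build an SGR control sequence from numeric parameters (hypothetical helper)."""
    return CSI + ";".join(str(p) for p in params) + "m"


BOLD_ON = sgr(1)    # bold or increased intensity
BOLD_OFF = sgr(22)  # normal intensity (neither bold nor faint)


def bold(text):
    """Wrap text in bold-on / bold-off control sequences (hypothetical helper)."""
    return BOLD_ON + text + BOLD_OFF


styled = bold("bold") + " and plain"
print(repr(styled))  # the controls are ordinary characters inside the string
print(styled)        # a terminal emulator would typically render the first word in bold
```

The styling travels in the same character stream as the text, which is why it works where no markup parser is available.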

> Also, in places where the number of characters matters, like an SMS text message sent over a GSM network [1] (and supposing the editor knows how to interpret, and therefore hide, the control characters), one may become confused by the strange increase (or decrease, if a limit is imposed) in the character count.

Apart from that, the SMS (and cell broadcast) transmission protocols can split a long message into parts that the receiver reassembles...

The SMS (and cell broadcast) 7-bit character encodings (there is a handful of them) all have just four ”control codes”: CR, LF, FF, and SS2 (misnamed(!) as ESC). There is no ESC character nor any CSI character. What is referred to as UCS-2 (read as UTF-16BE) should therefore, for SMS and cell broadcast, be seen as only having CR, LF and FF and no other control characters (though the standard for SMS character encodings is silent on that point).

So SMS and cell broadcast messages are out of scope for that simple reason.
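
To make the point about the 7-bit encodings concrete, here is a rough Python sketch that scans a list of septets from the GSM 7-bit default alphabet and reports the control codes it finds. The septet values used (0x0A for LF, 0x0D for CR, 0x1B for the escape to the extension table, and 0x0A after that escape for FF) are my reading of the default alphabet tables; the function name `gsm7_controls` is made up for this example, and the national shift tables are not considered.

```python
GSM7_LF = 0x0A      # line feed in the basic table
GSM7_CR = 0x0D      # carriage return in the basic table
GSM7_ESCAPE = 0x1B  # escape to the extension table (acts like a single shift)
GSM7_EXT_FF = 0x0A  # form feed (page break) when it follows the escape


def gsm7_controls(septets):
    """Return the names of the control codes found in a septet sequence."""
    found = []
    i = 0
    while i < len(septets):
        s = septets[i]
        if s == GSM7_ESCAPE and i + 1 < len(septets):
            # The escape only selects the extension table for the next septet.
            if septets[i + 1] == GSM7_EXT_FF:
                found.append("FF")
            i += 2
            continue
        if s == GSM7_LF:
            found.append("LF")
        elif s == GSM7_CR:
            found.append("CR")
        i += 1
    return found


# CR, LF, escape + 0x0A (FF), escape + 0x3C (the printable "[")
print(gsm7_controls([0x0D, 0x0A, 0x1B, 0x0A, 0x1B, 0x3C]))
```

In this reading, the septet 0x1B followed by the code for "[" just yields a printable "[", so nothing in the 7-bit encoding can start an ECMA-48 control sequence.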

> 
>> This instead of using such things as RTF, SGML, HTML, ODF, etc. In the
>> latter, the style (and other) controls are given as strings of
>> printable characters (like <b>, </b>), not involving control
>> characters.
> 
> Not sure I understand, especially that you later mentioned ECMA-48.
> 
> From a simple (basic) text editor perspective that knows nothing about styling, what is the difference between displaying these two examples related to same intended result ?
> <b>bold</b>
> versus
> \x1b[1mbold\x1b[2m

The first one is a higher-level protocol (interpreting substrings consisting purely of ”printable characters” as controls; counting SP, HT and LF as ”printable”), the second is a text-level protocol.
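
The difference can be seen by inspecting the two quoted strings character by character. The following Python sketch is only an illustration, with made-up function names: it lists the characters whose Unicode general category is Cc, shows what a styling-unaware viewer might display if it substitutes U+FFFD for them, and strips ECMA-48 control sequences with a regular expression that follows the CSI / parameter bytes / intermediate bytes / final byte layout. (As a side note, in ECMA-48 the SGR parameter 2 means faint; normal intensity is 22. The string below is kept as quoted.)

```python
import re
import unicodedata

html_example = "<b>bold</b>"
ecma_example = "\x1b[1mbold\x1b[2m"  # exactly as quoted in the thread

# ESC "[" + parameter bytes 0x30-0x3F + intermediate bytes 0x20-0x2F + final byte 0x40-0x7E
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def control_chars(s):
    """Return the characters of s whose general category is Cc (control)."""
    return [c for c in s if unicodedata.category(c) == "Cc"]


def naive_display(s):
    """Replace every control character with U+FFFD, as a styling-unaware viewer might."""
    return "".join("\ufffd" if unicodedata.category(c) == "Cc" else c for c in s)


def strip_csi(s):
    """Remove 7-bit CSI control sequences, leaving only the text content."""
    return CSI_SEQUENCE.sub("", s)


print(control_chars(html_example))  # no control characters: the markup is printable text
print(control_chars(ecma_example))  # two ESC characters
print(naive_display(ecma_example))
print(strip_csi(ecma_example))
```

The HTML string contains no control characters at all, so only a parser that knows the higher-level protocol can tell markup from content. In the ECMA-48 string the controls are recognisable from the character codes alone.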

> Same question if the ~simple text editor *knows* about both of the above styling methods ?

If we are talking about files, the file name suffix is the most common way of dealing with that (like .html vs. .txtf vs. .txt).
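
A minimal Python sketch of suffix-based selection, under the assumption that .txtf stands for text with ECMA-48 styling (the mapping table and the function name are invented for this illustration, not an established convention):

```python
from pathlib import PurePath

# Hypothetical mapping from file name suffix to how the content is interpreted.
INTERPRETATION = {
    ".html": "higher-level protocol (HTML markup)",
    ".txtf": "text-level styling (assumed: ECMA-48 control sequences)",
    ".txt": "plain text, no styling",
}


def interpretation_for(filename):
    """Pick an interpretation from the file name suffix, case-insensitively."""
    suffix = PurePath(filename).suffix.lower()
    return INTERPRETATION.get(suffix, "unknown suffix; treat as plain text")


for name in ("index.html", "notes.txtf", "README.txt", "data.bin"):
    print(name, "->", interpretation_for(name))
```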

Kind regards
/Kent K

> Or perhaps from the user perspective ?
> 
> Cristi
> 
> [1] https://www.secarica.ro/index.php/eue/sms-story/the-sms-discrimination
> 
> -- 
> Cristian Secară
> https://www.secarica.ro
> 



