Re: “plain text styling”…

Kent Karlsson kent.b.karlsson at bahnhof.se
Wed Jan 11 06:25:34 CST 2023



> On 11 Jan 2023, at 02:05, Cristian Secară via Unicode <unicode at corp.unicode.org> wrote:
> 
> On Sun, 8 Jan 2023 15:15:21 +0100, Kent Karlsson via Unicode wrote:
> 
>> The point is that the ”protocol” is at plain text level. That is why
>> ECMA-48 styling can work for applications like terminal emulators,
>> where higher-level protocols, like HTML, are out of the question.
> 
> By human convention, yes. From an abstract technical perspective, whatever protocol and syntax is used, in the end it comes down to just an ON/.../OFF switch.

Yes, but syntactically there are different kinds of on/off switches. Some fit in an otherwise plain text context; others don’t.

>> The SMS (and cell broadcast) 7-bit character encodings (there is a
>> handful of them) all have just four ”control codes”: CR, LF, FF, and
>> SS2 (misnamed(!) as ESC). There is no ESC character nor any CSI
>> character.
> 
> Actually, the GSM 7 bit default alphabet contains the CR, LF and ESC codes,

(There is a handful of 7-bit codepages for SMS and cell broadcast messages. Not only for a kind of ”extended ASCII”, but for several Indic scripts, and one for Arabic.)

Actually, there is no ESC. There are CR, LF, and FF, and then a code ***called*** ESC that is not ESC at all: it is SS2, SINGLE SHIFT 2, and it works exactly as SS2 does.

There is no real ESC character, hence no escape sequences, no CSI character (not even as an ESC sequence) and hence no control sequences. Teletext has a similar issue, where the ESC actually is an SS2.

> placed at their "traditional" hex positions (i.e. 0x0D, 0x0A and 0x1B respectively). A single ESC is used to 'trigger' the extension of the GSM 7 bit default alphabet or a character from a national language single shift table. It is in the extension of the GSM 7 bit default alphabet that a 0x1B 0x0A sequence generates the 0x0C code (FF, i.e. Form Feed, aka Page Break) and that a 0x1B 0x1B sequence generates another 0x1B code (SS2, which is "reserved for the extension to another extension table").

That would be an SS3…
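
To make the single-shift behaviour concrete, here is a minimal Python sketch of a decoder for already-unpacked septets. It is only an illustration: the function name decode_septets is made up, BASIC is a deliberately partial stand-in for the default alphabet, the extension entries are the ones I recall from 3GPP TS 23.038 (check them against the spec), and mapping unknown codes to U+FFFD is my own simplification, not the behaviour the spec prescribes.

```python
# Sketch only: the code named "ESC" (0x1B) is treated as a single shift,
# i.e. it changes the interpretation of exactly ONE following code.
SHIFT = 0x1B

# Partial default alphabet (assumption: only positions that coincide with
# ASCII digits/letters, plus a few others, are listed here).
BASIC = {c: chr(c) for c in (*range(0x30, 0x3A), *range(0x41, 0x5B), *range(0x61, 0x7B))}
BASIC.update({0x0A: "\n", 0x0D: "\r", 0x20: " ", 0x28: "(", 0x29: ")"})

# Extension table of the default alphabet, as recalled from TS 23.038.
EXTENSION = {
    0x0A: "\f", 0x14: "^", 0x28: "{", 0x29: "}", 0x2F: "\\",
    0x3C: "[", 0x3D: "~", 0x3E: "]", 0x40: "|", 0x65: "\u20ac",
}

def decode_septets(septets):
    """Decode a sequence of 7-bit codes into a Python string."""
    out = []
    it = iter(septets)
    for code in it:
        if code == SHIFT:
            shifted = next(it, None)
            if shifted is None:
                break  # dangling shift at the end: nothing left to shift
            # Look in the extension table first, then fall back to BASIC,
            # then to U+FFFD (a placeholder chosen for this sketch).
            out.append(EXTENSION.get(shifted, BASIC.get(shifted, "\ufffd")))
        else:
            out.append(BASIC.get(code, "\ufffd"))
    return "".join(out)

if __name__ == "__main__":
    # 0x1B 0x28 yields "{", while a bare 0x28 yields "(".
    print(repr(decode_septets([0x41, 0x1B, 0x28, 0x42, 0x1B, 0x29, 0x28])))
```

Note that nothing here can start an escape sequence or a control sequence: the shift is consumed together with the one code that follows it, and that is all it does.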

> 
>> So SMS and cell broadcast messages are out of scope for that simple
>> reason.
> 
> This is probably useless and out of the question in 2023 for practical reasons, but, in theory, future revisions of the 3GPP TS 23.038 standard could include whatever characters might be needed in those reserved-for-future-expansion places.
> 
> *
> Back on topic: funny how the not-so-distant past is so quickly forgotten. In the late 1980s and early 1990s I used a lot of "plain text styling", extensively and with great success, on at least two impact printers (one being a Citizen 120D+, which I still have today). In direct print mode (as opposed to graphics mode), there were a lot of font style modifiers for the printed result (well, a lot for that time), triggered with ESC or CTRL sequences.
> 
> Examples:
> ESC E / ESC F > sets / cancels emphasized print
> ESC G / ESC H > sets / cancels doublestrike print
> ESC 4 / ESC 5 > sets / cancels italic character (Epson only)
> CTRL-O / CTRL-R > sets / cancels compressed print
> ESC k 0 > sets Courier character pitch
> ESC k 1 > sets Citizen Display character pitch
> etc.

This cannot be in any of the SMS/cell broadcast charsets, since they have no (real) ESC; the "ESC" of SMS 7-bit charsets is actually a misnamed SS2.

Nor is this ECMA-48.

But yes, historically there have been other control/escape sequence definitions for various types of equipment from different manufacturers. I think ECMA-48 was, in part, intended to bring some order to that old mess. I see no reason to bring back various messy definitions. But ECMA-48 control sequences are still relevant, and still used. (I see ECMA-48 styled text every day, and it is not my doing… In a modern setting!)
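
For comparison, here is roughly what ECMA-48 styling looks like when generated from a program. This is a small Python sketch, not anything normative: the helper name styled and the table SGR_ON_OFF are made up for the example, and it uses the 7-bit form of CSI (ESC followed by "["). The SGR parameter values (1/22 for bold on/off, 3/23 for italic, 4/24 for underline) are the ones defined in ECMA-48.

```python
# 7-bit representation of CSI: ESC followed by "[".
CSI = "\x1b["

# SGR (SELECT GRAPHIC RENDITION) parameters as (on, off) pairs.
SGR_ON_OFF = {
    "bold": (1, 22),       # 22 = normal intensity
    "italic": (3, 23),     # 23 = not italicized
    "underline": (4, 24),  # 24 = not underlined
}

def styled(text, style):
    """Wrap text in the SGR control sequences that switch a style on and off."""
    on, off = SGR_ON_OFF[style]
    return f"{CSI}{on}m{text}{CSI}{off}m"

if __name__ == "__main__":
    # On a terminal emulator that supports SGR, the middle word is rendered bold.
    print("plain " + styled("bold", "bold") + " plain again")
```

The control sequences sit in the character stream itself, which is the sense in which the "protocol" is at plain text level.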

/Kent K



> Then, in the word processor I used at the time, these codes were allocated to visual control letters or symbols specific to that word processor and ready to be inserted, where required, during text editing.
> 
> This is what code-controlled printing looked like in 8-bit computing (Z80-based):
> https://www.secarica.ro/misc/text_print_style_via_ctrl_codes_-_tw_cpc.png
> https://www.secarica.ro/misc/text_print_style_via_ctrl_codes_-_tw_zxs.png
> 
> Even if such a text was no longer "plain", for me that was just "text", with no particular type designation and no desire to give it one. In today's text editors, a text containing such escape codes will display some random garbage in those places, but the codes can easily be removed (or even converted to whatever modern-day styling syntax) with a Python script or something similar.
> 
> Cristi
> 
> -- 
> Cristian Secară
> https://www.secarica.ro
> 
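
As an illustration of the kind of Python script mentioned in the quoted text above, here is a minimal sketch that converts the printer codes listed earlier into markup. Everything about the output side is an assumption: the function name convert and the HTML-like tags are hypothetical choices, and ESC k is assumed to take exactly one parameter byte, which is simply dropped along with the code itself.

```python
import re

ESC = "\x1b"

# Hypothetical mapping from printer codes to HTML-like tags (an assumption
# made for this sketch; any other target syntax would work the same way).
MARKUP = {
    ESC + "E": "<b>", ESC + "F": "</b>",            # emphasized on/off
    ESC + "G": "<strong>", ESC + "H": "</strong>",  # doublestrike on/off
    ESC + "4": "<i>", ESC + "5": "</i>",            # italic on/off
    "\x0f": "<small>", "\x12": "</small>",          # CTRL-O / CTRL-R: compressed on/off
}

# Matches the two-character ESC codes above, ESC k plus one parameter
# character, and the two single control characters.
PATTERN = re.compile(r"\x1b[EFGH45]|\x1bk.|[\x0f\x12]", re.DOTALL)

def convert(text):
    """Replace known printer codes with markup; codes without a mapping are removed."""
    return PATTERN.sub(lambda m: MARKUP.get(m.group(0), ""), text)

if __name__ == "__main__":
    print(convert("An " + ESC + "Eemphasized" + ESC + "F word."))
```

Replacing every value in MARKUP with an empty string turns the same script into a plain stripper of the codes.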