Re: “plain text styling”…

Kent Karlsson kent.b.karlsson at bahnhof.se
Thu Jan 12 10:57:42 CST 2023



> On 11 Jan 2023, at 17:20, Sławomir Osipiuk <sosipiuk at gmail.com> wrote:
> 
> On Wednesday, 11 January 2023, 07:25:34 (-05:00), Kent Karlsson via Unicode wrote:
>> 
>> Yes, but there are different kinds of on/off switches, syntaxwise. Some fit in an otherwise plain text context, others don’t.
>> 
> I still think the distinction you're drawing – that codes below U+0020 are not "plain text" – is arbitrary.

I did not quite say that.

The thing is that escape sequences and control sequences are (intended to be) “default ignorable”. But ECMA-48 was developed long before the formal concept of “default ignorable” was invented (it is a Unicode concept, and Unicode does not say much about C0 and C1). And there are exceptions (like LF and HT). And few applications other than terminal emulators actually handle them as “default ignorable” beyond the ESC or CSI itself (which are in practice default ignorable). So it is imperfect, but that is the basic idea, and it is what an already existing standard provides (instead of defining something completely new that is “default ignorable”). With this approach there is also no need to represent some printable characters as character references (in HTML, for instance, a real “<” almost always must be represented as a character reference, like &lt;).
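To make the “default ignorable” idea concrete, here is a minimal sketch in Python of how an application that does not interpret ECMA-48 styling could skip over whole control sequences and escape sequences rather than only the ESC or CSI byte. The function name strip_control_sequences is made up for illustration. The byte ranges follow the ECMA-48 syntax as I understand it (CSI, parameter bytes 03/00 to 03/15, intermediate bytes 02/00 to 02/15, final byte 04/00 to 07/14). Control strings (OSC, DCS and so on) and any other C0/C1 codes are deliberately left alone.

```python
import re

# CSI is either the 7-bit form ESC [ or the 8-bit C1 code U+009B.
# A control sequence is CSI, parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F), then one final byte (0x40-0x7E).
# A plain escape sequence is ESC, intermediate bytes, then a final
# byte in 0x30-0x7E. The CSI alternative is listed first so that
# "ESC [" is not consumed as a two-character escape sequence.
_SEQUENCE = re.compile(
    r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]"
    r"|\x1b[\x20-\x2f]*[\x30-\x7e]"
)


def strip_control_sequences(text: str) -> str:
    """Drop ECMA-48 control/escape sequences, keeping everything else.

    Hypothetical helper: it treats whole sequences as ignorable but
    leaves LF, HT and other single control characters untouched.
    """
    return _SEQUENCE.sub("", text)


if __name__ == "__main__":
    styled = "plain \x1b[1mbold\x1b[22m and \x9b3mitalic\x9b23m\ttext\n"
    print(repr(strip_control_sequences(styled)))
```

Note that HT and LF pass through unchanged, which matches the exceptions mentioned above. Simply deleting a single-shift escape sequence would change the meaning of the character that follows it, so a real implementation would need more care there.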

> What special quality do they have? Can't be typed on a keyboard? Don't have visible glyphs? Affect the display of other characters? Are default-ignorable in Unicode? None of these things are unique to them.
> 
> "Plain text" is a loose definition because "formatted text" is equally loose.

Note that the subject line for this thread has quote marks for that reason.

Ask anyone 50 or so years back “show me an example of plain text” and they would likely have pointed to any ordinary newspaper article (printed in plain black ink) without photos, no matter if it used bold, italics, or different-sized characters in the text.

> Context matters. It reminds me of "paying cash", which can mean different things when you're buying a hamburger and buying a corporation.
> 
>> 
>> Actually there is no ESC. There are CR, LF, FF. And then a code ***called*** ESC, but it is not at all ESC, it is SS2, SINGLE SHIFT 2, it works exactly as SS2.
>> 
> 
> It rather works like SS1, which we sadly never got in ECMA-48 or ECMA-35. Then SS2 actually is SS2.

And SS1 would be a no-op? A bit like NULL was intended to be…

SS2 is “jump to secondary codepage”, SS3 is “jump to tertiary codepage”.
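As a rough sketch of what that means for a decoder, assuming the ECMA-35 reading where a single shift applies to exactly one following character and the 8-bit C1 codes 0x8E (SS2) and 0x8F (SS3) are used: the function name and the three lookup tables below are invented purely for illustration and do not correspond to any registered character set.

```python
SS2 = 0x8E  # SINGLE SHIFT TWO, 8-bit C1 representation
SS3 = 0x8F  # SINGLE SHIFT THREE, 8-bit C1 representation

# Hypothetical code tables: primary, secondary and tertiary "codepages".
G0 = {0x41: "A", 0x42: "B"}
G2 = {0x41: "\u03b1"}  # Greek small alpha, chosen arbitrarily
G3 = {0x41: "\u05d0"}  # Hebrew alef, chosen arbitrarily


def decode_single_shifts(data: bytes, g0=G0, g2=G2, g3=G3) -> str:
    """Decode bytes where SS2/SS3 select g2/g3 for one character only."""
    out = []
    table = g0
    for byte in data:
        if byte == SS2:
            table = g2  # next character comes from the secondary table
        elif byte == SS3:
            table = g3  # next character comes from the tertiary table
        else:
            # Unknown codes become U+FFFD; the shift then expires.
            out.append(table.get(byte, "\ufffd"))
            table = g0
    return "".join(out)


if __name__ == "__main__":
    print(decode_single_shifts(bytes([0x41, SS2, 0x41, 0x42, SS3, 0x41, 0x41])))
```

In this model a hypothetical “SS1” would select the primary table for one character, which is what happens anyway without any shift.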

/Kent K



