Italics get used to express important semantic meaning, so unicode should support them

Kent Karlsson kent.b.karlsson at bahnhof.se
Wed Dec 16 18:46:33 CST 2020



> 16 dec. 2020 kl. 02:14 skrev Sławomir Osipiuk via Unicode <unicode at unicode.org>:
> 
> On Tue, Dec 15, 2020 at 6:07 PM Kent Karlsson
> <kent.b.karlsson at bahnhof.se> wrote:
>> Now, where did I see something very much like this???
>> Oh yes, ECMA-48. Not exactly the same, but quite close. Indeed very close (especially the ”invisible by default” (”default ignorable”) IF parsed correctly).
> 
> ECMA-48 aka ISO 6429 was on my mind the moment I read the OP. I didn't
> mention it because it's a bit outdated (even if I do have a fondness

It is certainly not outdated. It’s a long time since the last update, but it is not outdated.
It is used in EVERY terminal emulator (worthy of the name), granted to varying degrees
and varying quality of implementation (but that is another matter). Italics, bold, underline
and colouring are popular uses of the formatting part of ECMA-48 in terminals.
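
To illustrate the kind of formatting meant here, below is a minimal sketch in Python of
emitting SGR (Select Graphic Rendition) control sequences. The helper names `sgr` and
`styled` are made up for this example; the parameter values (1 bold, 3 italic, 4 underline,
31 red foreground, 0 reset) are the usual ECMA-48 ones, but how faithfully they are
rendered depends on the terminal emulator.

```python
# Sketch only: helper names are hypothetical, not from any library.
CSI = "\x1b["  # 7-bit form of the Control Sequence Introducer (ESC followed by "[")


def sgr(*params: int) -> str:
    """Build an SGR control sequence: CSI, parameters separated by ';', final byte 'm'."""
    return CSI + ";".join(str(p) for p in params) + "m"


def styled(text: str, *params: int) -> str:
    """Wrap text in the given SGR parameters and reset all attributes afterwards."""
    return sgr(*params) + text + sgr(0)


if __name__ == "__main__":
    # What appears on screen depends on the terminal emulator in use.
    print(styled("italic", 3), styled("bold", 1), styled("underlined", 4))
    print(styled("bold red", 1, 31))
```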

One could imagine completely reinventing how terminals (i.e. terminal emulators nowadays)
work. But that would face massive compatibility issues. My projection is that 1) terminal
emulators will continue to be used indefinitely, and 2) they will continue to use ECMA-48
or an extension thereof (there are already some extensions that have been implemented).

(That is as opposed to Teletext, which is still very much used in practice, but I think that may
change in five or ten years' time.)

> for it) and if you're using such a thing, why not a more modern HTML
> subset, or BBCode, or any number of other options in use or from the
> list the OP gave? There are, after all, so many to choose from. And if

Because:
1) They would be incompatible with how terminals work.
2) They cannot work for terminals since there is no clear distinction between what is ”markup”
and what is not; today that distinction relies largely on file type (via name suffix or some other
mechanism, like a document setting or view mode, or ”guessing” from reading the beginning of the
document). Those mechanisms do not exist in terminals.

> none of those satisfy, you can always make your own!

Again, if one were to invent something entirely new (not based on ECMA-48) in this area that still has
the potential to be used in terminals, that would face massive compatibility issues with how terminals
work today and are expected to work ”from the other side of the terminal” (i.e. what programs send to
the terminal side). (Yes, I know about termcap.)

> But that "if parsed correctly" is quite the nit, isn't it?

If every terminal (emulator) can handle it (granted, to varying degrees of quality), it does not seem
too hard…
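
As a rough sketch of what ”parsed correctly” can mean for the ”invisible by default” part:
a control sequence is CSI, then parameter bytes (0x30 to 0x3F), then intermediate bytes
(0x20 to 0x2F), then one final byte (0x40 to 0x7E). The Python below recognises that shape
and removes it from the text. The function name is made up for the example, and the sketch
assumes only control sequences need handling (no OSC or other control strings).

```python
import re

# CSI is either ESC "[" (7-bit form) or U+009B (8-bit form); then parameter
# bytes, intermediate bytes, and exactly one final byte.
CONTROL_SEQUENCE = re.compile(r"(?:\x1b\[|\x9b)[0-?]*[ -/]*[@-~]")


def strip_control_sequences(text: str) -> str:
    """Remove control sequences, leaving only the text they were formatting."""
    return CONTROL_SEQUENCE.sub("", text)


if __name__ == "__main__":
    print(strip_control_sequences("plain \x1b[3mitalic\x1b[23m plain"))
```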

> 
>> It is not entirely inconceivable to map all the (otherwise) printable characters used by such control sequences to TAG characters, thus making the ”default ignorable” part of this a bit easier.
> 
> And this is just the BabelPad solution but applied to a different
> protocol. Replacing regular markup by corresponding characters from
> the tag block to gain ignorable-ness may seem like a cool idea at
> first, but it's just spinning yet another markup. (With no offense

In a sense, yes. But the idea of using TAG characters for this has popped up on this list
multiple times. So if mapping ECMA-48-ish control sequences to TAG characters
makes ECMA-48-ish formatting control sequences more palatable, then ok.
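
A minimal sketch of such a mapping, in Python. The TAG characters U+E0020 to U+E007E mirror
ASCII 0x20 to 0x7E at an offset of 0xE0000, so the (otherwise) printable characters of a
control sequence can be shifted into that block. The function names, and the choice to leave
ESC itself unmapped, are assumptions made for illustration; no specification defines this
encoding.

```python
TAG_OFFSET = 0xE0000  # U+E0020..U+E007E correspond to ASCII 0x20..0x7E


def to_tag_form(sequence: str) -> str:
    """Shift printable ASCII characters to TAG characters; leave others (e.g. ESC) alone."""
    return "".join(
        chr(ord(c) + TAG_OFFSET) if 0x20 <= ord(c) <= 0x7E else c
        for c in sequence
    )


def from_tag_form(sequence: str) -> str:
    """Inverse mapping: shift TAG characters back to printable ASCII."""
    return "".join(
        chr(ord(c) - TAG_OFFSET) if 0xE0020 <= ord(c) <= 0xE007E else c
        for c in sequence
    )


if __name__ == "__main__":
    italic_on = "\x1b[3m"  # ECMA-48 SGR 3 (italic)
    mapped = to_tag_form(italic_on)
    print([f"U+{ord(c):04X}" for c in mapped])
    assert from_tag_form(mapped) == italic_on
```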

/Kent Karlsson