Encoding italic

Egmont Koblinger via Unicode unicode at unicode.org
Fri Feb 8 15:29:57 CST 2019


Hi guys,

Having been a terminal emulator developer for some years now, I have
to say – perhaps surprisingly – that I don't fancy the idea of reusing
escape sequences of the terminal world.

(Mind you, I don't find it a good idea to add italic and whatnot
formatting support to Unicode at all... but let's put that aside for
now.)

There are a lot of problems with these escape sequences, and if you go
for a potentially new standard, you might not want to carry these
problems.

There is not a well-defined framework for escape sequences. In this
particular case you might say it starts with ESC [ and ends with the
letter 'm', but how do you know where to end the sequence if that
letter 'm' just doesn't arrive? Terminal emulators have extremely
complex tables for parsing (and still many of them get plenty of
things wrong). It's unreasonable for any random small utility
processing Unicode text to go into this business of recognizing all
the well-known escape sequences, not even to the extent to know where
they end. Whatever is designed should be much more easily parseable.
Should you say "everything from ESC[ to m", you'll cause a whole bunch
of problems when a different kind of escape sequence gets interpreted
as Unicode.
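
A minimal sketch of the failure mode described above, in Python. The
regular expression and the helper name strip_naive are made up for
illustration and do not come from any real tool; the rule encoded is
exactly "everything from ESC[ to m". ESC[2J (erase display) is a CSI
sequence that ends in 'J', so the naive rule keeps scanning and eats
ordinary text up to the next letter 'm'.

```python
import re

ESC = "\x1b"

# Naive rule: everything from ESC[ up to the next 'm' is "formatting".
NAIVE_SGR = re.compile(r"\x1b\[.*?m", re.DOTALL)

def strip_naive(text: str) -> str:
    """Remove whatever the naive rule believes is a formatting sequence."""
    return NAIVE_SGR.sub("", text)

# Well-behaved input: italics on, some text, italics off.
clean = strip_naive(ESC + "[3mitalic" + ESC + "[23m")

# ESC[2J is terminated by 'J', not 'm', so the naive rule runs on
# into the plain text and swallows it up to the 'm' of "my".
sample = ESC + "[2JHello, my friend"
result = strip_naive(sample)

print(repr(clean))
print(repr(result))
```

The first call behaves as hoped, while the second silently loses
"Hello, m" from the visible text. A utility that wants to avoid this
has to know where every kind of escape sequence ends, which is the
parsing burden discussed above.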

A parser, by the way, would also have to interpret combined sequences
like ESC[3;0;1m or the like, for which I don't see a good reason as
opposed to having separate sequences for each. Also, it should be
carefully evaluated what to do with the C1 CSI (U+009B) as an
alternative to the C0 ESC[ opening of an escape sequence – here
terminal emulators vary. These just make everything even more
cumbersome.
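
To make the extra work concrete, here is a rough Python sketch of what
handling combined sequences involves. It is an illustration under
stated assumptions, not how any particular terminal emulator does it:
the attribute names and the functions apply_sgr and final_state are
invented, only the codes from the quoted proposal are covered, and
accepting U+009B as an introducer is itself one of the open choices.

```python
import re

# Accept either ESC [ or the single C1 character CSI (U+009B).
# Honoring U+009B is an assumption of this sketch; emulators differ.
SGR = re.compile(r"(?:\x1b\[|\x9b)([0-9;]*)m")

# Codes from the quoted proposal only; the names are illustrative.
ON = {1: "bold", 3: "italic", 4: "underline", 7: "reverse", 9: "strike"}
OFF = {22: "bold", 23: "italic", 24: "underline", 27: "reverse", 29: "strike"}

def apply_sgr(params: str, state: set) -> set:
    """Apply one parameter string such as "3;0;1", left to right."""
    state = set(state)
    for field in params.split(";"):
        code = int(field) if field else 0  # empty parameter means 0
        if code == 0:
            state.clear()                  # reset all attributes
        elif code in ON:
            state.add(ON[code])
        elif code in OFF:
            state.discard(OFF[code])
        # Anything else is ignored in this sketch.
    return state

def final_state(text: str) -> set:
    """Return the attributes still active at the end of the text."""
    state = set()
    for match in SGR.finditer(text):
        state = apply_sgr(match.group(1), state)
    return state

print(final_state("\x1b[3;0;1m"))  # italic on, reset, bold on
```

With ESC[3;0;1m the italic is switched on and immediately cancelled by
the 0, so only bold survives. A consumer that handled each parameter
as a separate, order-independent flag would get this wrong.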

ECMA-48 8.3.117 specifies ESC[1m as "bold or increased intensity".
Only nowadays, when most terminal emulators support 256 colors and
some even support 16M true colors, do some emulators try to push for
this bit unambiguously meaning "bold" only, whereas in most emulators
it means "both bold and increased intensity". For compatibility
reasons, it won't be a smooth switch. Note that "bold" and "increased
intensity" only go in the same direction with a white-on-black color
scheme; with black-on-white, bold stands out more while increased
intensity (a lighter shade of gray instead of black) stands out less.
(We could also start nitpicking that the spec doesn't even say that
increased intensity is just for the foreground and not for the
background too.)

Should this scheme be extended for colors, too? What to do with the
legacy 8/16 as well as the 256-color extensions with respect to the
color palette? Should Unicode go into the business of defining a fixed
set of colors, or allow altering the palette colors using OSC 4 and
related escape sequences, which are supported by about half of the
terminal emulators out there?

For 256 colors and truecolor, there are two or three syntaxes out
there regarding whether the separator is a colon or a semicolon.
ECMA-48 doesn't say anything about it; ITU T.416 does, although it's
absolutely not clear. See e.g. the discussion in the comment section
of https://gist.github.com/XVilka/8346728 , where in Dec 2018 we just
couldn't figure out which syntax exactly ITU T.416 means.
Moreover, due to a common misinterpretation of the spec, one of the
positional parameters is often omitted.
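
For readers who have not met these variants, the following Python
sketch spells out three forms of a truecolor foreground sequence that
can be seen in practice. The function name and dictionary keys are
invented for the example. That the commonly omitted parameter is the
color space identifier slot is my understanding rather than something
stated above, so treat it as an assumption.

```python
ESC = "\x1b"

def truecolor_variants(r: int, g: int, b: int) -> dict:
    """Return the same foreground color in three competing spellings."""
    return {
        # Semicolons throughout, as many applications emit it.
        "semicolon": f"{ESC}[38;2;{r};{g};{b}m",
        # Colons, with no slot before the red component.
        "colon_short": f"{ESC}[38:2:{r}:{g}:{b}m",
        # Colons, with an empty slot (assumed: color space id) before red.
        "colon_full": f"{ESC}[38:2::{r}:{g}:{b}m",
    }

for name, seq in truecolor_variants(255, 128, 0).items():
    print(name, repr(seq))
```

A consumer has to decide which of these it accepts, and in the two
colon forms the red component sits at a different position, so
guessing wrong shifts every color value by one slot.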

Some terminal emulators have made up some new SGR modes, e.g. ESC[4:3m
for curly underline. What to do with them? Where to draw the line
between what to add to Unicode and what not to? Will Unicode possibly
be a bottleneck for further improvements in terminal emulators,
because from now on every new mode we figure out we'd like to have in
terminals should go through some Unicode committee? And what if
Unicode wants to have a mode that terminal emulators aren't interested
in? Who will assign numbers to such modes that don't clash with
terminals? Who will somehow keep the two worlds in sync?

What to do with things that Unicode might also want to have, but that
don't exist in terminal emulators due to their nature, such as
switching to a different font size?

> This mechanism [...] is already supported
> as widely as any new Unicode-only convention will ever be.

I truly doubt this. These escape sequences are specific to terminal
emulation, an extremely narrow subset of where Unicode is used and
rich text might be desired.

I see it as a much more viable approach if Unicode goes for something
brand new, something clean, easily parseable, and it remains the job
of specific applications to serve as a bridge between the two worlds.
Or, if it wants to adopt some already existing technology, I find
HTML/CSS a much better starting point.


regards,
egmont

On Fri, Feb 8, 2019 at 9:55 PM Doug Ewell via Unicode
<unicode at unicode.org> wrote:
>
> I'd like to propose encoding italics and similar display attributes in
> plain text using the following stateful mechanism:
>
> •       Italics on: ESC [3m
> •       Italics off: ESC [23m
> •       Bold on: ESC [1m
> •       Bold off: ESC [22m
> •       Underline on: ESC [4m
> •       Underline off: ESC [24m
> •       Strikethrough on: ESC [9m
> •       Strikethrough off: ESC [29m
> •       Reverse on: ESC [7m
> •       Reverse off: ESC [27m
> •       Reset all attributes: ESC [m
>
> where ESC is U+001B.
>
> This mechanism has existed for around 40 years and is already supported
> as widely as any new Unicode-only convention will ever be.
>
> --
> Doug Ewell | Thornton, CO, US | ewellic.org
>
>


