Encoding colour (from Re: Encoding italic)

Kent Karlsson via Unicode unicode at unicode.org
Mon Feb 11 16:46:32 CST 2019

Continuing to look deep into the crystal ball, doing some more
hand swirls...



The scheme quoted (far) below (from wjgo_10009), or anything like it,
will NEVER be part of Unicode!


But I do like colour (and bold and italic) also for otherwise "plain"
text. And having those stylings represented in a lightweight manner,
in many cases. Not needing heavy-lifting with (say) HTML+CSS. More on
that further below.

As we have noted already on this thread, we already have a standard
for specifying background and foreground (the glyphs for the text)
colour, as ESC (command) sequences. It even has (non-standard) "room"
for an alpha channel (after the 6th ':', a parameter position otherwise
unused for RGB; it is used for K of CMYK in the ITU T.416 standard).

Colour, RGB, with alpha channel T (0: opaque, 255: fully transparent;
this way around since 0 is the default default value in these things),
can be given with the detailed syntax below (it matches the overall
syntax, so there is no overall syntax error for the detailed syntax).
The brackets, except the very first one, indicate optional parts; strictly
speaking everything after the "2" here is incrementally optional, but
that is a nit; the i and the "a:s" are intended for different kinds
of colour adjustments (at least the "i" one being implementation defined).
But those are a bit too detailed to pick up here.

The lowercase variables (but not the final m) are to be replaced by
digits representing values 0 to 255. A syntax error would result in the
command sequence being ignored. (If too long, longer than 35(?) chars,
the printable characters would be displayed, with no interpretation as a
command sequence.) The 2 means RGB (and, here, T) colour specification.

Foreground colour: ESC [38:2:i:r:g:b[:t[:a:s]]m
Background colour: ESC [48:2:i:r:g:b[:t[:a:s]]m

E.g. ESC [38:2:0:70:100:200:100m  for a slightly transparent bluish
foreground colour. Separator is (must be) colon, so as not to interfere
with the permitted (but I would not recommend it) multiple style
settings in a single SGR command sequence, using semicolon separator.
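
As a rough illustration of the syntax above, here is a minimal Python
sketch that builds such a sequence. The function name sgr_colour and its
keyword arguments are made up for this example; the optional a:s
parameters are left out, and the alpha position is the non-standard
usage described above, not something ECMA-48 or T.416 defines.

```python
ESC = "\x1b"  # written as an escape, never as a raw U+001B in the source


def sgr_colour(r, g, b, t=None, *, i=0, background=False):
    """Build ESC [38:2:i:r:g:b[:t]m (or 48 for background).

    Hypothetical helper: t is the non-standard alpha value
    (0 = opaque, 255 = fully transparent); a:s is not supported.
    """
    lead = 48 if background else 38
    values = [i, r, g, b] + ([] if t is None else [t])
    for value in values:
        if not 0 <= value <= 255:
            raise ValueError(f"parameter out of range 0..255: {value}")
    # Colon-separated throughout, so the whole thing stays one SGR parameter.
    params = ":".join(str(v) for v in [lead, 2, *values])
    return f"{ESC}[{params}m"
```

Calling sgr_colour(70, 100, 200, 100) gives the same sequence as the
example above: ESC [38:2:0:70:100:200:100m.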


Now, colour for plain text? Well, lots of people are editing coloured
plain text daily! Any decent modern IDE does automatic syntax colouring
(and bold and italic). And that for program source text, which certainly
does not have any HTML/CSS or any other higher-level (formatting)
protocol applied to it. OK, the colouring/bold/italic is entirely
internal. It is not saved in the files in any way, it is derived. But
it would be nice to sometimes keep the syntax colouring, when quoting
a piece of program source code (from an IDE) into a chat conversation,
for instance. Or pasting a piece of source code into a presentation
slide or a document (in these cases any light-weight colouring/style
would need to be converted to whatever representation is used for
such things in those document formats, something more "heavy-weight").
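
To make the conversion step a bit more concrete, here is a small Python
sketch that turns foreground colour sequences into HTML spans. It is
only an illustration under several assumptions: the name sgr_to_html is
invented, only the foreground colour form and a reset (ESC [m, ESC [0m
or ESC [39m) are recognised, any alpha value is silently dropped, and
all other command sequences are left in the text untouched.

```python
import html
import re

# Either a foreground RGB colour (groups 1-3 are r, g, b) or a reset.
FG_RE = re.compile(
    r"\x1b\[38:2:\d*:(\d+):(\d+):(\d+)(?::\d*)*m|\x1b\[(?:0|39)?m"
)


def sgr_to_html(text):
    """Convert light-weight foreground colouring to <span> elements."""
    out = []
    open_span = False
    pos = 0
    for match in FG_RE.finditer(text):
        # Plain text before this sequence, escaped for HTML.
        out.append(html.escape(text[pos:match.start()]))
        pos = match.end()
        if open_span:
            out.append("</span>")
            open_span = False
        if match.group(1) is not None:
            r, g, b = (int(match.group(n)) for n in (1, 2, 3))
            out.append(f'<span style="color: rgb({r}, {g}, {b})">')
            open_span = True
    out.append(html.escape(text[pos:]))
    if open_span:
        out.append("</span>")
    return "".join(out)
```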

And keep the formatting/colour in a light-weight manner, when
copying/cutting (ctrl-c/ctrl-x) text from an IDE. One that is
also easy to strip away (if pasting a perhaps modified version of it
into a source file (via an IDE)). The "heavy-weight" ones are harder to
strip away, and might not even be supported on the target platform.

ESC/command sequences are easy to strip away, due to the starting
control character and well-defined overall syntax, even though it
is only the start character that is (otherwise) non-printable in
the sequence. They were designed for being easy to parse out! And they
are already standardised! Platform independently. And light-weight.
Granted, they are, for now, only popular to implement in terminal
emulators. But the styling command sequences are NOT specifically
made for terminals (or terminal emulators).
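
How easy is the stripping? A minimal Python sketch, assuming we only
care about command sequences introduced by ESC [ (the 7-bit form of
CSI) with the general ECMA-48 shape: parameter bytes 0x30-0x3F, then
intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E. The name
strip_csi is made up here; other ESC sequences and the 8-bit CSI
(U+009B) are not handled.

```python
import re

# ESC [ + parameter bytes + intermediate bytes + one final byte.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(text):
    """Remove every ESC [ ... command sequence, keeping the plain text."""
    return CSI_RE.sub("", text)
```

For instance, strip_csi applied to a keyword wrapped in a colour
sequence and a reset returns just the keyword.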

If you worry about actual ESC characters in source code (strings),
those should be written as \e, or other more general escape sequence
(a completely different, though somewhat related, sense of the term
"escape sequence"), like \u001B. It is a REALLY bad idea to have
a real escape character (U+001B) in a source code string literal.

(Nit: The "predefined" colours in ECMA-48 are not useful for this.
They are too stark. The IDEs (by default) use milder colours.)

If you think that using styling on program source text is a new-fangled
idea that came with the IDEs: No, it started already in the sixties.
Algol-60 source text, when printed in books, had the keywords written
in bold. For the *actual* programs, IIRC (at least for some compiler),
one had to mark the keywords with underscore: _BEGIN_, _IF_, ...
(No lowercase in computers then...) The keywords were initially
not reserved, so one had to mark them. And... often stored as punched
cards or punched paper tape...

While possible, I do NOT propose to use command sequences to mark
keywords (etc.) as bold (or colour) when input to a compiler.
NOR do I propose to encode characters for punched hole patterns...
(Have to draw the line somewhere. ;-)

/Kent K

Den 2019-02-11 17:11, skrev "wjgo_10009 at btinternet.com"
<wjgo_10009 at btinternet.com>:

> Suppose that there are sixteen new characters, which are in plane 1 or
> maybe plane 14, but which for this mailing list post I will express
> using the digits 0 .. 9, Z, R, G, B, A, F.
> There would be a virtual machine to set the colour, that would have
> registers h, r, g, b, a and a system service
> Set_Foreground_Colour(r,g,b,a).
> Then the sixteen new characters would each have a default glyph, which
> could be displayed emoji-style, and, in an application environment that
> has the virtual machine available and switched on, would have the
> following effects in the virtual machine and their glyphs would not then
> be displayed. The virtual machine would be sandboxed.
> Z h:=0;
> 0 h:=10*h ;
> 1 h:=10*h + 1;
> 2 h:=10*h + 2;
> 3 h:=10*h + 3;
> 4 h:=10*h + 4;
> 5 h:=10*h + 5;
> 6 h:=10*h + 6;
> 7 h:=10*h + 7;
> 8 h:=10*h + 8;
> 9 h:=10*h + 9;
> R r:=h; h:=0;
> G g:=h; h:=0;
> B b:=h; h:=0;
> A a:=h; h:=0;
> F Set_Foreground_Colour(r,g,b,a);
> Thus for example, remembering that these ordinary characters are just
> being used here for explanation in this post, and that the actual
> characters if encoded would probably be in plane 1 or plane 14:
> So the sequence Z128R160G248B255AF could be used to set the foreground
> colour to an opaque blue colour.
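
For comparison only, the quoted register machine can be written out in a
few lines of Python. This is a sketch of the scheme exactly as listed in
the quote, using the same stand-in ASCII characters; the function name
run_colour_vm is invented, and Set_Foreground_Colour is modelled simply
by recording the (r, g, b, a) tuple it would be called with.

```python
def run_colour_vm(sequence):
    """Interpret the quoted Z/0-9/R/G/B/A/F scheme.

    Returns the list of (r, g, b, a) tuples that would be passed to
    Set_Foreground_Colour.
    """
    h = r = g = b = a = 0
    calls = []
    for ch in sequence:
        if ch == "Z":
            h = 0
        elif ch in "0123456789":
            h = 10 * h + int(ch)
        elif ch == "R":
            r, h = h, 0
        elif ch == "G":
            g, h = h, 0
        elif ch == "B":
            b, h = h, 0
        elif ch == "A":
            a, h = h, 0
        elif ch == "F":
            calls.append((r, g, b, a))
        else:
            raise ValueError(f"not one of the sixteen characters: {ch!r}")
    return calls
```

Running it on Z128R160G248B255AF yields a single call with
r=128, g=160, b=248, a=255.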
