Ecma-48 proposed styling controls update updated & math expression representation proposal update

Giacomo Catenazzi cate at cateee.net
Tue Jan 9 06:58:58 CST 2024


On 8 Jan 2024 23:36, Kent Karlsson wrote:
> 
>> On 8 Jan 2024, at 14:00, Giacomo Catenazzi via Unicode <unicode at corp.unicode.org> wrote:

>> usually sent as `ESC [`
> 
> In modern terms, that is a character reference.
> 
>> (and it should be terminated by a character between 0x40 and 0x7E, though there were bugs and exceptions on some platforms); there is also a single-character form in C1 (still two bytes in UTF-8), but many terminals disregard this alternative (which is also very old).
> 
> Because in a terminal (emulator) the character encoding may change without notice.

But isn't the point of ECMA-48 (together with ECMA-35 and ECMA-43) to 
know the encoding, and to be able to map the CSI function to (nearly) 
any C0 or C1 character? In any case, if the terminal (emulator) is 
using UTF-8, it should be clear that such sequences are usable.
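To make the two spellings of CSI concrete, here is a minimal Python sketch (the helper names `sgr` and `is_final_byte` are hypothetical, not from ECMA-48 or any library). It builds the same SGR sequence once with the 7-bit `ESC [` introducer and once with the C1 character CSI (U+009B), shows that both introducers take two bytes in UTF-8, and checks the ECMA-48 range for final bytes (0x40 to 0x7E).

```python
# CSI (Control Sequence Introducer) has two standard spellings.
CSI_7BIT = "\x1b["  # ESC [ : the 7-bit code extension form
CSI_C1 = "\x9b"     # U+009B : the single C1 control character

def sgr(*params: int, csi: str = CSI_7BIT) -> str:
    """Build an SGR (Select Graphic Rendition) sequence: CSI <params> m."""
    return csi + ";".join(str(p) for p in params) + "m"

def is_final_byte(ch: str) -> bool:
    """ECMA-48 control sequences end with a final byte in 0x40..0x7E."""
    return 0x40 <= ord(ch) <= 0x7E

bold_7bit = sgr(1)
bold_c1 = sgr(1, csi=CSI_C1)

# Encoded as UTF-8, each introducer occupies two bytes.
print(bold_7bit.encode("utf-8"))     # b'\x1b[1m'
print(bold_c1.encode("utf-8"))       # b'\xc2\x9b1m'
print(is_final_byte(bold_7bit[-1]))  # True
```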

Or should we improve that part? We have had the infrastructure and 
the standards for a very long time. If the client wants UTF-8, it just 
sends the corresponding prefix, so that we know that C0, C1, GL and *GR* 
are as expected for UTF-8: no more garbled text because emulators and 
programs don't agree on the encoding.
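The prefix meant here is an ECMA-35 (ISO/IEC 2022) DOCS escape sequence. A minimal sketch, assuming the registered announcer `ESC % G` for UTF-8 and `ESC % @` for returning to ISO 2022 operation; the function name `announce_utf8` is hypothetical.

```python
# ECMA-35 (ISO/IEC 2022) DOCS sequences for UTF-8.
DOCS_UTF8 = b"\x1b%G"    # ESC % G : switch to UTF-8 (with standard return)
DOCS_RETURN = b"\x1b%@"  # ESC % @ : return to the ISO 2022 coding system

def announce_utf8(text: str, return_after: bool = False) -> bytes:
    """Prefix UTF-8 encoded text with DOCS, optionally switching back at the end."""
    data = DOCS_UTF8 + text.encode("utf-8")
    if return_after:
        data += DOCS_RETURN
    return data

# After the announcer, the receiver knows that C1 controls such as CSI
# arrive as two-byte UTF-8 sequences (U+009B -> C2 9B).
print(announce_utf8("\x9b1mbold"))  # b'\x1b%G\xc2\x9b1mbold'
```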


>> But here we see the advantage of having elements (and tags) written as clear text (as in HTML, LaTeX, etc.): if we do not understand an element, we can google it. With ECMA-48 codes, either the code is standard, or good luck finding any reference.
> 
> But HTML etc. are, and will continue to be, complete non-starters for terminal emulators.
> 
> However, ECMA-48 styling can be used to style-enhance what is otherwise a plain text document, without getting entangled in a second-level interpretation of tags, typesetting commands, or similar, expressed in what would otherwise be “plain text”.
> 
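Because ECMA-48 gives every control sequence the same generic shape (CSI, then parameter bytes 0x30 to 0x3F, intermediate bytes 0x20 to 0x2F, and one final byte 0x40 to 0x7E), a reader can skip sequences it does not understand and recover the underlying plain text. A minimal sketch with hypothetical names; OSC strings and other escape sequences are not handled.

```python
import re

# CSI (7-bit ESC [ or C1 U+009B), parameter bytes, intermediate bytes, final byte.
CONTROL_SEQUENCE = re.compile(r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")

def plain_text(styled: str) -> str:
    """Remove every CSI control sequence, known or unknown, keeping the text."""
    return CONTROL_SEQUENCE.sub("", styled)

# Mixes 7-bit and C1 introducers, plus a private-use sequence (CSI ? 99 ~).
doc = "This is \x1b[1mbold\x1b[22m, \x9b3mitalic\x9b23m and \x1b[?99~plain\x1b[0m."
print(plain_text(doc))  # This is bold, italic and plain.
```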

Yes, but do we really care so much about styling in terminal emulators?

In my experience, the best-formatted text in terminal emulators was 
(and is) the manual pages (bold, italic, good *dynamic* layout, etc.). 
But I (and, it seems, most people) find it much more readable to view 
them online (even the badly HTML-formatted ones, which are 
unfortunately common).

(And I found Lynx and w3m bad.)


But how do you input the formatting? You will use some sort of HTML or 
Markdown, which will be translated into the new ECMA-48-style 
formatting. So why not use that high-level formatting for interchange? 
Wasn't that one of the original points of the proposal? Or is it just a 
matter of local styling (and so more at the level of a library such as 
termcap/terminfo)?



>> ECMA-48-like syntax is bad: it is difficult to extend without requiring updates to all programs (unlike HTML, where tags can be
>> just ignored, without consequences for the following ones,
> 
> I don’t see your point. I.e., I don’t see that there is any difference in principle.


With CSI we have too many ways to "reset" styling (and also to set the 
same property, e.g. red text, or to push/pop it, etc.). This makes it 
difficult to extend, and it is so "by design". Do you remember 20 years 
ago? Terminals supported colours, but many programs behaved weirdly, 
because they didn't implement all the CSI sequences, and different 
programs had different expectations.
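The problem of several ways to undo a style can be shown with a small model of SGR state. In this sketch (`Style` and `apply_sgr` are hypothetical names, and only a handful of SGR parameters are modelled), `CSI 0 m`, `CSI 22;39 m`, and the two separate sequences `CSI 22 m` and `CSI 39 m` all return bold red text to the default, so a program that only tracks `CSI 0 m` will mis-track the others.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

@dataclass(frozen=True)
class Style:
    bold: bool = False
    italic: bool = False
    fg: Optional[int] = None  # None = default colour, else an SGR colour parameter

def apply_sgr(style: Style, params: List[int]) -> Style:
    """Apply the parameters of one SGR sequence (CSI ... m) to a style."""
    if not params:
        params = [0]  # CSI m means the same as CSI 0 m
    for p in params:
        if p == 0:
            style = Style()  # reset everything
        elif p == 1:
            style = replace(style, bold=True)
        elif p == 22:
            style = replace(style, bold=False)  # normal intensity
        elif p == 3:
            style = replace(style, italic=True)
        elif p == 23:
            style = replace(style, italic=False)
        elif 30 <= p <= 37 or 90 <= p <= 97:
            style = replace(style, fg=p)
        elif p == 39:
            style = replace(style, fg=None)  # default foreground colour
        # any other parameter is ignored in this sketch
    return style

bold_red = apply_sgr(Style(), [1, 31])
via_reset = apply_sgr(bold_red, [0])
via_pair = apply_sgr(bold_red, [22, 39])
via_two = apply_sgr(apply_sgr(bold_red, [22]), [39])
print(via_reset == via_pair == via_two == Style())  # True
```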

My expectation is that most enhancements will bring us back to that 
time, until emulator maintainers converge on a single behaviour.

And standardising is not a solution. ECMA-48 (and ECMA-35) are 
standardised, but can you name an emulator that fully implements them? 
As we discussed at the beginning, we have the CSI (and in general the 
C1) problem, but there are many other points that are not fully 
supported. BTW, would that (partly) solve the "Teletext" problem?

Programs also have different expectations. Many programs don't expect 
proportional fonts: they expect monospace fonts (possibly with wide 
characters).
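The "monospace with wide characters" model that such programs assume can be sketched with Python's `unicodedata`: a character takes two cells if its East Asian Width is Wide or Fullwidth, combining marks take none, and everything else takes one. This is a simplification (Ambiguous-width characters, control characters and emoji sequences need more care, as the C function `wcwidth()` and similar libraries try to provide), and `display_width` is a hypothetical name.

```python
import unicodedata

def display_width(text: str) -> int:
    """Number of terminal cells the text occupies, under a simplified model."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the previous cell
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            cells += 2  # Wide and Fullwidth characters take two cells
        else:
            cells += 1
    return cells

print(display_width("abc"))      # 3
print(display_width("日本語"))    # 6
print(display_width("e\u0301"))  # 1 (e + combining acute accent)
```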

We lack full support for ECMA-35, ECMA-43 and ECMA-48, so my 
expectations for new extensions are minimal.

> 
>> or for a previous *non-closed* one). Note: this has a different cause, one that modern markup languages share: they are *structured* (which ECMA-48 is not,
> 
> Well, for the most part not, but tables (and Ruby) actually are structured in ECMA-48 too. As are bidi controls. As are hyperlinks (OSC 8), proposed by others. And so are math expressions (in all three variants; that is a separate proposal, but one variant is compatible with ECMA-48, another with HTML).
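OSC 8 hyperlinks illustrate that start/end structure: an opening OSC 8 sequence carries the URI, the link text follows, and an OSC 8 sequence with an empty URI closes the link. A minimal sketch following the commonly used convention (OSC written as `ESC ]`, terminated by ST written as `ESC \`); `hyperlink` is a hypothetical helper.

```python
OSC = "\x1b]"   # Operating System Command (7-bit form)
ST = "\x1b\\"   # String Terminator (7-bit form, ESC \)

def hyperlink(uri: str, text: str, params: str = "") -> str:
    """Wrap text in an OSC 8 hyperlink: open with the URI, close with an empty URI."""
    opening = f"{OSC}8;{params};{uri}{ST}"
    closing = f"{OSC}8;;{ST}"
    return opening + text + closing

link = hyperlink("https://www.unicode.org/", "Unicode")
print(repr(link))
```

A terminal that does not support OSC 8 is expected to discard the control strings and show only the link text.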

But to what purpose? For interchanging information (and for typing 
it), we will use a different format, so it is better to use a format 
that users can understand (HTML, Markdown, TeX, or a new "UniFormat"). 
For display? Is someone willing to implement it? Many standards follow 
"usage first, standardisation later".

> 
> And most (major exception: CSI 0m, which should only be used for terminal emulators) styling controls have a start-(change)*-end structure, but unrelated styling controls need not nest. You may consider that last bit a flaw, but one that cannot be fixed for compatibility reasons.
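Non-nesting styling controls become visible when translating to a structured format such as HTML: bold may end while italic is still active. One way around this, sketched under the assumption that only SGR 0, 1, 3, 22 and 23 matter (`sgr_to_html` is a hypothetical helper), is to cut the text into runs of constant style and wrap each run separately, so the generated tags always nest.

```python
import html
import re

# 7-bit CSI, parameter bytes, intermediate bytes, final byte.
CSI_SEQ = re.compile(r"\x1b\[([\x30-\x3f]*)([\x20-\x2f]*)([\x40-\x7e])")

def sgr_to_html(styled: str) -> str:
    bold = italic = False
    out = []

    def emit(text: str) -> None:
        # Wrap one run of constant style; tags opened here are closed here.
        if not text:
            return
        piece = html.escape(text)
        if italic:
            piece = f"<i>{piece}</i>"
        if bold:
            piece = f"<b>{piece}</b>"
        out.append(piece)

    pos = 0
    for m in CSI_SEQ.finditer(styled):
        emit(styled[pos:m.start()])
        pos = m.end()
        params, intermediates, final = m.groups()
        if final != "m" or intermediates or not re.fullmatch(r"[0-9;]*", params):
            continue  # not a plain SGR sequence: drop it
        for p in (int(x) if x else 0 for x in params.split(";")):
            if p == 0:
                bold = italic = False
            elif p == 1:
                bold = True
            elif p == 22:
                bold = False
            elif p == 3:
                italic = True
            elif p == 23:
                italic = False
    emit(styled[pos:])
    return "".join(out)

# Bold starts, italic starts, bold ends, italic ends: the ranges overlap.
text = "\x1b[1mbold \x1b[3mboth\x1b[22m italic\x1b[23m plain"
print(sgr_to_html(text))
# <b>bold </b><b><i>both</i></b><i> italic</i> plain
```

Adjacent runs that share a tag could be merged afterwards; the sketch keeps them separate for simplicity.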

Yes. "Compatibility" is a curse. HTML was bolder: there were 
deprecations (and removals of features). Implementers could support the 
old behaviour, but there was a strong push to convert to more "modern" 
constructs. ECMA-43 took the best part of ECMA-35 ("let's assume 8 bits, 
so forget how to handle 8-bit and multi-byte sets in a 7-bit system"), 
but we still keep the 7-bit compatibility form "ESC [".

And ECMA was not always compatible with its previous versions. If you 
want to make an improved ECMA-48, please break compatibility (and let 
termcap handle the differences).


In addition, one of the problems was the lack of good documentation of 
what each CSI code does (precisely, including its interaction with 
other CSI codes and with terminal state). Thomas Dickey (xterm) does an 
excellent job of documenting, but it is strictly xterm's 
interpretation. The other emulators have limited documentation (mostly 
just of their extensions), and we lack a good centralised place. So it 
is difficult to have "compatibility" when we lack a well-defined 
description of current behaviour.

My hope for the proposal is that it refreshes the documentation and 
summarises the expected behaviour (the good parts of ECMA-48, the de 
facto standard enhancements, and maybe something new), but I see little 
value in it for data interchange. (And to display maths, we can draw 
"anything" on terminals.)

	cate


> 
> /Kent K
> 
>> nor is TeX, but TeX is frozen). The past gives us a lesson; let's learn from it and not make the same mistakes. ECMA-48 is the past (still used on some appliances, but with no expectation of enhancing it much: we have alternative graphical interfaces).
>>
>> [Note: I do think, though, that ECMA-48 needs an update to standardise common behaviour, but it should be done by the maintainers of the different terminals.]
>>
>> cate
>>
>>
>>
>>> On 8 Jan 2024 13:19, William_J_G Overington via Unicode wrote:
>>> Previously I wrote:
>>>> MORE NEEDED about Anne showing Patricia how to code the text in green and yellow.
>>> I managed to download the PDF document to local storage and I have found on page 55 of the PDF document a table with codes in Kent's proposed enhanced system for setting the colours to alphanumerics green and alphanumerics yellow so that writing of the story can make progress.
>>> It appears that for alphanumerics green SP? CSI 92m is needed and that for alphanumerics yellow SP? CSI 93m is needed. I expect that the alphanumerics green code will be needed at the start of the line and after the name Gutenberg, and that alphanumerics yellow will be needed before the name Gutenberg. A teletext alphanumerics colour code automatically generates a space in the display, so I am wondering whether, to allow round-trip conversion back to teletext format, the space should be after the code rather than before it as listed in Kent's document, so that in a round trip the space following the colour code can be omitted when it is reached, rather than needing to go back and remove it when the colour code is detected.
>>> I do not currently know what CSI means in this context. There are 808 mentions of CSI in the PDF document and the first one is on page 6, but at present I do not understand it.
>>> I am thinking that if this story can be completed and includes a reference to Kent's document and Anne and Patricia have a discussion about why Anne thinks it better to have the space after the colour code rather than before then the story might well be a good learning resource.
>>> William
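The two placements of the space discussed in William's message can be compared with a small sketch. It assumes, following his reading of Kent's table, that `CSI 92 m` selects alphanumerics green and `CSI 93 m` alphanumerics yellow; the sentence text and all helper names are hypothetical placeholders, not part of the story or of Kent's document. Both placements display the same text, but with the space after each code a converter back to teletext can drop the space as soon as it meets the code.

```python
CSI = "\x1b["
GREEN = CSI + "92m"   # alphanumerics green, per the table William cites
YELLOW = CSI + "93m"  # alphanumerics yellow

def colour_code(code: str, space_after: bool) -> str:
    # A teletext colour code occupies one cell shown as a space; the SP models that cell.
    return code + " " if space_after else " " + code

def teletext_line(before: str, name: str, after: str, space_after: bool) -> str:
    """Green text, the name in yellow, then green again (hypothetical layout)."""
    return (colour_code(GREEN, space_after) + before
            + colour_code(YELLOW, space_after) + name
            + colour_code(GREEN, space_after) + after)

def drop_code_spaces(line: str) -> str:
    """Round-trip helper: remove the SP that directly follows each colour code."""
    for code in (GREEN, YELLOW):
        line = line.replace(code + " ", code)
    return line

after_form = teletext_line("The story of", "Gutenberg", "continues", space_after=True)
before_form = teletext_line("The story of", "Gutenberg", "continues", space_after=False)
print(repr(drop_code_spaces(after_form)))
```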

