Ecma-48 proposed styling controls update updated & math expression representation proposal update

Kent Karlsson kent.b.karlsson at bahnhof.se
Thu Jan 11 14:49:30 CST 2024


> On 11 Jan 2024, at 13:24, Marius Spix via Unicode <unicode at corp.unicode.org> wrote:
> 
> 
> Question: How do you copy text preserving the styling?
> For example, you have the following text (in these examples I use ^ as escape character and visible characters instead of the proposed tagging characters.)
>  
> This is a ^[31mred Text^[0m ECMA-48 styling.
>  
> You now want to copy the word "text" and insert it into another document. The styling information gets lost.
> Then you copy the words "a ^[31mtext" and your whole document after these words becomes red until the text color is changed again. This is very confusing and unintuitive. ECMA-48 styling is stack-based and stateful, which makes it hard to select and copy text to another location.
>  
> Another question: How are you supposed to compare ECMA-48 styled texts? The strings
> "This is a ^[31mred^[0m Text" and "This is a ^[31mr^[0m^[31me^[0m^[31md^[0m Text" look and behave exactly the same, but are technically different.

It is certainly a somewhat tricky issue. But it is solvable. Just look at any editor that allows text styling and copy-paste, regardless of representation, internal or external. (E.g. MS Word; though it still has some bugs. Sorry for mentioning a specific product.)
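
A minimal sketch in Python of one way an editor could handle both questions, under loud assumptions: it uses the real ESC character (0x1B) rather than the visible ^ of the examples, it tracks only the foreground colour (SGR 30-37, 39 and 0), and the names styled_chars and serialize are hypothetical, not part of any proposal. The idea is to resolve the control sequences into a style per character, so that comparison ignores how the runs were split and a copied selection carries its own styling and closes it again.

```python
import re

# SGR control sequence: ESC [ parameters m (only SGR is handled in this sketch)
SGR = re.compile(r"\x1b\[([0-9;]*)m")

def styled_chars(text):
    """Return a list of (character, foreground) pairs.

    Only SGR 30-37 (set foreground) and 0/39 (reset) are tracked;
    all other SGR parameters are ignored in this sketch.
    """
    out, fg, pos = [], None, 0
    for m in SGR.finditer(text):
        out.extend((ch, fg) for ch in text[pos:m.start()])
        for p in (m.group(1) or "0").split(";"):
            code = int(p or 0)          # an empty parameter means 0
            if 30 <= code <= 37:
                fg = code
            elif code in (0, 39):
                fg = None
        pos = m.end()
    out.extend((ch, fg) for ch in text[pos:])
    return out

def serialize(chars):
    """Re-encode (character, foreground) pairs as a self-contained string."""
    parts, current = [], None
    for ch, fg in chars:
        if fg != current:
            parts.append("\x1b[0m" if fg is None else f"\x1b[{fg}m")
            current = fg
        parts.append(ch)
    if current is not None:
        parts.append("\x1b[0m")         # close the last run so styling does not leak
    return "".join(parts)

a = "This is a \x1b[31mred\x1b[0m Text"
b = "This is a \x1b[31mr\x1b[0m\x1b[31me\x1b[0m\x1b[31md\x1b[0m Text"

same = styled_chars(a) == styled_chars(b)      # compares styling, not bytes
snippet = serialize(styled_chars(a)[8:13])     # the selection "a red"
```

Here same is True, and snippet is "a ", then the red SGR, "red", and a closing reset, so pasting it elsewhere does not turn the rest of the document red.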

> This opens up a wide range of attack vectors, e.g. on source code, file names, URIs,

I suggest nothing new w.r.t. those.

> legal documents etc. For example, a user could create two different versions of identically looking documents, which result in the same hash to spoof digital signatures. It also allows watermarking texts by inserting a detectable pattern to prevent copyright violations.

I suggest nothing new in regard to those either. (B.t.w., such hashes are usually based on the very lowest level of (external) representation, i.e. the byte values, without any interpretation of that.)
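
To illustrate the remark about hashes, a small Python sketch (assuming UTF-8 as the external representation, SHA-256 as the hash, and the real ESC character instead of the visible ^): the two strings from the question above render the same but are different byte sequences, so a byte-level hash tells them apart.

```python
import hashlib

# Same rendering, different control-sequence structure, hence different bytes.
a = "This is a \x1b[31mred\x1b[0m Text".encode("utf-8")
b = "This is a \x1b[31mr\x1b[0m\x1b[31me\x1b[0m\x1b[31md\x1b[0m Text".encode("utf-8")

digest_a = hashlib.sha256(a).hexdigest()
digest_b = hashlib.sha256(b).hexdigest()

hashes_differ = digest_a != digest_b
```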

/Kent K

> Regards,
>  
> Marius Spix
>  
>  
>  
> Sent: Thursday, 11 January 2024 at 11:32
> From: "Giacomo Catenazzi via Unicode" <unicode at corp.unicode.org>
> To: "Kent Karlsson" <kent.b.karlsson at bahnhof.se>
> Cc: unicode at corp.unicode.org
> Subject: Re: Ecma-48 proposed styling controls update updated & math expression representation proposal update
> On 9 Jan 2024 23:12, Kent Karlsson wrote:
> (...)
> 
> Let's skip a lot of *details*.
> 
> >>
> >> But how do you input the formatting?
> >
> > For output to a terminal emulator from a program, the source program would have string constants for control sequences or parts thereof, just like done now.
> >
> > For a styling enhanced plain text editor one should be able to select a text portion, and then use a menu or keyboard shortcut to select a styling, as it is done in just about any modern text editor. There is no need for an end user to see the styling codes. Using something like HTML syntax would have terrible consequences in that it is hard to tell content from controls. For HTML for instance one MUST use &lt; for <, so that it is not taken as start of a “tag”. That is absolutely nothing you want to see for a terminal, nor for a styling enhanced plain text editor.
> 
> I dislike this part, and I think it is the main problem.
> 
> Note: I'm actively fighting the use of "string constants" for CSI, in
> programs. Note: maybe we have a different interpretation.
> 
> For emulators I want them to use libraries, or at least to check
> terminal capabilities and then issue formatting codes (CSI, from
> ECMA-48 or common usage which are de-facto standards).
> 
> Every terminal emulator is different, and users also want to use them
> differently (so changing the settings). A programmer should not make a
> choice for me.
> 
> Does a program want to print to the console? I'm OK with it writing some
> warnings in colours (but often they fail: they assume a background
> colour (and please: it is my choice!)). If I want to write to a log
> file, no CSI codes.
> 
> Hard-coded formatting codes are bad (and BTW HTML strongly discourages
> them, for a reason: we learn from the past).
> 
> And now I stop with the first rant.
> 
> 
> HTML (and LaTeX) can format text according to the medium, and HTML is
> responsive. I find no good way to do that in ECMA-48 style. We can ask
> for the size of the screen, or get a signal when it changes, but there is
> no real support in emulators: rendering is performed by programs (e.g.
> using dialog, or directly with the curses library). Could you find a good
> way to display tables sensibly at different terminal widths
> (starting from 40 columns or fewer)? It is not code we want in most (or
> any) terminal emulators.
> 
> But also in an editor... I feel that programmers must transform it into
> HTML/CSS, do the rendering with existing libraries (which are
> huge), and render it as text + CSI.
> 
> 
> What problem are you solving? A real-case problem. The more I look at the
> proposal, the more I think other tools are much easier and simpler.
> 
> Note: over the years HTML has solved many problems (also considering colour
> blind people, printing, etc.). Note: HTML as a technology, not what we got
> from the web (but then, possibly you should implement your proposal in that
> way: you just convert CSI to HTML (DOM), and let's display it): so we
> have a real case to look at. (And there are already libraries that do it,
> but without your proposed extension.)
> 
> In any case, your proposal doesn't maintain plain text: CSI sequences
> contain punctuation, letters and numbers. So there is not much difference
> from text in elements and tags in HTML: a program/person that wants the
> plain text, e.g. for copy/paste, must do a lot of work removing
> formatting. In modern HTML it is easy.
> 
> 
> I find it would have been a nice idea if we were in the 1990s (and so an
> alternative to HTML), but now we have good designs, so let's not
> duplicate the huge work HTML did in the past. From a practical point of
> view (if I need to implement it): just a filter to a DOM engine (which in
> the end would be a subset of existing HTML engines) and a renderer (where
> the trend is going in the direction of HTML-like formatting APIs for
> different graphical environments).
> 
> 
> And in any case, you should start at a higher layer: show programs, and
> if it is useful, emulators and editors will adopt it. Or do it like tmux
> (so a sort of filter; IIRC in the past some *extensions*, e.g. UTF-8, were
> done first as a filter between user and terminal emulator).
> 
> cate
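
On the two points quoted above, no CSI codes in log files and the work of removing formatting to get the plain text, a hedged Python sketch. The pattern follows the general CSI structure in ECMA-48 (ESC [, parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E); the helper names strip_csi and write_styled are hypothetical, and other control functions (OSC, single-byte C1 forms) are deliberately not handled.

```python
import io
import re

# CSI control sequence: ESC [ , parameter bytes, intermediate bytes, final byte.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")

def strip_csi(text):
    """Return the text with all CSI control sequences removed."""
    return CSI.sub("", text)

def write_styled(stream, text):
    """Write styled text to a terminal, plain text to anything else."""
    is_tty = getattr(stream, "isatty", lambda: False)()
    stream.write(text if is_tty else strip_csi(text))

# A StringIO stands in for a log file here: it is not a terminal.
log = io.StringIO()
write_styled(log, "This is a \x1b[31mred\x1b[0m Text\n")
plain = log.getvalue()
```

With these assumptions, plain is "This is a red Text" followed by a newline; called with sys.stdout on an interactive terminal, the same function would pass the control sequences through unchanged.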