Ecma-48 proposed styling controls update updated & math expression representation proposal update

Giacomo Catenazzi cate at cateee.net
Mon Jan 15 01:58:57 CST 2024


Hello,

Reading this discussion, I have come to some conclusions:

The proposal should be split into 2 parts:

- Part I, point a):

We need to document (and "standardize") the current usage of CSI (and 
other ECMA-48-like parts). Common usage, nothing new. At least, one can 
recommend the best way where there are duplicates, or *deprecate* 
something not much used. I do not see how to go forward without such a 
foundation. And we may find that some parts were already implemented 
(remember: Unicode is not about designing something new; we carry a lot 
of historical baggage). Note: I could help with this part.

- Part I, point b):

Then we can try to extend the commands, for **terminal emulators**. 
Part a) would help to create a solid base, and possibly to get some or 
many terminal emulator maintainers into the group. It is no good to 
standardize something without the support and help of the people who 
should use the standard. And that is where ECMA-48 is used: *terminal 
emulators*.

- Part II):

Extend outside terminal emulators. I find this the most problematic 
part: we would be creating a "sub-standard" document (so by design not 
good) for something where we already have well-used standards (maybe 
just define a subset of HTML). And in any case Part I and Part II are on 
a different layer than common Unicode (apart from bidi and ruby, which 
are often mentioned, but which can also be standardized, and are, in a 
different layer). Again: it is a different layer, so we need a parser, 
and so whether it is HTML or ECMA-48 is irrelevant (but one is robust 
and widely used).

In any case I do not find it useful to discuss this second part if we 
do not have the first part done. And we also need people who want to 
implement it in their own programs: we already have red text in ECMA-48, 
but which programs support copying while maintaining the formatting? 
Very few (discounting cases where just the bytes are copied, i.e. a copy 
at a different layer).
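
To make the "red text" case concrete, here is a minimal Python sketch of 
what a "just the text" copy has to do: recognise CSI sequences and drop 
them. The name strip_csi is only an illustration, not taken from any 
existing program, and the pattern covers only CSI sequences (parameter 
bytes, intermediate bytes, one final byte), not the other ECMA-48 
controls.

```python
import re

# CSI is ESC [ (7-bit form) or U+009B (8-bit form), followed by
# parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
# and exactly one final byte 0x40-0x7E.
CSI_RE = re.compile(r"(?:\x1b\[|\x9b)[0-?]*[ -/]*[@-~]")


def strip_csi(text: str) -> str:
    """Return text with every CSI control sequence removed."""
    return CSI_RE.sub("", text)


# CSI 31m switches to a red foreground, CSI 0m resets all attributes.
styled = "plain \x1b[31mred\x1b[0m plain"
print(strip_csi(styled))
```

A byte-for-byte copy keeps the sequences and a copy like the one above 
loses them; keeping the formatting across programs needs something in 
between, which is the part very few programs do.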


Why do we need a new standard when we already have good ones? 
Transcoding is bread-and-butter (and ECMA provides us a lot of *private* 
space if we need to encode formatting in such a way).
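
As a rough illustration of such transcoding, the following Python 
sketch turns a small set of SGR sequences into a small subset of HTML. 
Everything here is an assumption made for the example: the function 
name sgr_to_html, the choice of <span style="..."> as the target, and 
the handful of SGR parameters handled (0 reset, 1 bold, 22 normal 
intensity, 30-37 foreground colour, 39 default foreground). Other 
parameters are silently ignored.

```python
import html
import re

# Only SGR sequences in the 7-bit form: ESC [ <params> m
SGR_RE = re.compile(r"\x1b\[([0-9;]*)m")

# Illustrative mapping of SGR 30-37 to CSS colour names.
COLOURS = {30: "black", 31: "red", 32: "green", 33: "yellow",
           34: "blue", 35: "magenta", 36: "cyan", 37: "white"}


def sgr_to_html(text: str) -> str:
    """Transcode a few SGR styling sequences into HTML spans."""
    out = []
    colour = None
    bold = False
    span_open = False
    pos = 0

    def restyle():
        # Close the current span, then open a new one for the new state.
        nonlocal span_open
        if span_open:
            out.append("</span>")
            span_open = False
        styles = []
        if colour:
            styles.append("color:" + colour)
        if bold:
            styles.append("font-weight:bold")
        if styles:
            out.append('<span style="' + ";".join(styles) + '">')
            span_open = True

    for m in SGR_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()]))
        pos = m.end()
        # An empty parameter means 0, so "ESC [ m" is a reset.
        for p in (int(s) if s else 0 for s in m.group(1).split(";")):
            if p == 0:
                colour, bold = None, False
            elif p == 1:
                bold = True
            elif p == 22:
                bold = False
            elif p == 39:
                colour = None
            elif p in COLOURS:
                colour = COLOURS[p]
        restyle()

    out.append(html.escape(text[pos:]))
    if span_open:
        out.append("</span>")  # auto-close a style left open at the end
    return "".join(out)


print(sgr_to_html("a \x1b[31mred\x1b[0m <b>"))
```

Note that the sketch is already a parser for a different layer: it has 
to track state, escape the text for the target format, and close any 
style left open at the end of the copied run.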


I see just vapour: no use case, no interest from programmers in adapting 
their own programs, so for now it will bloat technical documents without 
being used. I dislike the *may be used in future* in the discussions. I 
want a reference implementation to check whether things can be useful 
(and whether they can be coded), and support from the developers of some 
program that is actually used.


Note: do not use the Unicode standard as an argument: we have a lot of 
awful parts (fortunately hidden and forgotten), IIRC also for simple 
text formatting. Bidi in Unicode is not a reason to do it again, or to 
implement such layers again in Unicode.


giacomo



On 15 Jan 2024 01:37, Kent Karlsson via Unicode wrote:
> 
> (Second reply to same email)
> 
>> 13 jan. 2024 kl. 01:00 skrev Asmus Freytag via Unicode 
>> <unicode at corp.unicode.org>:
>> 
>> ECMA-48 is not plain text. It is a form of markup that uses syntax 
>> characters other than those from the printable ASCII range, but that's 
>> about the only distinction.
> 
> But that is a key distinction.
> 
> How do you think of Unicode bidi controls? Plain text or not? They are 
> at the same “level” as ECMA-48 controls!
> 
> Speaking of bidi, that has major security issues very similar to those 
> pointed out for ECMA-48 in a reference given in this thread. For source 
> code and math expressions it must be strongly restricted as pointed out 
> in my two proposals, if at all permitted.
> 
>> It's different from a true binary format as well, which would use 
>> things like addresses and lengths to mark the location of text runs 
>> and styling info. Instead, like any other markup, it uses character 
>> codes inserted into the data stream.
> 
> Yes, of course.
> 
> (While uncommon as an external representation, the Teletext protocol, in 
> higher implementation levels, does have an addressing based (i.e. 
> out-of-line) representation for some formatting extensions, like 
> additional colours and bold/italics/proportional.)
> 
>> Now that we have that out of the way, let's look at the clipboard.
>>
>> The clipboard contains both data and metadata. By telling a recipient 
>> that data is in HTML format it can be displayed as rich text, instead 
>> of as HTML source. The same is true for rtf or ECMA-48.
> 
> While I am not super-knowledgeable about clipboards, I gather that at 
> least one type uses a form of limited HTML as a passe-partout for 
> formatted text, regardless of source and target of copy-paste or the 
> file representations they might support. And that is fine.
> 
>> The same data can be present in multiple formats on the clipboard. 
>> That's what's behind the ability to paste "just the text" from a 
>> copied section, discarding the styling.
>>
>> Logically, for that to work, either the sender or the recipient of the 
>> clipboard data must understand what the "just the text" part of the 
>> data represents and how to discard the styling.
> 
> I gather that it is the sender that fills in some of the available 
> alternatives. For instance it can fill in the HTML slot and the “plain 
> text” slot.
> 
> I don’t think an ECMA-48 slot would be helpful.
> 
> Still it should be, *and is already*, possible to copy-paste styled text 
> from a terminal emulator to (say) a Word document (neither of which use 
> HTML). (Barring bugs and other imperfections.)
> 
>> It's been too long, but from what I remember, it was the sender that 
>> had the option of offering multiple formats and the recipient could 
>> pick any that it understood.
> 
> Yes. Some applications allow the end user to pick which one.
> 
>> That's the only logical approach, because only the sender can be 
>> assumed to know the format the data is in. The receiver could do 
>> post-processing only on data formats already known to it.
>>
>> Your ECMA-48 terminal app 
> 
> •I• make/maintain no terminal emulator. I just use some (essentially 
> every work-day).
> 
>> would presumably want to offer both the ECMA-48 stream with suitable 
>> metadata defining it as such, as well as a plain-text stream, which 
>> discards the styling. 
> 
> HTML + plain text in the clip board. Many only provide plain text at 
> this time. But that may change.
> 
> /Kent K
> 
>> For nested styling syntax I don't know whether sending applications 
>> would perform an "auto close" of any open styling commands when 
>> packaging up the selected text, or whether that would be done by the 
>> receiving app, assuming it understands the format. The problem how to 
>> handle selection at the boundary of a style run when the style 
>> commands themselves are not visible to the user is the same for markup 
>> languages as for ECMA-48.
>>
>> Nothing new to see here, move right along.
>>
>> A./
>>
>> On 1/12/2024 3:26 PM, Marius Spix via Unicode wrote:
>>> Applications like Word or web browsers are able to preserve formatting
>>> by using rich text formats like HTML or RTF in the clipboard. ECMA-48
>>> proposed styling controls work on the plaintext layer, independently
>>> from the application, as long as the renderer (e. g. Uniscribe or HarfBuzz)
>>> supports them. That would require the clipboard handler of the
>>> operating system to be aware of these sequences.
>>>
>>>
>>> Am Fri, 12 Jan 2024 22:08:19 +0000
>>> schrieb Doug Ewell<doug at ewellic.org>:
>>>
>>>> Eli Zaretskii wrote:
>>>>
>>>>> Sorry, I'm probably missing something, because I don't see the
>>>>> relevance.  My point is that copy/paste through the clipboard uses
>>>>> formats that are not plain text, and encode the styles and typefaces
>>>>> by using methods that are not compatible with plain text.
>>>> I think Marius will have to address what he meant, as you and I are
>>>> talking past each other.
>>>>
>>>> If ECMA-48 markup is part of the plain-text stream, and it is copied
>>>> from one app to another in a plain-text Clipboard, then all of the
>>>> ECMA-48 sequences should survive the transit.
>>>>
>>>>>> Alternatively, why is the stated user-experience problem for
>>>>>> ECMA-48 not a problem for Word?
>>>>> I thought I answered that?  Or what do you mean by "user
>>>>> experience"?
>>>> That question was semi-rhetorical, and was for Marius, who again will
>>>> need to respond. I thought he was talking about the human user trying
>>>> to select text to be copied, and inadvertently failing to select a
>>>> starting or ending ECMA-48 sequence because they are not
>>>> human-visible.
>>>>
>>>>> If pasting between applications, the answer is again clipboard
>>>>> format that is not plain text.  If you copy plain text, the
>>>>> formatting is lost.
>>>> Wait: are we saying that ECMA-48 sequences like CSI 31m are plain
>>>> text, or that they are not?
>>>>
>>>> --
>>>> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>>>>
>>

