From pgcon6 at msn.com  Tue Jan  2 11:46:16 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Tue, 2 Jan 2024 17:46:16 +0000
Subject: UDHR in Unicode
In-Reply-To: <8c9926c0-265a-4114-b930-de22ed21902b@code2001.com>
References: <SJ0PR03MB65988A3534C3C0E29D468854CA90A@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <DS0PR12MB753583694E17EB691BD47E038697A@DS0PR12MB7535.namprd12.prod.outlook.com>
 <8c9926c0-265a-4114-b930-de22ed21902b@code2001.com>
Message-ID: <DS0PR12MB753509A716DB38E8E04868C78661A@DS0PR12MB7535.namprd12.prod.outlook.com>

Happy 2024!

Some in this thread were jumping to unwarranted conclusions. (Nothing will be deleted.) The UDHR project will be taken over by Eric Muller, who was the one that started it. The Web content and git repo will be moved to a domain he owns.


Peter Constable


From marius.spix at web.de  Wed Jan  3 10:09:14 2024
From: marius.spix at web.de (Marius Spix)
Date: Wed, 3 Jan 2024 17:09:14 +0100
Subject: Reference glyphs of musical accidentals quarter sharp and quarter flat
Message-ID: <20240103170914.18ab5460@spixxi>

Hi,

I just noted that the reference glyphs for

U+1D132 MUSICAL SYMBOL QUARTER TONE SHARP

and

U+1D133 MUSICAL SYMBOL QUARTER TONE FLAT

on the code chart are very unusual. In the standard notation, the
quarter sharp is represented by U+266F with only one downstroke and the
quarter flat by a mirrored version of U+266D MUSIC FLAT SIGN (or as a
variant of U+266D with a stroke). Please find the attached image for
reference.

I had a look at the mailing list and there was already a suggestion by
Johnny Farraj in 2015, by Markus Scherer in 2018 and by Gavin Jared Bala
and Kirk Miller in 2023 (request L2/23-276). The letter also includes
the currently missing characters for three-quarter sharp and
three-quarter flat, two characters I also see an urgent need for.

However, in contrast to that request, I propose to unify two of the
suggested characters with existing ones and change the reference glyphs
instead of encoding new characters.

U+1D1ED MUSICAL SYMBOL REVERSED FLAT (requested) = U+1D133 MUSICAL
SYMBOL QUARTER TONE FLAT (existing)
U+1D1EB MUSICAL SYMBOL HALF SHARP (requested) = U+1D132 MUSICAL SYMBOL
QUARTER TONE SHARP (existing)

The stroked variant of the quarter flat (which does not appear in
the proposal of Gavin Jared Bala and Kirk Miller, but can be found in
several pieces) could be obtained by combining U+1D133 MUSICAL SYMBOL
QUARTER TONE FLAT with a variation selector (e.g. U+FE00).

What do you think?

Best regards,

Marius
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accidentals.png
Type: image/png
Size: 31313 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240103/01faaf89/attachment-0001.png>

From marius.spix at web.de  Wed Jan  3 10:15:01 2024
From: marius.spix at web.de (Marius Spix)
Date: Wed, 3 Jan 2024 17:15:01 +0100
Subject: Reference glyphs of musical accidentals quarter sharp and
 quarter flat
In-Reply-To: <20240103170914.18ab5460@spixxi>
References: <20240103170914.18ab5460@spixxi>
Message-ID: <20240103171501.1384807b@spixxi>

Hi,

the attachment in my previous message contained a small mistake. I
fixed this in this followup.

Best regards,

Marius


-------------- next part --------------
A non-text attachment was scrubbed...
Name: accidentals.png
Type: image/png
Size: 31613 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240103/095c52ab/attachment-0001.png>

From doug at ewellic.org  Wed Jan  3 11:11:33 2024
From: doug at ewellic.org (Doug Ewell)
Date: Wed, 3 Jan 2024 17:11:33 +0000
Subject: Reference glyphs of musical accidentals quarter sharp and quarter
 flat
In-Reply-To: <20240103170914.18ab5460@spixxi>
References: <20240103170914.18ab5460@spixxi>
Message-ID: <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>

I personally support Kirk Miller's proposal to add the more commonly used symbols as separate characters, rather than complicating the encoding by adding variation selectors to change the glyph to something quite different.

We don't always realize it, but ordinary users generally don't know anything about variation selectors.

--Doug



From gwalla at gmail.com  Wed Jan  3 12:13:40 2024
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 3 Jan 2024 10:13:40 -0800
Subject: Reference glyphs of musical accidentals quarter sharp and quarter
 flat
In-Reply-To: <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <20240103170914.18ab5460@spixxi>
 <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <CA+p4_H1m+PxAn8TBL6qYQBbrWo-DM_drfsnMEgxXbijG8k5BOQ@mail.gmail.com>

Also, the semantics of the half-sharp, reversed flat, and slashed flat gets
a little more complicated when you take Turkish music into account. Probably
best to encode new characters (and a proposal was submitted fairly
recently) even if it means leaving some detritus in the Standard. It
wouldn't be the first time.

On Wed, Jan 3, 2024 at 9:17 AM Doug Ewell via Unicode <
unicode at corp.unicode.org> wrote:

> I personally support Kirk Miller's proposal to add the more commonly used
> symbols as separate characters, rather than complicating the encoding by
> adding variation selectors to change the glyph to something quite different.
>
> We don't always realize it, but ordinary users generally don't know
> anything about variation selectors.
>
> --Doug

From haberg-1 at telia.com  Wed Jan  3 14:42:30 2024
From: haberg-1 at telia.com (Hans Åberg)
Date: Wed, 3 Jan 2024 21:42:30 +0100
Subject: Reference glyphs of musical accidentals quarter sharp and quarter
 flat
In-Reply-To: <CA+p4_H1m+PxAn8TBL6qYQBbrWo-DM_drfsnMEgxXbijG8k5BOQ@mail.gmail.com>
References: <20240103170914.18ab5460@spixxi>
 <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <CA+p4_H1m+PxAn8TBL6qYQBbrWo-DM_drfsnMEgxXbijG8k5BOQ@mail.gmail.com>
Message-ID: <72DBCD09-986C-42AE-9488-D689E24C5318@telia.com>

SMuFL <https://www.smufl.org/> is working on a musical symbols standard; also see:
https://en.wikipedia.org/wiki/SMuFL


> On Jan 3, 2024, at 19:13, Garth Wallace via Unicode <unicode at corp.unicode.org> wrote:
> 
> Also, the semantics of the half-sharp, reversed flat, and slashed flat gets a little more complicated when you take Turkish music into account. Probably best to encode new characters (and a proposal was submitted fairly recently) even if it means leaving some detritus in the Standard. It wouldn't be the first time.
> 
> On Wed, Jan 3, 2024 at 9:17 AM Doug Ewell via Unicode <unicode at corp.unicode.org> wrote:
> I personally support Kirk Miller's proposal to add the more commonly used symbols as separate characters, rather than complicating the encoding by adding variation selectors to change the glyph to something quite different.
> 
> We don't always realize it, but ordinary users generally don't know anything about variation selectors.
> 
> --Doug
> 


From asmusf at ix.netcom.com  Wed Jan  3 15:31:37 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 3 Jan 2024 13:31:37 -0800
Subject: Reference glyphs of musical accidentals quarter sharp and quarter
 flat
In-Reply-To: <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <20240103170914.18ab5460@spixxi>
 <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <e4422f93-d381-49ec-ba49-6db0f7521f05@ix.netcom.com>

It's not only about whether users know what a variation selector is or 
is not.

Use of a variation selector for two shapes implies that they are the 
*same symbol*. For something like slanted vs. upright integrals, it's 
easy to assert that they are the same symbol and not two symbols that 
represent the same concept. Hence the use of variation selectors.

For other concepts, there are often multiple symbols representing them.
Pound, for example, is abbreviated with a ligated lb that has its own
character code, but also with # on occasion. It would have been
exceedingly disingenuous to make the lb ligature a variant of the #.

Just because some concepts map 1:1 to symbols, that is not a sufficient
condition for suggesting the use of a variation selector. Whenever
symbol shapes show significant deviation, the presumption ought to be
that we are looking at two symbols for the same concept, which both
deserve to be encoded.

A./

On 1/3/2024 9:11 AM, Doug Ewell via Unicode wrote:
> I personally support Kirk Miller's proposal to add the more commonly 
> used symbols as separate characters, rather than complicating the 
> encoding by adding variation selectors to change the glyph to 
> something quite different.
>
> We don't always realize it, but ordinary users generally don't know 
> anything about variation selectors.
>
> --Doug

From gwalla at gmail.com  Wed Jan  3 17:07:56 2024
From: gwalla at gmail.com (Garth Wallace)
Date: Wed, 3 Jan 2024 15:07:56 -0800
Subject: Reference glyphs of musical accidentals quarter sharp and quarter
 flat
In-Reply-To: <e4422f93-d381-49ec-ba49-6db0f7521f05@ix.netcom.com>
References: <20240103170914.18ab5460@spixxi>
 <SJ0PR03MB65989748FB047297942D93C2CA60A@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <e4422f93-d381-49ec-ba49-6db0f7521f05@ix.netcom.com>
Message-ID: <CA+p4_H1fKrp9RDBm-X9sfrFo+TGpe7u5sXsROU7Q_wav3voxXw@mail.gmail.com>

And it would be pretty odd to call a real symbol a "variation" of one that has
never been in use.

On Wed, Jan 3, 2024 at 1:34 PM Asmus Freytag via Unicode <
unicode at corp.unicode.org> wrote:

> It's not only about whether users know what a variation selector is or is
> not.
>
> Use of a variation selector for two shapes implies that they are the *same
> symbol*. For something like slanted vs. upright integrals, it's easy to
> assert that they are the same symbol and not two symbols that represent the
> same concept. Hence the use of variation selectors.
>
> For other concepts, there are often multiple symbols representing them.
> Pound, for example, is abbreviated with a ligated lb that has its own
> character code, but also with # on occasion. It would have been exceedingly
> disingenuous to make the lb ligature a variant of the #.
>
> Just because some concepts map 1:1 to symbols, that is not a sufficient
> condition for suggesting the use of a variation selector. Whenever
> symbol shapes show significant deviation, the presumption ought to be that
> we are looking at two symbols for the same concept, which both deserve to
> be encoded.
>
> A./
>
> On 1/3/2024 9:11 AM, Doug Ewell via Unicode wrote:
>
> I personally support Kirk Miller's proposal to add the more commonly used
> symbols as separate characters, rather than complicating the encoding by
> adding variation selectors to change the glyph to something quite different.
>
> We don't always realize it, but ordinary users generally don't know
> anything about variation selectors.
>
> --Doug

From kent.b.karlsson at bahnhof.se  Fri Jan  5 03:27:50 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Fri, 5 Jan 2024 10:27:50 +0100
Subject: Ecma-48 proposed styling controls update updated & math expression
 representation proposal update 
Message-ID: <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC@bahnhof.se>


I've updated my proposed update of the ECMA-48 styling controls.
https://github.com/kent-karlsson/control/blob/main/ecma-48-style-modernisation-2024.pdf
 
The major updates are:
 
1) Additional bidi modes: KISS (eliminating the odd tweaks),
   KISS2 (also eliminate the nestable bidi controls), and
   a mode where only "pure RTL runs in same script" are bidi
   reversed. The last mode (though not necessarily the control
   sequence) is suitable for editors of program *source code*
   (C++, Perl, PHP, bash, Java, ...) and data *source code*
   (HTML, XML, CSV, ..., custom). The "source code mode" is
   similar to how bidi must be handled in math expressions.
   
2) Variants for having arrow directions follow the text layout
   order (mirroring, rotation), which is given as the default, and a
   temporary variant (ending at a bidi B or bidi S character) that
   neither mirrors nor rotates arrows (arrows refer to external
   directions). (Note that the mirroring and rotation data in Unicode
   take different approaches here: the default (as yet) in Unicode for
   mirroring is that arrows refer to directions external to the text,
   whereas for rotation (for CJK vertical) arrows refer to the text
   itself.)
   
3) In Annex B, the wretched and totally unhelpful general category
   Cc is completely overridden by more useful general categories;
   similarly for bidi and line break properties.
   (Yes, "default-ignorable" should also be, well, ignored.)
 
As hinted in point 2 above, arrow mirroring data
(https://www.unicode.org/L2/L2022/22026r-non-bidi-mirroring.pdf)
would be helpful... It was called "non-bidi" because at the time
it was intended only for math expressions, as an editing aid; there
must be no automatic (non-explicit) mirroring in math expressions
or source code. But as you see in point 2, I now think it should
also be used for automatic bidi mirroring; note again that automatic
(implicit) bidi mirroring does not apply at all to math expressions,
nor to source code.
 
 
I've also done an update to the (independent!) math expression
representation proposal. 
https://github.com/kent-karlsson/control/blob/main/math-layout-controls-2024.pdf
Minor updates, but I included the data in 
https://www.unicode.org/L2/L2022/22026r-non-bidi-mirroring.pdf
in an annex; I don't know what is happening, if anything, to that
proposal.
 
/Kent Karlsson


From steffen at sdaoden.eu  Fri Jan  5 16:01:58 2024
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Fri, 05 Jan 2024 23:01:58 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC@bahnhof.se>
References: <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC@bahnhof.se>
Message-ID: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>

Kent Karlsson via Unicode wrote in
 <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC at bahnhof.se>:
 |I've done an update to the ECMA-48 styling: proposed update.
 |https://github.com/kent-karlsson/control/blob/main/ecma-48-style-moderni\
 |sation-2024.pdf
 | 
 |The major updates are:

Last I looked (I downloaded it on July 21st last year), the new
OSC-8, which is now in even wider use than last year (the new GNU
groff ships with native support for generating IDs, and the Linux
manual maintainer was eager to use that feature), was not
incorporated.  I'll attach its description; maybe it is of interest.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
-------------- next part --------------
# Hyperlinks (a.k.a. HTML-like anchors) in terminal emulators

*[ Update 2020-05-31: I won't be maintaining this page or responding to comments anymore. The list of supporting software reflects the known state as of this date. ]*

---

Most terminal emulators auto-detect when a URL appears onscreen and let you open it conveniently (e.g. via Ctrl+click or Cmd+click, or the right-click menu).

It was, however, not possible until now for arbitrary text to point to URLs, just as on webpages.

In spring 2017, `GNOME Terminal` and `iTerm2` changed this.

`GNOME Terminal` is based on the `VTE` widget, and almost all of this work went into `VTE`. As such, we expect other `VTE`-based terminal emulators to catch up and add support really soon. Other terminal emulators are also welcome and encouraged to join!

## Quick example

Here's a simple command to try out the feature. The result is equivalent to this HTML link: [This is a link](http://example.com)

```
printf '\e]8;;http://example.com\e\\This is a link\e]8;;\e\\\n'
```

## Supporting apps

### Terminal emulators
- [DomTerm](https://domterm.org/) 1.0.2
- [hterm](https://chromium.googlesource.com/apps/libapps/+/master/hterm) 1.76
- [hyper](https://hyper.is/) since Oct 2019, version ???
- [iTerm2](http://iterm2.com/) 3.1
- [Terminology](https://www.enlightenment.org/about-terminology) in git since 2018-10-14, probably will be released in version 1.3
- [Ultimate++ terminal widget](https://github.com/ismail-yilmaz/upp-components/tree/master/CtrlLib/Terminal) since Nov 2019 (version ???)
- based on [VTE](https://wiki.gnome.org/Apps/Terminal/VTE) 0.50: <sup>(Use 0.50.4, 0.52.2, or newer to avoid a rare crash)</sup>
  - [GNOME Terminal](https://wiki.gnome.org/Apps/Terminal) 3.26
  - [Guake](http://guake-project.org/) 3.2.1
  - [ROXTerm](https://github.com/realh/roxterm) 3.5.1
  - [Tilix](https://github.com/gnunn1/tilix) 1.5.8
- [WezTerm](http://wezfurlong.org/wezterm/index.html) since early 2018

### Terminal Multiplexers
- [TermySequence](https://termysequence.io/)

### Apps
- [gcc](https://gcc.gnu.org/): Since version 10, for diagnostic messages to point to the documentation.
- `less -R`: Preliminary patch available in the GNOME Terminal discussion.
- `ls --hyperlink[=always/auto/never]` (`coreutils`): Since version 8.28.
- [Matterhorn](https://github.com/matterhorn-chat/matterhorn) chat client: Since version 40400.0.0.
- [mdcat](https://github.com/lunaryorn/mdcat) (markdown cat): Since version 0.5.0.
- [Symfony](https://symfony.com/): Since version 4.3.
- [systemd](https://github.com/systemd/systemd): Since version 239.
- [wget2](https://gitlab.com/gnuwget/wget2/): Since Nov 2019 (version ???).

### Libraries
- [vty](https://hackage.haskell.org/package/vty) medium-level terminal UI library: Since October 2017.
- [brick](https://hackage.haskell.org/package/brick) high-level terminal UI library: Since October 2017.
- [Rich](https://github.com/willmcgugan/rich) rich text formatting library: Since May 2020.

## Feature requests sent

### Terminal emulators
- [Alacritty](https://github.com/alacritty/alacritty/issues/922)
- [ConEmu](https://github.com/Maximus5/ConEmu/issues/2078)
- [Kitty](https://github.com/kovidgoyal/kitty/issues/68) <sup>(The Linux and macOS terminal emulator. Not to be confused with the Windows PuTTY-fork named `KiTTY`.)</sup>
- [Konsole](https://bugs.kde.org/show_bug.cgi?id=379294)
- `VTE`-based:
  - [LilyTerm](https://github.com/Tetralet/LilyTerm/issues/117)
  - [LXDE Terminal](https://sourceforge.net/p/lxde/bugs/870/)
  - [MATE Terminal](https://github.com/mate-desktop/mate-terminal/issues/175)
  - [Sakura](https://bugs.launchpad.net/sakura/+bug/1686823)
  - [Terminator](https://bugs.launchpad.net/terminator/+bug/1686821) <sup>(The one for Linux written in Python, based on GTK+. Not to be confused with the one written in Java bearing the same name.)</sup>
  - [Termit](https://github.com/nonstop/termit/issues/109)
  - [Termite](https://github.com/thestinger/termite/issues/476)
  - [Tilda](https://github.com/lanoxx/tilda/issues/285)
  - [Xfce Terminal](https://bugzilla.xfce.org/show_bug.cgi?id=13534)
- [Windows Terminal](https://github.com/microsoft/terminal/issues/204)
- [xterm.js](https://github.com/xtermjs/xterm.js/issues/1134)

### Apps
- [Irssi](https://github.com/irssi/irssi/issues/700)
- [less -R](https://github.com/gwsw/less/issues/54)
- [screen](https://savannah.gnu.org/bugs/index.php?50952)
- [tbvaccine](https://github.com/skorokithakis/tbvaccine/issues/37)
- [tmux](https://github.com/tmux/tmux/issues/911)
- [weechat](https://github.com/weechat/weechat/issues/1252)
- Planned to send request soon: vim, neovim, emacs, groff, find, grep.

## A few use cases

We have a couple of use cases in mind...

### apt-changelog

apt-changelog could automatically format bug IDs as links to the bugtracker's corresponding page.

### git log

git log, or other similar tools could make the commit IDs links to the corresponding page of a web frontend to the repo.
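
As a rough illustration of this use case, the sketch below wraps commit IDs found in a line of text in OSC 8 hyperlinks. The base URL `https://git.example.com/repo/commit/` and the helper name `link_commits` are hypothetical, and the pattern simply treats any run of 7 to 40 lowercase hex digits as a commit ID, which a real tool would want to tighten.

```python
import re

ESC = "\x1b"
OSC8 = ESC + "]8;"   # OSC 8 introducer followed by the first separator
ST = ESC + "\\"      # string terminator (C0 form)

# Hypothetical web frontend; replace with your repository's commit page.
COMMIT_URL = "https://git.example.com/repo/commit/{sha}"

# Assumption: any standalone run of 7 to 40 hex digits is a commit ID.
SHA_RE = re.compile(r"\b[0-9a-f]{7,40}\b")


def link_commits(line):
    """Wrap each commit ID in `line` in an OSC 8 hyperlink."""
    def wrap(match):
        sha = match.group(0)
        uri = COMMIT_URL.format(sha=sha)
        # Open the link (empty params), print the ID, then close the link.
        return f"{OSC8};{uri}{ST}{sha}{OSC8};{ST}"
    return SHA_RE.sub(wrap, line)
```

Piping `git log --oneline` through such a filter would leave the visible text unchanged on terminals that support the feature, while making each ID clickable.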

### viewers, editors

File viewers and editors could auto-detect URIs in the document, and convert them to hyperlinks even if they are only partially visible on the screen. Example screenshot from an imaginary text editor with two files opened:
```
╔═ file1 ════╗
║          ╔═ file2 ═══╗
║http://exa║Lorem ipsum║
║le.com    ║ dolor sit ║
║          ║amet, conse║
╚══════════║ctetur adip║
           ╚═══════════╝
```
Ctrl+clicking anywhere on `http://exa` or `le.com` could open the webpage `http://example.com`.

### core utilities

Core utilities, such as `ls` and `find`, could optionally mark the printed files with their `file://...` URI, making it just one click to open them in a graphical application.
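
A minimal sketch of what such a utility might emit, assuming the URI takes the form `file://hostname/path` with the path percent-encoded. The helper names `file_uri` and `linked_name` are made up for illustration; a real tool could obtain the hostname from `socket.gethostname()`.

```python
from urllib.parse import quote


def file_uri(hostname, path):
    """Build a file:// URI for an absolute POSIX path on the given host."""
    # quote() percent-encodes reserved and non-ASCII characters but keeps "/".
    return f"file://{hostname}{quote(path)}"


def linked_name(hostname, directory, name):
    """Return `name` wrapped in an OSC 8 hyperlink to the file itself."""
    uri = file_uri(hostname, f"{directory.rstrip('/')}/{name}")
    # ESC ] 8 ; ; URI ESC \  opens the link, ESC ] 8 ; ; ESC \  closes it.
    return f"\x1b]8;;{uri}\x1b\\{name}\x1b]8;;\x1b\\"
```

Only the file name is visible in the output; the full URI travels inside the escape sequence.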

### less -R

We're hoping to get `less -R` to recognize and handle this escape sequence just as it does colors, so viewing the output of utilities piped to `less -R` would keep their hyperlinks working.

## The escape sequence

A hyperlink is opened upon encountering an OSC 8 escape sequence with the target URI. The syntax is

`OSC` `8` `;` `params` `;` `URI` `ST`

Following this, all subsequent cells that are painted are hyperlinks to this target. A hyperlink is closed with the same escape sequence, omitting the parameters and the URI but keeping the separators:

`OSC` `8` `;` `;` `ST`

`OSC` (operating system command) is typically `ESC` `]`.

`params` is an optional list of `key=value` assignments, separated by the `:` character. Example: `id=xyz123:foo=bar:baz=quux`. Currently only the `id` key is defined, see below. These parameters allow for future extensibility of this feature. In the typical case no parameters are defined; the two semicolons then have to appear next to each other.

`URI` is the target of the hyperlink in URI-encoded form. Web addresses need to begin with `http://` or `https://`. Use `ftp://` for FTP, `file://` for local files (see below for the hostname), `mailto:` scheme for e-mail addresses, etc. It's up to the terminal emulator to decide what schemes it supports and which applications it launches for them.

The sequence is terminated with `ST` (string terminator) which is typically `ESC` `\`. (Although `ST` is the standard sequence according to ECMA-48 §8.3.89, often the `BEL` (`\a`) character is used instead. This nonstandard choice originates from XTerm, and was later adopted by probably all terminal emulators to terminate `OSC` sequences. Nevertheless, we encourage the use of the standard `ST`.)

(For `OSC` and `ST`, their C0 variant was shown above. They have another, C1 form which might be supported in some contexts. In 8-bit Latin-X character sets they are the single bytes `0x9d` and `0x9c`, respectively. In UTF-8 mode some terminal emulators deliberately do not implement C1 support because these bytes would conflict with the UTF-8 encoding, while some other terminal emulators recognize the UTF-8 representation of `U+009d` (i.e. `0xc2` `0x9d`) and `U+009c` (i.e. `0xc2` `0x9c`), respectively. Since C1 is not universally supported in today's default UTF-8 encoding, its use is discouraged.)
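As a minimal sketch of the syntax described above, the following Python helpers build the opening and closing sequences using the C0 forms of `OSC` and `ST`. The function names `open_link`, `close_link` and `hyperlink` are illustrative only and are not part of any library or of this specification.

```python
from typing import Optional

ESC = "\x1b"
OSC = ESC + "]"   # C0 form of OSC (operating system command)
ST = ESC + "\\"   # C0 form of ST (string terminator)


def open_link(uri: str, params: Optional[dict] = None) -> str:
    """Build the sequence OSC 8 ; params ; URI ST that opens a hyperlink."""
    # params are key=value assignments joined by ':'; usually there are none,
    # in which case the two semicolons end up next to each other.
    joined = ":".join(f"{key}={value}" for key, value in (params or {}).items())
    return f"{OSC}8;{joined};{uri}{ST}"


def close_link() -> str:
    """Build the sequence OSC 8 ; ; ST that closes the current hyperlink."""
    return f"{OSC}8;;{ST}"


def hyperlink(text: str, uri: str, params: Optional[dict] = None) -> str:
    """Wrap `text` so that every cell it paints links to `uri`."""
    return open_link(uri, params) + text + close_link()
```

Printing `hyperlink("example", "http://example.com")` to a terminal that supports the feature should show the word `example` as a link; a terminal that parses OSC correctly but lacks the feature should show the plain word.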

## A note on opening/closing hyperlinks

The feature was modeled after anchors on webpages. There are some differences though, due to the nature of terminal emulation.

An HTML page is supposed to contain balanced and unnested pairs of `<a ...>` and `</a>` tags. This is important in order to build up a DOM tree. Terminal emulators don't have this concept. They are state machines, interpreting the data as it arrives in a stream.

As such, in terminal emulators an OSC 8 escape sequence just changes the hyperlink (or lack thereof) to the new value. It is perfectly legal to switch from one hyperlink to another without explicitly closing the first one. It is also perfectly legal to close a hyperlink when it's not actually open (e.g. to make sure to clean up after a potentially unclean exit of an application).

You can practically think of the hyperlink as yet another attribute that character cells have, similarly to the foreground and background color, bold, italic, strikethrough etc. bits. It is absolutely valid to switch from one color to another without resetting to the default in between, or to reset to the default multiple times. The same goes for hyperlinks.
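The state-machine view can be illustrated with a small Python sketch that walks a stream and records, for each printed character, the hyperlink that was current at that moment. It is a simplification under stated assumptions: only the C0 form of OSC 8 is recognized (terminated by `ST` or `BEL`), every other character is treated as printable, and the names `OSC8` and `paint` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# OSC 8 ; params ; URI, terminated by ST (ESC \) or BEL.
# params cannot contain ';', so the first ';' after the params ends them.
OSC8 = re.compile(r"\x1b\]8;([^;\x07\x1b]*);([^\x07\x1b]*)(?:\x1b\\|\x07)")


def paint(stream: str) -> List[Tuple[str, Optional[str]]]:
    """Return (character, uri-or-None) for every printed cell."""
    cells = []
    current = None  # the hyperlink attribute; None means "no hyperlink"
    pos = 0
    while pos < len(stream):
        match = OSC8.match(stream, pos)
        if match:
            # Opening, switching and closing are one and the same operation:
            # overwrite the current value. An empty URI means "no hyperlink".
            current = match.group(2) or None
            pos = match.end()
        else:
            cells.append((stream[pos], current))
            pos += 1
    return cells
```

Note that the sketch needs no stack and no notion of balance: closing a link that is not open, or switching directly from one link to another, falls out of the single assignment to `current`.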

## `file://` URIs and the hostname

Web browsers, desktop environments etc. tend to ignore the hostname component of a `file://hostname/path/to/file.txt` URI. In terminal emulators, such ignorance would lead to faulty targets if you `ssh` to a remote computer. As such, we don't allow this sloppiness. Utilities that print hyperlinks are requested to fill out the `hostname`, and terminal emulators are requested to match it against the local hostname and refuse to open the file if the hostname doesn't match (or offer other possibilities, e.g. to download with `scp` as iTerm2 does).

[RFC 8089](https://tools.ietf.org/html/rfc8089) says the `hostname` component should contain the fully qualified hostname, whereas [Freedesktop's File URI Specification](https://www.freedesktop.org/wiki/Specifications/file-uri-spec/) says it should contain the value returned by `gethostname()` which is often not fully qualified. It's unreasonable for simple utilities to go into the business of hostname resolution. As such, we urge utilities to place the value from `gethostname()` there (shell scripts might go for `$HOSTNAME`).

Terminal emulators should match the given value against the local hostname. They might accept multiple values, e.g. both short and fully qualified names, but they are free to go with just the value from `gethostname()`. They also must accept the string `localhost` or the empty string as local ones. If a different hostname is present, they must not open the local counterpart with the same filename.
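A rough Python sketch of both sides follows: a utility building a `file://` URI with the value from `gethostname()`, and a terminal emulator deciding whether a URI refers to the local machine. The helper names `file_uri` and `is_local` are made up for illustration, and the set of accepted local names is left to the caller, since the text above allows emulators some freedom there.

```python
import socket
from typing import Optional, Set
from urllib.parse import quote, urlsplit


def file_uri(path: str, hostname: Optional[str] = None) -> str:
    """Utility side: build file://hostname/path for an absolute path."""
    host = socket.gethostname() if hostname is None else hostname
    # quote() percent-encodes spaces and non-ASCII bytes but keeps '/'.
    return f"file://{host}{quote(path)}"


def is_local(uri: str, local_names: Set[str]) -> bool:
    """Emulator side: True only if the URI's hostname denotes this machine."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        return False
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    accepted = {"", "localhost"} | {name.lower() for name in local_names}
    return host in accepted
```

An emulator might call `is_local(uri, {socket.gethostname()})` and refuse to open the file, or offer an alternative such as a download, when the result is false.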

## Hover underlining and the `id` parameter

Many terminal emulators automatically recognize web addresses that appear on their screen and underline them on mouse hover. In order to provide a similar user experience and discoverability of this new feature, we figured we should do the same here as well.

There's a nontrivial question though: Which cells to underline on hover? As opposed to webpages, we lack the semantics, the information about the cells that belong together and form a single web anchor.

Remember the `http://exa` and `le.com` use case example above? To make the hyperlink feature complete, our imaginary text editor should be able to specify that these two pieces of text actually belong to the same anchor, and when mousing over any of these two, both should be underlined.

In order to be able to do this, the lowercase `id` parameter was introduced which connects the cells together.

Character cells that have the same target URI and the same nonempty `id` are always underlined together on mouseover.

The same `id` is only used for connecting character cells whose URI is also the same. Character cells pointing to different URIs should never be underlined together when hovering over.

For hyperlink cells that do not have an `id` (or have an empty `id`, these two are interchangeable), the terminal emulator does some heuristics in figuring out which cells belong together. Here VTE and iTerm2 differ, but from a practical point of view, this difference should not matter. (VTE automatically assigns a new unique `id` whenever it encounters an OSC 8 with a URI but without `id`. That is, it automatically connects cells that were printed in a single OSC 8 run, in case there was no explicit `id`. iTerm2 looks at the onscreen contents and connects those cells that are next to each other, lack the `id`, but point to the same URI.)
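The rule for explicit `id`s can be sketched in Python as follows. The `Cell` type and the `hover_group` function are hypothetical, and for cells without an `id` the sketch simply returns the hovered cell alone, because the heuristics for that case differ between emulators as described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    char: str
    uri: Optional[str] = None  # None means the cell is not a hyperlink
    link_id: str = ""          # empty string means "no id"


def hover_group(cells: List[Cell], hovered: int) -> List[int]:
    """Return the indices of the cells to underline when `hovered` is hovered."""
    target = cells[hovered]
    if target.uri is None:
        return []
    if not target.link_id:
        # No explicit id: the grouping heuristic is emulator-specific.
        return [hovered]
    # Same URI and same nonempty id: underline together, wherever they are.
    return [
        index
        for index, cell in enumerate(cells)
        if cell.uri == target.uri and cell.link_id == target.link_id
    ]
```

Cells that share an `id` but point to different URIs are deliberately kept apart by the combined comparison.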

Terminal emulators that implement hyperlinks but don't want to underline on mouseover can simply ignore the `id` parameter.

So, what should applications do? Here's a rough guideline; obviously applications are allowed to diverge if that's what gives the right result.

Simple utilities that "just" print stuff on their standard output should not assign an `id`. Things will just work then as expected.

Complex apps that manage the full screen and wish to explicitly linkify URIs, such as viewers or editors, should assign explicit `id`s that identify that particular link, so that it keeps being underlined together even across a linebreak, across another pane or window of the app's UI, and even across crazily optimized screen updates (e.g. when it repaints only a part of an anchor text). Such an `id` might perhaps be the file offset, or the (row, column) tuple where the hyperlink starts. Apps that support multiple windows, such as the imaginary text editor with that screenshot above, should add the ID of the window to the link's `id` too so that it does not conflict with the same target URI appearing in another window.

Complex apps that display data that might itself contain OSC 8 hyperlinks (such as terminal multiplexers, `less -R`) should do the following: If the encountered OSC 8 hyperlink already has an `id`, they should prefix it with some static string, or if multiple windows/panes are supported by the app, a prefix that's unique to that window/pane to prevent conflict with other windows/panes. If the encountered OSC 8 hyperlink does not have an `id`, they should automatically create one so that they can still have multiple windows/panes and can still crazily partially update the screen and keep it as a semantically single hyperlink towards the host emulator (remember the difference between VTE and iTerm2 when no `id` is set, which becomes relevant here, so it should be avoided). This `id` should be taken from a namespace that cannot conflict with a mangled explicit `id`. It's probably much easier to implement VTE's approach here: assign a new `id` (maybe a sequential integer) whenever an OSC 8 with a URI but no `id` is encountered. This way there's absolutely no need to maintain any internal pool of the active hyperlink `id`s or anything like that; it's just a trivial mapping each time an OSC 8 is encountered in the data that needs to be displayed.
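One possible shape for this mapping, as a Python sketch: the class name `IdMangler`, the integer pane numbers and the `x`/`a` namespace markers are all assumptions made for illustration. Explicit `id`s and automatically generated ones get different markers after the pane prefix, so the two namespaces cannot collide.

```python
import itertools


class IdMangler:
    """Rewrites the `id` of each OSC 8 opening sequence seen in one pane."""

    def __init__(self, pane: int):
        self.prefix = f"p{pane}"            # unique per window/pane
        self._counter = itertools.count(1)  # source of automatic ids

    def mangle(self, link_id: str = "") -> str:
        """Return the id to pass on to the host emulator.

        Call this once per encountered OSC 8 that carries a URI.
        """
        if link_id:
            # Explicit id: keep it, but place it in this pane's namespace.
            return f"{self.prefix}-x-{link_id}"
        # No id: one fresh id per OSC 8 run, following VTE's approach.
        return f"{self.prefix}-a-{next(self._counter)}"
```

Since the mangled `id` grows with each layer, the prefixes are kept short here, in the spirit of the request to stay well below emulators' length limits.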

Fullscreen apps that do not switch to the "alternate screen" of the terminal emulator, that is, leave their contents onscreen when quitting, should probably add some other identifier, such as the process ID or some random number to the `id`, so that as the user scrolls back in the history with the scrollbar, remains of two previous sessions do not collide.

## Detecting availability of the feature

Currently there's no way of detecting whether the terminal emulator supports hyperlinks. We're hoping to address this at some point in the future.

The hyperlink feature should be used for providing convenient quick access to a target URI, but (at least by default) should not be the only means of figuring out the target.

## Backward compatibility

Any terminal that correctly implements OSC parsing according to ECMA-48 is guaranteed not to suffer from compatibility issues. That is, even if explicit hyperlinks aren't supported, the target URI is silently ignored and the supposed-to-be-visible text is displayed, without artifacts.

If a terminal emits garbage upon an OSC 8 explicit hyperlink sequence, that terminal is buggy according to ECMA-48. It is, and will always be, outside of the scope of this specification to deal with buggy terminals.

At this moment, terminals known to be buggy (OSC 8 resulting in display corruption) are VTE versions up to 0.46.2 and 0.48.1, Windows Terminal up to 0.9, Emacs's built-in terminal, and [screen](https://savannah.gnu.org/bugs/index.php?57718) with 700+ character long URLs.

## Length limits

Terminal emulators traditionally use maybe a dozen or so bytes per cell. Adding hyperlinks potentially increases it by magnitudes. As such, it's tricky to implement this feature in terminal emulators (without consuming way too much memory), and they probably want to expose some safety limits.

Both VTE and iTerm2 limit the URI to 2083 bytes. There's no de jure limit, the de facto is 2000-ish. Internet Explorer supports 2083.

VTE currently limits the `id` to 250 bytes. It's subject to change without notice, and you should most definitely _not_ rely on this particular number. Utilities are kindly requested to stay way below this limit, so that a few layers of intermediate software that need to mangle the `id` (e.g. add a prefix denoting their window/pane ID) still stay safe. Of course such intermediate layers are also kindly requested to keep their added prefix at a reasonable size. There's no limit for the `id`'s length in iTerm2.

Terminal emulators might also impose a maximum length on the overall length of the OSC 8 escape sequence, including all its parameters.

VTE is planned to intentionally slow down a tiny little bit if too many long links are written to its scrollback buffer. This is so that a malicious app cannot quickly eat up the space where it stores the scrollback contents. This should not have an effect on normal usage. FIXME this is planned but not yet implemented.

## Encodings

For portability, the parameters and the URI must not contain any bytes outside of the 32–126 range. If they do, the behavior is undefined. Bytes outside of this range in the URI must be URI-encoded.
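A small Python sketch of this rule, using `urllib.parse.quote` from the standard library. The helper names `is_portable` and `encode_uri` and the chosen set of characters left unescaped are assumptions; `%` is kept as-is so that already encoded URIs are not encoded twice.

```python
from urllib.parse import quote


def is_portable(field: str) -> bool:
    """True if every character is in the printable ASCII range 32..126."""
    return all(32 <= ord(ch) <= 126 for ch in field)


def encode_uri(uri: str) -> str:
    """Percent-encode spaces, control characters and non-ASCII characters.

    Non-ASCII characters are encoded via their UTF-8 bytes. URI delimiters
    and existing '%' escapes are left untouched.
    """
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%~")
```

A utility could run `encode_uri` on the target and then use `is_portable` as a final check on both the parameters and the URI before emitting the sequence.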

Due to the syntax, additional parameter values cannot contain the `:` and `;` characters either. If required at a future extension, some escaping (such as URI-encoding or base64) should be chosen. (Should there ever be an extension to specify hover colors or attributes, I recommend considering going for the ANSI color and attribute notation with the upper dots removed, e.g. bold italic light gray `1;3;38:5:255` would become `1,3,38.5.255`.)

## Security

This feature doesn't introduce anything that's not already present while browsing the web. Therefore we believe this feature doesn't have security aspects to worry about.

In particular, if a webpage is exploitable by making someone visit a URL, passing along their cookies (e.g. doesn't have proper CSRF protection), it's already exploitable from a malicious website.

Moreover, there's no "Referer" leakage to worry about.

That being said, a few points have been raised that are worth noting here.

Some locally installed applications might register a handler for some custom URI scheme (e.g. `foobar://`), and the handler application might be vulnerable in case the rest of the URI is maliciously crafted. Terminal emulators might decide to whitelist only some well known schemes and ask for the user's confirmation on less known ones.
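Such a scheme policy could look roughly like the following Python sketch. The list of trusted schemes and the three outcome strings are assumptions chosen for illustration; an actual emulator would pick its own.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist; which schemes to trust is up to the emulator.
TRUSTED_SCHEMES = {"http", "https", "ftp", "file", "mailto"}


def open_policy(uri: str) -> str:
    """Decide how to treat a hyperlink target: open, confirm or reject."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in TRUSTED_SCHEMES:
        return "open"
    if scheme:
        # Unknown scheme: some handler may be registered, so ask the user first.
        return "confirm"
    return "reject"  # no scheme at all, nothing sensible to launch
```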

Some are worried that this feature is unexpected by users, and that introducing this somewhat automated link between the terminal and the browser works against the concept of "defense in depth". That is, it's possible that a multi-step attack, exploiting a vulnerability of a website, takes place by using social engineering to get someone to follow such a link that they somehow receive in the terminal emulator. It's out of the scope of this specification to deal with such scenarios; this specification can only be responsible for direct security vulnerabilities that it might open. However, terminal emulators might consider adding the following lines of defense. They shouldn't open the link on a simple mouse click (that's for copy-pasting or reporting mouse events typically, anyway), only on some more complex user action such as Ctrl+click or via the right-click menu. They should let the user know the URI upfront. They could decide to present a confirmation dialog before opening it. They could even offer to disable this feature (or even have it disabled by default). People working in critical environments (or their sysadmins) could decide to disable this feature entirely.

## Links

- [GNOME Terminal discussion](https://bugzilla.gnome.org/show_bug.cgi?id=779734)
- [iTerm2 discussion](https://gitlab.com/gnachman/iterm2/issues/5158)
- [Test file](https://git.gnome.org/browse/vte/plain/perf/hyperlink-demo.txt)

From kent.b.karlsson at bahnhof.se  Sat Jan  6 05:05:14 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sat, 6 Jan 2024 12:05:14 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
Message-ID: <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>

Note that hyperlinks are out of scope for my proposal. 

But "ordinary" text editors are in scope, not just terminal emulators.

/Kent K

> 
> On 6 Jan 2024, at 00:09, Steffen Nurpmeso via Unicode <unicode at corp.unicode.org> wrote:
> 
> Kent Karlsson via Unicode wrote in
> <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC at bahnhof.se>:
> |I've done an update to the ECMA-48 styling: proposed update.
> |https://github.com/kent-karlsson/control/blob/main/ecma-48-style-moderni\
> |sation-2024.pdf
> |
> |The major updates are:
> 
> Last i looked (i downloaded it on July 21st last year) the new
> OSC-8 that is now in even wider use than last year (the new GNU
> groff ships with native support for generating IDs, and the Linux
> manual maintainer was eager to use that feature) was not
> incorporated.  I'll attach it, maybe it is of interest.
> 
> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)


From wjgo_10009 at btinternet.com  Sat Jan  6 07:46:36 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 6 Jan 2024 13:46:36 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
 <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
Message-ID: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>


It is often difficult to convey the tone of a post in an email, so I 
begin by saying that this is not in any way critical, that I know very 
little about this topic yet I am trying to learn, that I would be 
grateful if you regard these comments and questions as if an informal 
chat over cups of whatever in a common room somewhere someplace and of 
me trying to be helpful if I can.

What exactly are you trying to achieve please? For example, as well as 
keeping readers of this mailing list informed, for which I thank you, 
are you trying to persuade a specific committee somewhere to change a 
specific existing standard?

Sometime somewhere I was advised and I have added my own thoughts that 
the way to improve one's chances of getting something - whatever it is - 
done is to write a letter on no more than one side of A4 specifically 
starting with a request to do something specific or consider doing 
something specific, on the basis that a one side of A4 document has more 
chance than a longer letter of being read than being put on the side 
"for when I am not so busy" which in practice may never arrive, and to 
make it clear what you are wanting done, so that if the recipient of the 
letter is minded to be as helpful as possible to you then it is actually 
clear as to what you want done. I appreciate that with the letter there 
needs to be the detailed document and I also appreciate that this 
mailing list may not be to where you would send such a letter.

I started to have a look through your document and I noticed that you 
mention teletext. I was involved with teletext, mostly in the 1970s, yet 
I am still interested so could you say what you are suggesting please? 
In particular, are you suggesting a way to store in a file suitable for 
use in a Unicode context the teletext colour codes for both teletext 
alphanumerics and teletext graphics?

I am an end user of software programs and not a developer and my 
experience of programming is mostly in advising undergraduates on 
electrical and electronic engineering courses and on an information 
systems engineering course who were learning to write scientific 
programs, and I do not have detailed knowledge of the underlying systems 
software. As a result I am somewhat wary of having control codes other 
than the basic few used for carriage return and line feed as trying to 
use them in say, WordPad, can be problematic.

So I am wondering if it could be helpful to have a format as well where 
each of the control codes in what you are doing could be replaced on a 
round-trip-is-possible basis with plane 14 tag characters so as to 
produce a file format that could be suitable for a Unicode environment. 
I appreciate that this suggestion might be unsuitable for some reason, 
but I mention it in case it might be useful.

Perhaps in a Unicode text system a good solution would be for 
Unicode/ISO IEC 10646 to have some (not yet encoded) non-printing codes 
added in plane 14 that are treated as not control codes in most uses yet 
can be treated as control codes in specific situations. This would mean 
that a file containing them would not contain Unicode control codes so 
could be stored and shared as a text file, yet when applied to specific 
equipment of specific software packages could be treated as if 
containing control codes.

William Overington

Saturday 6 January 2024



From steffen at sdaoden.eu  Sat Jan  6 14:00:12 2024
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Sat, 06 Jan 2024 21:00:12 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
 <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
Message-ID: <20240106200012.coOeos3J@steffen%sdaoden.eu>

Kent Karlsson via Unicode wrote in
 <472640BF-538B-4531-9D58-3FC8D0BCD97B at bahnhof.se>:
 |> On 6 Jan 2024, at 00:09, Steffen Nurpmeso via Unicode
 |> <unicode at corp.unicode.org> wrote:
 |> 
 |> Kent Karlsson via Unicode wrote in
 |> <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC at bahnhof.se>:
 |>|I've done an update to the ECMA-48 styling: proposed update.
 |>|https://github.com/kent-karlsson/control/blob/main/ecma-48-style-moderni\
 |>|sation-2024.pdf
 |>|
 |>|The major updates are:
 |> 
 |> Last i looked (i downloaded it on July 21st last year) the new
 |> OSC-8 that is now in even wider use than last year (the new GNU
 |> groff ships with native support for generating IDs, and the Linux
 |> manual maintainer was eager to use that feature) was not
 |> incorporated.  I'll attach it, maybe it is of interest.

 |Note that hyperlinks are out of scope for my proposal. 
 |
 |But "ordinary" text editors are in scope, not just terminal emulators.

Oh then i misunderstood.  I thought you wanted to update what was
ECMA-48 over thirty years ago to include what happened ever since.
(Anticipating that a "regular" update will not happen, to cite you
more or less correctly.)

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


From kent.b.karlsson at bahnhof.se  Sat Jan  6 16:32:19 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sat, 6 Jan 2024 23:32:19 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <20240106200012.coOeos3J@steffen%sdaoden.eu>
References: <20240106200012.coOeos3J@steffen%sdaoden.eu>
Message-ID: <F14870D1-0075-4A35-AF4A-130FED5F6577@bahnhof.se>


Sent from my iPhone

> On 6 Jan 2024, at 21:02, Steffen Nurpmeso via Unicode <unicode at corp.unicode.org> wrote:
> 
> Kent Karlsson via Unicode wrote in
> <472640BF-538B-4531-9D58-3FC8D0BCD97B at bahnhof.se>:
> |> On 6 Jan 2024, at 00:09, Steffen Nurpmeso via Unicode
> |> <unicode at corp.unicode.org> wrote:
> |>
> |> Kent Karlsson via Unicode wrote in
> |> <A3C7F897-7CF9-49A9-ACC9-17B29069EDDC at bahnhof.se>:
> |>|I've done an update to the ECMA-48 styling: proposed update.
> |>|https://github.com/kent-karlsson/control/blob/main/ecma-48-style-moderni\
> |>|sation-2024.pdf
> |>|
> |>|The major updates are:
> |>
> |> Last i looked (i downloaded it on July 21st last year) the new
> |> OSC-8 that is now in even wider use than last year (the new GNU
> |> groff ships with native support for generating IDs, and the Linux
> |> manual maintainer was eager to use that feature) was not
> |> incorporated.  I'll attach it, maybe it is of interest.
> 
> |Note that hyperlinks are out of scope for my proposal.
> |
> |But "ordinary" text editors are in scope, not just terminal emulators.
> 
> Oh then i misunderstood.  I thought you wanted to update what was
> ECMA-48 over thirty years ago to include what happened ever since.
> (Anticipating that a "regular" update will not happen, to cite you
> more or less correctly.)

True, I do not expect any "regular" update.

I do not attempt to cover "everything". That was never a goal. The "styling" proposal covers, well, (text) styling. It is also usable for styled text documents, not limited to terminal emulators.

I have an early draft regarding keyboard input stuff (and ECMA-48). And that is for terminal emulators only... When I get time, I will work more on that. (Still not covering "everything".)

/Kent K

Ps
The math expression representation proposal is not related to ECMA-48.

> --steffen
> |
> |Der Kragenbaer,                The moon bear,
> |der holt sich munter           he cheerfully and one by one
> |einen nach dem anderen runter  wa.ks himself off
> |(By Robert Gernhardt)
> 


From kent.b.karlsson at bahnhof.se  Sat Jan  6 18:46:47 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sun, 7 Jan 2024 01:46:47 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
Message-ID: <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>


> On 6 Jan 2024, at 14:49, William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> 
> It is often difficult to convey the tone of a post in an email, so I begin by saying that this is not in any way critical, that I know very little about this topic yet I am trying to learn, that I would be grateful if you regard these comments and questions as if an informal chat over cups of whatever in a common room somewhere someplace and of me trying to be helpful if I can.
> What exactly are you trying to achieve please? For example, as well as keeping readers of this mailing list informed, for which I thank you, are you trying to persuade a specific committee somewhere to change a specific existing standard?

Well, the ECMA-48 committee is surely disbanded, and trying to resurrect it would likely be futile. So my proposals will be freestanding proposals "hanging in the air" (or, rather, on GitHub). A bit unfortunate perhaps, but that's how it is. But you are welcome to pick suggestions from them anyway... I've tried to follow several of the styling updates already present in implementations (while not covering "everything", as being out of scope for the proposal).

However, it would be great if Unicode at least had better character properties for C0/C1 characters, rather than the completely wrong properties Unicode now has for them.

> Sometime somewhere I was advised and I have added my own thoughts that the way to improve one's chances of getting something - whatever it is - done is to write a letter on no more than one side of A4 specifically starting with a request to do something specific or consider doing something specific, on the basis that a one side of A4 document has more chance than a longer letter of being read than being put on the side "for when I am not so busy" which in practice may never arrive, and to make it clear what you are wanting done, so that if the recipient of the letter is minded to be as helpful as possible to you then it is actually clear as to what you want done. I appreciate that with the letter there needs to be the detailed document and I also appreciate that this mailing list may not be to where you would send such a letter.
> 
> I started to have a look through your document and I noticed that you mention teletext. I was involved with teletext, mostly in the 1970s, yet I am still interested so could you say what you are suggesting please? In particular, are you suggesting a way to store in a file suitable for use in a Unicode context the teletext colour codes for both teletext alphanumerics and teletext graphics?

As I mention, Teletext is still in use, and there is a standard for it, implemented in every TV set. I do not know how TV companies store the text, but likely using some proprietary representation, which is then converted to "raw" Teletext. The example I give is mostly to show that the styling available in Teletext is covered.

> I am an end user of software programs and not a developer and my experience of programming is mostly in advising undergraduates on electrical and electronic engineering courses and on an information systems engineering course who were learning to write scientific programs, and I do not have detailed knowledge of the underlying systems software. As a result I am somewhat wary of having control codes other than the basic few used for carriage return and line feed as trying to use them in say, WordPad, can be problematic.
> 
> So I am wondering if it could be helpful to have a format as well where each of the control codes in what you are doing could be replaced on a round-trip-is-possible basis with plane 14 tag characters so as to produce a file format that could be suitable for a Unicode environment. I appreciate that is possible that this suggestion might possibly be unsuitable for some reason, but I mention it in case the suggestion might perhaps be useful.
> 
> Perhaps in a Unicode text system a good solution would be for Unicode/ISO IEC 10646 to have some (not yet encoded) non-printing codes added in plane 14 that are treated as not control codes in most uses yet can be treated as control codes in specific situations. This would mean that a file containing them would not contain Unicode control codes so could be stored and shared as a text file, yet when applied to specific equipment of specific software packages could be treated as if containing control codes.

I'd strongly suggest that "tag characters" be strongly deprecated. But that is a different topic. As is deprecating the property "default-ignorable". But considering any character code as "non-printing" when not interpreted is a bad idea, and that is sort of covered. (In Linux it is common to display uninterpreted characters as a "hex box", and that is fine. Not exactly SUB/REPLACEMENT CHARACTER, but still fine.)

And the C0/C1 characters should not be regarded as magically different from other Unicode characters. Some of them have even been duplicated as non-C0/C1 characters. Something which I don't think was all that good. Just look at LS and PS, which are basically unused. Everyone is still using LF/CRLF. And NBH even got duplicated twice.

/Kent K

> William Overington
> 
> Saturday 6 January 2024
> 
> 

From alexander.lange at catrinity-font.de  Sun Jan  7 04:37:53 2024
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Sun, 7 Jan 2024 11:37:53 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
 <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
 <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
Message-ID: <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>


On 06.01.2024 14:46, William_J_G Overington via Unicode wrote:
> Perhaps in a Unicode text system a good solution would be for 
> Unicode/ISO IEC 10646 to have some (not yet encoded) non-printing 
> codes added in plane 14 that are treated as not control codes in most 
> uses yet can be treated as control codes in specific situations. This 
> would mean that a file containing them would not contain Unicode 
> control codes so could be stored and shared as a text file, yet when 
> applied to specific equipment of specific software packages could be 
> treated as if containing control codes.
>
>
> William Overington
>
>
> Saturday 6 January 2024
>

This is pretty much the description of a communication protocol, or a 
declarative language like HTML. But usually it is done using existing 
printable characters from Basic Latin, so they can be viewed and edited 
easily. HTML for example uses tags like this: <p>My paragraph with text</p>

It shows up as it is written in a plain text editor, but the browsers 
recognize the tags and show it as an actual paragraph, making <, > and 
the letters between them behave exactly like the new characters you propose.


I honestly see no benefit in having new characters for this purpose, 
only the disadvantage that the plain text would be harder to edit (and 
unreadable if they are actually non-printing, defeating the whole 
purpose of a plain text format).


Kind regards,
Alexander Lange

From wjgo_10009 at btinternet.com  Sun Jan  7 05:35:43 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sun, 7 Jan 2024 11:35:43 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
 <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
 <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
Message-ID: <759381b2.688d.18ce3b5123b.Webtop.95@btinternet.com>


Hi

Alexander Lange wrote:


> ...  the disadvantage that the plain text would be harder to edit (and 
> unreadable if they are actually non-printing, ...

Editing could be achieved by using an editing font where the characters 
would have a displaying glyph. I have used such a technique by making a 
font where some of the tag characters (those for tag digits) had a 
visible glyph.

FontCreator 8 used in futuristic experiment - Font Forum 
(high-logic.com) <https://forum.high-logic.com/viewtopic.php?t=7941>

William



From alexander.lange at catrinity-font.de  Sun Jan  7 05:58:51 2024
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Sun, 7 Jan 2024 12:58:51 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <759381b2.688d.18ce3b5123b.Webtop.95@btinternet.com>
References: <20240105220158.gYaCFpUL@steffen%sdaoden.eu>
 <472640BF-538B-4531-9D58-3FC8D0BCD97B@bahnhof.se>
 <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
 <759381b2.688d.18ce3b5123b.Webtop.95@btinternet.com>
Message-ID: <835bcee3-aec0-4fad-964a-3c825281ef11@catrinity-font.de>

Hi,

This only solves the problem of readability. It would still be harder to 
edit because you need a specialized keyboard layout and/or assistive 
technology for entering the new characters. Also remember that a Basic 
Latin letter needs 1 byte in UTF-8 while a Plane 14 character needs 4, 
so for communication purposes you'll want to add some compression 
algorithm that both sides then need to implement.
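
The byte counts mentioned here can be checked with a short Python sketch. 
The use of U+E0041 TAG LATIN CAPITAL LETTER A as the Plane 14 example is 
only an illustrative assumption (the characters suggested in this thread 
are not encoded); any Plane 14 code point encodes to the same length.

```python
# Compare the UTF-8 size of a Basic Latin letter and a Plane 14 character.
basic_latin = "A"         # U+0041
plane_14 = "\U000E0041"   # U+E0041, chosen only as an illustration

basic_len = len(basic_latin.encode("utf-8"))
plane_14_len = len(plane_14.encode("utf-8"))

print(basic_len, plane_14_len)
```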

And all of these things are just the workarounds needed to treat the 
disadvantages over Basic Latin based protocols. The main question is 
still: What is the benefit? What would be better?

Basic Latin based syntax has been working like a charm for decades now. 
I don't see any programmer switch from there to a more complicated 
system unless there is some serious advantage that I currently can't see.

Kind regards,
Alexander Lange


From wjgo_10009 at btinternet.com  Sun Jan  7 06:02:53 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sun, 7 Jan 2024 12:02:53 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
Message-ID: <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>


I suggest that a way to popularize your suggestions is for there to be 
short science fiction stories where a problem arises and then it is 
solved using your ideas.

For example, here is some text to show what I mean. It is not a complete 
story; it needs two more sections written. If you, or someone else, 
writes those sections, using information in the documents that you have 
produced, then posts the whole story, then maybe there will be progress.

"Hello Patricia, how are you?"

"I'm fine, thanks, Anne, but I am having a bit of a problem over how to 
store in a Unicode file some text that is from a teletext page."

"Oh, what is the issue?"

"Well, the text is

It was printed by Gutenberg in Mainz.

and most of it is in green yet the name Gutenberg is in yellow. How can 
I code that?"

"Ah, I saw a document where a gentleman has some suggestions about 
extending the Ecma-48 system, it includes a part about teletext. We can 
try that if you like."

"Yes, please."

MORE NEEDED about Anne showing Patricia how to code the text in green 
and yellow.

"Thank you, Anne, that is a great solution."

"The other teletext colours can be coded too, here's how."

MORE NEEDED about how to code text in other colours.

"Great, I'll file that for future reference."


Now I know that that is a non-conventional way of conveying technical 
detail, but it works.

If that story gets completed and posted in this mailing list it will 
become archived and be there permanently.

The method can be used to convey complicated information in a more 
easily understood way.

It is an opportunity.

I hope this helps,

Best regards,

William



From alexander.lange at catrinity-font.de  Sun Jan  7 06:23:15 2024
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Sun, 7 Jan 2024 13:23:15 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
Message-ID: <81ed6577-e27e-428e-bbb1-56ed066d0fda@catrinity-font.de>

On 07.01.2024 13:02, William_J_G Overington via Unicode wrote:
> [...]
>
>
> "Well, the text is
>
>
> It was printed by Gutenberg in Mainz.
>
>
> and most of it is in green yet the name Gutenberg is in yellow. How 
> can I code that?"
>
>
> [...]

Alright, let's solve this example problem:


<p>It was printed by <i>Gutenberg</i> in Mainz.</p>


That would be the HTML to structure the text as needed. Any kind of 
styles, including all possible RGBA colors, can then easily be applied 
using CSS. This is also possible inline in cases where it's important to 
have only one file:


<p style="color: green">It was printed by <i style="color: yellow; 
font-style:normal">Gutenberg</i> in Mainz.</p>

(I added font-style:normal because <i> is shown in Italic by default.)


Other possibilities would be BBCode, RichText, ... or any format of your 
own that you define.


Kind regards,
Alexander Lange

From asmusf at ix.netcom.com  Sun Jan  7 15:55:50 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 7 Jan 2024 13:55:50 -0800
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
Message-ID: <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>

On 1/7/2024 4:02 AM, William_J_G Overington via Unicode wrote:
> I suggest that a way to popularize your suggestions is for there to be 
> short science fiction stories where a problem arises and then it is 
> solved using your ideas.

I think that's an excellent suggestion. All specifications should be 
accompanied by a science fiction story or perhaps a thriller that 
features the new technology.

I'm thinking a coded message affecting the future of humanity, where the 
secret is buried in the appearance, not the content of the text. And our 
amateur sleuths from the future need to revive an ancient machine found 
in an abandoned spaceship where they have to piece together what they 
can from preserved fragments of this mail discussion. While fighting off 
the CSS overlords.

That should about do it.

Once that thriller hits the bestseller lists, the inevitable fan 
community will implement the technology and keep it alive through 
sequels, film adaptations and fan fiction. See Klingon.

Case closed.

A./

From wjgo_10009 at btinternet.com  Sun Jan  7 17:14:39 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sun, 7 Jan 2024 23:14:39 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>
Message-ID: <60933022.6f08.18ce634f399.Webtop.95@btinternet.com>


> I think that's an excellent suggestion. All specifications should be 
> accompanied by a science fiction story or perhaps a thriller that 
> features the new technology.

Well, the method that I suggested does work for explaining technical 
matters clearly, often with examples of how to apply what is in the 
specification. Such a story conveys the ambience that occurs in a 
research environment in real life - chats in the staff room, discussions 
in formal meetings, travel to conferences, the thought processes of 
someone designing a specification, the social life around the research 
establishment and so on.

Readers who would like to read a completed novel about an invention 
involving information technology and the partly complete sequel and the 
formal scientific documents, all authored by me, are welcome to do so at 
my website, no registration requested nor required. There is also a 
slide show and a font available.

http://www.users.globalnet.co.uk/~ngo/

The website is safe to use, it is not on a computer owned by me, it is 
hosted on a server run by PlusNet PLC in the United Kingdom. I add 
content to the website by ftp over the internet.

William



From ecm.unicode at gmail.com  Sun Jan  7 19:33:28 2024
From: ecm.unicode at gmail.com (Erik Carvalhal Miller)
Date: Sun, 7 Jan 2024 20:33:28 -0500
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>
Message-ID: <CAJTfRPFQ0yAib1vuE29AbWoBOiR=6tRpkL5EM3VF5L57o5_TbQ@mail.gmail.com>

On Sun, Jan 7, 2024 at 4:59 PM Asmus Freytag via Unicode <
unicode at corp.unicode.org> wrote:
>
> I'm thinking a coded message affecting the future of humanity, where the
> secret is buried in the appearance, not the content of the text. And our
> amateur sleuths from the future need to revive an ancient machine found in
> an abandoned spaceship where they have to piece together what they can from
> preserved fragments of this mail discussion. While fighting off the CSS
> overlords.

Ah, yes - our sleuths must penetrate a monastic order dedicated to extreme
methods of denying the worldly pleasures that would distract them from
their stewardship of an arcane semantic system from our Sun's nearest
stellar neighbor, in a gambit which could usher in a new Golden Age for
humanity or else devastate the Earth - hence the title *Eunuch Code: The
Alpha Bet*.

From asmusf at ix.netcom.com  Sun Jan  7 20:21:41 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 7 Jan 2024 18:21:41 -0800
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <CAJTfRPFQ0yAib1vuE29AbWoBOiR=6tRpkL5EM3VF5L57o5_TbQ@mail.gmail.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <ce12dd9d-e6aa-4f3a-b3c2-7353e75bea81@ix.netcom.com>
 <CAJTfRPFQ0yAib1vuE29AbWoBOiR=6tRpkL5EM3VF5L57o5_TbQ@mail.gmail.com>
Message-ID: <1317f37b-196a-4600-b19e-8ee45378f9e2@ix.netcom.com>

On 1/7/2024 5:33 PM, Erik Carvalhal Miller wrote:
> On Sun, Jan 7, 2024 at 4:59 PM Asmus Freytag via Unicode 
> <unicode at corp.unicode.org> wrote:
> >
> > I'm thinking a coded message affecting the future of humanity, where 
> > the secret is buried in the appearance, not the content of the text. 
> > And our amateur sleuths from the future need to revive an ancient 
> > machine found in an abandoned spaceship where they have to piece 
> > together what they can from preserved fragments of this mail 
> > discussion. While fighting off the CSS overlords.
>
> Ah, yes - our sleuths must penetrate a monastic order dedicated to 
> extreme methods of denying the worldly pleasures that would distract 
> them from their stewardship of an arcane semantic system from our 
> Sun's nearest stellar neighbor, in a gambit which could usher in a new 
> Golden Age for humanity or else devastate the Earth - hence the title 
> /Eunuch Code: The Alpha Bet/.

Looks like we are getting there. Great working title.

I like how it suggests an all-or-nothing wager as a sub-theme.

I think we need to look at strong secondary characters next. I can 
picture a survivor; a gnarly early generation AI that tenaciously clings 
to the interstices of the derelict's hardware to become an unsuspected 
ally, though prone to hallucinations. Perhaps doubles as comic relief.

A./

From marius.spix at web.de  Mon Jan  8 04:14:45 2024
From: marius.spix at web.de (Marius Spix)
Date: Mon, 8 Jan 2024 11:14:45 +0100
Subject: Aw: Re: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <81ed6577-e27e-428e-bbb1-56ed066d0fda@catrinity-font.de>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <81ed6577-e27e-428e-bbb1-56ed066d0fda@catrinity-font.de>
Message-ID: <trinity-e119fdd8-ac17-413d-b9a8-a7a12bfc53ed-1704708885278@3c-app-webde-bs25>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240108/5d9890f3/attachment.htm>

From kent.b.karlsson at bahnhof.se  Mon Jan  8 05:22:13 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Jan 2024 12:22:13 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
References: <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
Message-ID: <97A8E492-3D50-4E66-B793-CF4A1BB12694@bahnhof.se>


Note that ECMA-48 is in no way new.

And it is still in use. It is used by terminal emulators, where HTML, RTF, troff, and other such formats are non-starters.

But this way of styling text is not limited to terminal emulators; it can also be used in text editors, even with WYSIWYG editing. The gap between "plain text", which many people use a lot, and full-fledged document editors (e.g. MS Word) is too large.

(B.t.w. most HTML these days is generated from some other, likely proprietary, system-specific representation. Few manually edit raw HTML. But that is a different topic.)

(And... there are XML-based document representations that, iiuc, do not use CSS.)

/Kent K

> 7 jan. 2024 kl. 11:40 skrev Alexander Lange via Unicode <unicode at corp.unicode.org>:
> 
> On 06.01.2024 14:46, William_J_G Overington via Unicode wrote:
>> Perhaps in a Unicode text system a good solution would be for Unicode/ISO IEC 10646 to have some (not yet encoded) non-printing codes added in plane 14 that are treated as not control codes in most uses yet can be treated as control codes in specific situations. This would mean that a file containing them would not contain Unicode control codes so could be stored and shared as a text file, yet when applied to specific equipment of specific software packages could be treated as if containing control codes.
>> 
>> William Overington
>> 
>> Saturday 6 January 2024
> 
> This is pretty much the description of a communication protocol, or a declarative language like HTML. But usually it is done using existing printable characters from Basic Latin, so they can be viewed and edited easily. HTML for example uses tags like this: <p>My paragraph with text</p>
> It shows up as it is written in a plain text editor, but the browsers recognize the tags and show it as an actual paragraph, making <, > and the letters between them behave exactly like the new characters you propose.
> 
> I honestly see no benefit in having new characters for this purpose, only the disadvantage that the plain text would be harder to edit (and unreadable if they are actually non-printing, defeating the whole purpose of a plain text format).
> 
> Kind regards,
> Alexander Lange

From kent.b.karlsson at bahnhof.se  Mon Jan  8 05:45:11 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Jan 2024 12:45:11 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <trinity-e119fdd8-ac17-413d-b9a8-a7a12bfc53ed-1704708885278@3c-app-webde-bs25>
References: <trinity-e119fdd8-ac17-413d-b9a8-a7a12bfc53ed-1704708885278@3c-app-webde-bs25>
Message-ID: <A3D8BC1D-AB3C-46CE-8E46-1CF11B8501A3@bahnhof.se>


It should be noted that HTML and ECMA-48 styling do not mix, since HTML forbids most "Cc" (control) characters, with only a few exceptions.
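
For readers unfamiliar with the term: "Cc" is the Unicode General_Category value for control characters, which includes both ESC (U+001B) and the C1 form of CSI (U+009B). A minimal Python sketch using the standard unicodedata module follows; the helper name cc_characters and the sample string are assumptions made purely for illustration.

```python
import unicodedata

def cc_characters(text):
    """Return the characters in text whose General_Category is Cc."""
    return [ch for ch in text if unicodedata.category(ch) == "Cc"]

# A string styled with an ECMA-48 control sequence (7-bit form, ESC [).
styled = "\x1b[93mGutenberg"
found = cc_characters(styled)

# Print the control characters found as code points.
print([f"U+{ord(ch):04X}" for ch in found])
```

A check like this only locates the control characters; which of them a given HTML processor accepts is a separate question not settled by the sketch.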

Some of the other points I have responded to in other emails.

/Kent K

> 8 jan. 2024 kl. 11:17 skrev Marius Spix via Unicode <unicode at corp.unicode.org>:
> 
> Markup languages like HTML add an additional semantic layer to plaintext. If you want to emphasize a word, e. g. the name Gutenberg, you can use tags like <em> or <strong>. This is portable and barrier-free. For example, text-to-speech software for blind users will read emphasized words with another accentuation or a user with a color vision deficiency may use a custom CSS, with a higher contrast. It is also easy to provide the same text in different styles, for example when a website buys an article from a news agency, they don't need to reformat it. This would be a mess with hard-coded styling at plaintext layer. I absolutely see no reason for styling on the plaintext layer.
>  
> Gesendet: Sonntag, 07. Januar 2024 um 13:23 Uhr
> Von: "Alexander Lange via Unicode" <unicode at corp.unicode.org>
> An: unicode at corp.unicode.org
> Betreff: Re: Ecma-48 proposed styling controls update updated & math expression representation proposal update
> On 07.01.2024 13:02, William_J_G Overington via Unicode wrote:
> [...]
>  
> 
> "Well, the text is
> 
>  
> 
> It was printed by Gutenberg in Mainz.
> 
>  
> 
> and most of it is in green yet the name Gutenberg is in yellow. How can I code that?"
> 
>  
> 
> [...]
>  
> Alright, let's solve this example problem:
> 
>  
> 
> <p>It was printed by <i>Gutenberg</i> in Mainz.</p>
> 
>  
> 
> That would be the HTML to structure the text as needed. Any kind of styles, included all possible RGBA colors, can then easily be applied using CSS. This is also possible inline in cases where it's important to have only one file:
> 
>  
> 
> <p style="color: green">It was printed by <i style="color: yellow; font-style:normal">Gutenberg</i> in Mainz.</p>
> 
> (I added font-style:normal because <i> is shown in Italic by default.)
> 
>  
> 
> Other possibilities would be BBCode, RichText, ... or any own format you can define.
> 
>  
> 
> Kind regards,
> Alexander Lange

From kent.b.karlsson at bahnhof.se  Mon Jan  8 05:49:17 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Jan 2024 12:49:17 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <835bcee3-aec0-4fad-964a-3c825281ef11@catrinity-font.de>
References: <835bcee3-aec0-4fad-964a-3c825281ef11@catrinity-font.de>
Message-ID: <E9B2DAF8-2B4D-4CDE-AD49-921FDF801C4B@bahnhof.se>

There is no suggestion of use of plane 14 characters in the proposal.

/Kent K

> 7 jan. 2024 kl. 13:00 skrev Alexander Lange via Unicode <unicode at corp.unicode.org>:
> 
> Hi,
> 
> This only solves the problem of readability. It would still be harder to edit because you need a specialized keyboard layout and/or assistive technology for entering the new characters. Also remember that a Basic Latin letter needs 1 byte in UTF-8 while a Plane 14 character needs 4, so for communication purposes you'll want to add some compression algorithm that both sides then need to implement.
> 
> And all of these things are just the workarounds needed to treat the disadvantages over Basic Latin based protocols. The main question is still: What is the benefit? What would be better?
> 
> Basic Latin based syntax has been working like a charm for decades now. I don't see any programmer switch from there to a more complicated system unless there is some serious advantage that I currently can't see.
> 
> Kind regards,
> Alexander Lange
> 


From alexander.lange at catrinity-font.de  Mon Jan  8 06:19:30 2024
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Mon, 8 Jan 2024 13:19:30 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <97A8E492-3D50-4E66-B793-CF4A1BB12694@bahnhof.se>
References: <50207b9c-e5a2-4280-bc12-7adebe32674c@catrinity-font.de>
 <97A8E492-3D50-4E66-B793-CF4A1BB12694@bahnhof.se>
Message-ID: <e4cb6c5b-c6f1-400f-93a6-f65df3e3c03a@catrinity-font.de>

Hi,

A little clarification maybe: I used HTML as an example for encoding 
meta information like document structure and formatting using Basic 
Latin characters, and without needing special control characters. I did 
not mean that HTML is the best solution in every possible context. 
However, in William's story, Patricia didn't indicate any reason why it 
wouldn't be a feasible solution for her.

Also, and more importantly, it was a direct reply to William's idea of 
possible future formats (for which I currently can't really see the use 
case), not to your original proposal. That's also where the plane 14 
characters come from.

 From your earlier e-mails, I especially agree on:

> However, it would be great if Unicode at least had better character 
> properties for C0/C1 characters, rather than the completely wrong 
> properties Unicode now has for them.

and

> And the C0/C1 characters should not be regarded magically different 
> from other Unicode characters.

Unfortunately, "magically different" is pretty much exactly how it 
feels currently.

Kind regards,
Alexander

From wjgo_10009 at btinternet.com  Mon Jan  8 06:19:41 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 8 Jan 2024 12:19:41 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
Message-ID: <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>


Previously I wrote:

> MORE NEEDED about Anne showing Patricia how to code the text in green 
> and yellow.

I managed to download the PDF document to local storage and I have found 
on page 55 of the PDF document a table with codes in Kent's proposed 
enhanced system for setting the colours to alphanumerics green and 
alphanumerics yellow so that writing of the story can make progress.

It appears that for alphanumerics green SP? CSI 92m is needed and that 
for alphanumerics yellow SP? CSI 93m is needed. I expect that the 
alphanumerics green code will be needed at the start of the line and 
after the name Gutenberg, and that the alphanumerics yellow code will be 
needed before the name Gutenberg. A teletext alphanumerics colour code 
automatically generates a space in the display. So I am wondering 
whether, to allow round-trip conversion back to teletext format, the 
space should come after the code rather than before it as listed in 
Kent's document. Then, in a round trip, the space following the colour 
code can simply be omitted when it is reached, rather than needing to go 
back and remove a space once the colour code is detected.

I do not currently know what CSI means in this context. There are 808 
mentions of CSI in the PDF document and the first one is on page 6 but 
at present I do not understand it.

I am thinking that if this story can be completed and includes a 
reference to Kent's document and Anne and Patricia have a discussion 
about why Anne thinks it better to have the space after the colour code 
rather than before then the story might well be a good learning 
resource.

William


From atif.gulzar at gmail.com  Mon Jan  8 06:36:55 2024
From: atif.gulzar at gmail.com (Atif Gulzar)
Date: Mon, 8 Jan 2024 17:36:55 +0500
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
Message-ID: <CACdA_tGnefPrs6LUQ7rVadiqz8yXfTed0xeshVabARbacD0aaA@mail.gmail.com>

How do I unsubscribe from this group/forum?


--
Best Regards,
Atif Gulzar



On Mon, Jan 8, 2024 at 5:23 PM William_J_G Overington via Unicode <
unicode at corp.unicode.org> wrote:

> Previously I wrote:
>
>
> > MORE NEEDED about Anne showing Patricia how to code the text in green
> > and yellow.
>
>
> I managed to download the PDF document to local storage and I have found
> on page 55 of the PDF document a table with codes in Kent's proposed
> enhanced system for setting the colours to alphanumerics green and
> alphanumerics yellow so that writing of the story can make progress.
>
>
> It appears that for alphanumerics green SP? CSI 92m is needed and that for
> alphanumerics yellow that SP? CSI 93m is needed. I expect that the code
> alphanumerics green will be needed at the start of the line and after the
> name Gutenberg and that alphanumerics yellow will be needed before the name
> Gutenberg. A teletext alphanumerics colour code automatically generates a
> space in the display, so I am wondering if, for the possibility of making
> round-trip conversion back to teletext format that the space should be
> after the code rather than before it as is listed in Kent's document so
> that in a round trip the space following the colour code can be omitted
> when it is reached rather than needing to go back and remove it when the
> colour code is detected.
>
>
> I do not currently know what CSI means in this context. There are 808
> mentions of CSI in the PDF document and the first one is on page 6 but at
> present I do not understand it.
>
>
> I am thinking that if this story can be completed and includes a reference
> to Kent's document and Anne and Patricia have a discussion about why Anne
> thinks it better to have the space after the colour code rather than before
> then the story might well be a good learning resource.
>
> William
>
>
>

From wjgo_10009 at btinternet.com  Mon Jan  8 06:55:52 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 8 Jan 2024 12:55:52 +0000 (GMT)
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
Message-ID: <7897ae55.74bf.18ce924cbd0.Webtop.95@btinternet.com>


Previously I wrote:

> I do not currently know what CSI means in this context. There are 808 
> mentions of CSI in the PDF document and the first one is on page 6 but 
> at present I do not understand it.

I think that I do now.

I looked up U+005B in

https://www.unicode.org/charts/PDF/U0000.pdf

and U+009B in

https://www.unicode.org/charts/PDF/U0080.pdf

So it seems that Patricia can either use a [ character within the 
sequence or she can use the perhaps more mathematically elegant yet 
harder to insert U+009B control character.

So, if I have understood it correctly, for practicality with the 
keyboard that she is using Patricia can use *[92m for alphanumerics 
green and *[93m for alphanumerics yellow, where * here represents the 
Escape character, not an asterisk. Patricia will need to also key a 
space character, whether before or after the Escape sequence is what she 
can discuss with Anne.

So if I have got that correct, it looks like the technical information 
to complete this particular story is now detailed in this thread.
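
As a hedged illustration of the sequences described above, here is a 
minimal Python sketch. It assumes the parameters 92 and 93 as read from 
Kent's document; in many existing terminal emulators these select bright 
green and bright yellow foreground. The space before or after each 
sequence is deliberately left out, since its placement is the open 
question, and the final reset (CSI 0 m) is an addition not discussed in 
this thread. The names sgr, GREEN, YELLOW and RESET are hypothetical.

```python
ESC = "\x1b"           # U+001B ESCAPE
CSI_7BIT = ESC + "["   # two-character form: ESC followed by U+005B
CSI_8BIT = "\x9b"      # single C1 control character U+009B

def sgr(n, csi=CSI_7BIT):
    """Build a control sequence of the form CSI n m."""
    return f"{csi}{n}m"

GREEN = sgr(92)    # assumed parameter for green, as read from the proposal
YELLOW = sgr(93)   # assumed parameter for yellow, as read from the proposal
RESET = sgr(0)     # assumption: reset so a terminal does not stay coloured

line = (GREEN + "It was printed by " + YELLOW + "Gutenberg"
        + GREEN + " in Mainz." + RESET)
print(line)
```

Passing csi=CSI_8BIT to sgr gives the U+009B variant of the same 
sequence.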

William



From cate at cateee.net  Mon Jan  8 06:58:54 2024
From: cate at cateee.net (Giacomo Catenazzi)
Date: Mon, 8 Jan 2024 13:58:54 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
References: <65085071.636f.18cdf0686ca.Webtop.95@btinternet.com>
 <672875F0-B0FE-4557-A8B0-C2C292F8F338@bahnhof.se>
 <57750939.68c9.18ce3cdf158.Webtop.95@btinternet.com>
 <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
Message-ID: <21021559-4e6b-42c2-a604-76904c741626@cateee.net>

CSI is defined in Unicode and in other ECMA standards: it is the 
terminal control usually sent as `ESC [`. The sequence it introduces 
should be terminated by a character between 0x40 and 0x7E, though there 
were bugs and exceptions on some platforms. There is also a single C1 
character for CSI (still two bytes in UTF-8), but many terminals 
disregard this alternative, which is also very old.
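
A rough Python sketch of that structure, under the usual reading of 
ECMA-48: CSI in either form, then optional parameter bytes (0x30-0x3F), 
optional intermediate bytes (0x20-0x2F), and one final byte 
(0x40-0x7E). The names CSI_SEQUENCE and strip_csi are hypothetical, and 
the pattern ignores the platform bugs and exceptions mentioned above.

```python
import re

# CSI as ESC [ or as U+009B, then parameter bytes, intermediate bytes,
# and exactly one final byte in the range 0x40-0x7E.
CSI_SEQUENCE = re.compile(r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")

def strip_csi(text):
    """Remove CSI control sequences, leaving only the plain text."""
    return CSI_SEQUENCE.sub("", text)

sample = "\x1b[92mIt was printed by \x1b[93mGutenberg\x1b[92m in Mainz."
plain = strip_csi(sample)
print(plain)

# The single-character CSI (U+009B) still takes two bytes in UTF-8.
c1_csi_bytes = "\x9b".encode("utf-8")
print(c1_csi_bytes)
```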

But here we see the advantage of having elements (and tags) written as 
clear text (as in HTML, LaTeX, etc.): if we do not understand an 
element, we can google it. With an ECMA-48 code, either it is standard, 
or good luck finding any references.

ECMA-48-like syntax is bad: it is difficult to enhance without requiring
updates to all programs (contrary to HTML, where unknown tags can just
be ignored, without consequences for the following ones or for a
previous *non-closed* one). Note: this has a different underlying
cause, which modern mark-up languages share: they are *structured*
(which ECMA-48 is not; nor is TeX, but TeX is frozen). The past gives us
a lesson; let's learn from it and not make the same mistakes. ECMA-48 is
the past (still used on some appliances, but with no expectation of
enhancing it much: we have alternative graphical interfaces).

[Note: I do think there is a need for an update of ECMA-48, to
standardise common behaviour, but it should be done by the maintainers
of the different terminals.]

cate


On 8 Jan 2024 13:19, William_J_G Overington via Unicode wrote:
> Previously I wrote:
> 
> 
>  > MORE NEEDED about Anne showing Patricia how to code the text in green 
> and yellow.
> 
> 
> I managed to download the PDF document to local storage and I have found 
> on page 55 of the PDF document a table with codes in Kent's proposed 
> enhanced system for setting the colours to alphanumerics green and 
> alphanumerics yellow so that writing of the story can make progress.
> 
> 
> It appears that for alphanumerics green SP? CSI 92m is needed and that 
> for alphanumerics yellow that SP? CSI 93m is needed. I expect that the 
> code alphanumerics green will be needed at the start of the line and 
> after the name Gutenberg and that alphanumerics yellow will be needed 
> before the name Gutenberg. A teletext alphanumerics colour code 
> automatically generates a space in the display, so I am wondering if, 
> for the possibility of making round-trip conversion back to teletext 
> format that the space should be after the code rather than before it as 
> is listed in Kent's document so that in a round trip the space following 
> the colour code can be omitted when it is reached rather than needing to 
> go back and remove it when the colour code is detected.
> 
> 
> I do not currently know what CSI means in this context. There are 808 
> mentions of CSI in the PDF document and the first one is on page 6 but 
> at present I do not understand it.
> 
> 
> I am thinking that if this story can be completed and includes a 
> reference to Kent's document and Anne and Patricia have a discussion 
> about why Anne thinks it better to have the space after the colour code 
> rather than before then the story might well be a good learning resource.
> 
> William
> 
> 

From kent.b.karlsson at bahnhof.se  Mon Jan  8 15:45:02 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Jan 2024 22:45:02 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
References: <12ed6771.7403.18ce903ad71.Webtop.95@btinternet.com>
Message-ID: <C2406B8F-B752-4ECE-AC02-D1C7EC764D77@bahnhof.se>


> On 8 Jan 2024, at 13:21, William_J_G Overington via Unicode <unicode at corp.unicode.org> wrote:
> 
> It appears that for alphanumerics green SP? CSI 92m is needed and that for alphanumerics yellow that SP? CSI 93m is needed. I expect that the code alphanumerics green will be needed at the start of the line and after the name Gutenberg and that alphanumerics yellow will be needed before the name Gutenberg. A teletext alphanumerics colour code automatically generates a space in the display, so I am wondering if, for the possibility of making round-trip conversion back to teletext format that the space should be after the code rather than before it as is listed in Kent's document so that in a round trip the space following the colour code can be omitted when it is reached rather than needing to go back and remove it when the colour code is detected.

Teletext (characters and protocol) is quite messy. In this particular case it has to do with "apply-after" and "apply-at", as well as suppression of space, which I have tried to convey in shorthand.

For more details, see the Teletext standard. In the references section there is a link to that standard's document.

I'm not giving a full conversion. As I mentioned, Teletext is quite messy. And my proposal document is not the place to deep-dive into Teletext strangeness.

/Kent K


From kent.b.karlsson at bahnhof.se  Mon Jan  8 16:36:11 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Jan 2024 23:36:11 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <21021559-4e6b-42c2-a604-76904c741626@cateee.net>
References: <21021559-4e6b-42c2-a604-76904c741626@cateee.net>
Message-ID: <FA5B39D3-F1CC-4CE7-9B8A-3AA94893C107@bahnhof.se>


> On 8 Jan 2024, at 14:00, Giacomo Catenazzi via Unicode <unicode at corp.unicode.org> wrote:
> 
> CSI is defined in Unicode and in other ECMA standards: it is the terminal command

Nothing restricts it to terminals or terminal emulators.

> usually send as `ESC [`

In modern terms, that is a character reference.

> (and if should be terminated by characters between 0x40 and 0x7E, but there were bugs and exceptions on some platforms), there is also one single character in C1 (so still two bytes in UTF-8), but many terminal disregard this alternate (which it is also very old).

Because in a terminal (emulator) the character encoding may change without notice. 

> But so we see the advantage of having elements (and tags) written as clear text (as in HTML, LaTeX, etc.): if we do not understand one element we can google it. With ECMA-48 code: either is standard, or good luck to find some references.

But HTML etc. are, and will continue to be, complete non-starters for terminal emulators.

However, ECMA-48 styling can be used to enhance the styling of what is otherwise a plain text document, without getting entangled in a second-level interpretation of tags, typesetting commands, or similar, expressed in what would be "plain text".

> ECMA-48-like syntax is bad, difficult to enhance without requiring updates on all programs (contrary to HTML: tags can be
> just ignored, without consequences to next ones,

I don't see your point. I.e., I don't see that there is any difference in principle.

> or previous *non-closed* one). Note: this fact is caused by a different reason, which modern mark-up languages shares: they are *structured* (which ECMA-48 is not,

Well, for the most part not, but tables (and Ruby) are actually structured in ECMA-48 as well. As are bidi controls. As are hyperlinks (OSC 8), proposed by others. And so are math expressions (in all three variants, separate proposal, but one variant is compatible with ECMA-48, another with HTML).

And most (major exception: CSI 0m, which should only be used for terminal emulators) styling controls have a start-(change)*-end structure, but unrelated styling controls need not nest. You may consider that last bit a flaw, but one that cannot be fixed for compatibility reasons.
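
The start-(change)*-end pattern, and the fact that unrelated styles need not nest, can be illustrated with standard SGR codes (1 bold, 22 normal intensity, 4 underlined, 24 not underlined). This is only a sketch; the helper name `sgr` is made up for the example.

```python
CSI = "\x1b["

def sgr(*codes: int) -> str:
    """Build a Select Graphic Rendition control sequence, e.g. sgr(1) -> ESC [ 1 m."""
    return CSI + ";".join(str(c) for c in codes) + "m"

BOLD_ON, BOLD_OFF = sgr(1), sgr(22)
UNDERLINE_ON, UNDERLINE_OFF = sgr(4), sgr(24)

# Bold starts first and also ends first: the two styles overlap instead of nesting.
text = (BOLD_ON + "bold " + UNDERLINE_ON + "bold+underlined " +
        BOLD_OFF + "underlined only" + UNDERLINE_OFF + " plain")
print(text)
```

Each style is ended by its own control sequence, so no CSI 0m is needed to get back to unstyled text.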

/Kent K



From cate at cateee.net  Tue Jan  9 06:58:58 2024
From: cate at cateee.net (Giacomo Catenazzi)
Date: Tue, 9 Jan 2024 13:58:58 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <FA5B39D3-F1CC-4CE7-9B8A-3AA94893C107@bahnhof.se>
References: <21021559-4e6b-42c2-a604-76904c741626@cateee.net>
 <FA5B39D3-F1CC-4CE7-9B8A-3AA94893C107@bahnhof.se>
Message-ID: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>

On 8 Jan 2024 23:36, Kent Karlsson wrote:
>> On 8 Jan 2024, at 14:00, Giacomo Catenazzi via Unicode <unicode at corp.unicode.org> wrote:

>> usually send as `ESC [`
> 
> In modern terms, that is a character reference.
> 
>> (and if should be terminated by characters between 0x40 and 0x7E, but there were bugs and exceptions on some platforms), there is also one single character in C1 (so still two bytes in UTF-8), but many terminal disregard this alternate (which it is also very old).
> 
> Because in a terminal (emulator) the character encoding may change without notice.

But isn't the point of ECMA-48 (with ECMA-35 and ECMA-43) to know the
encoding, and to be able to map the function CSI to (nearly any) C0 or
C1 character? In any case, if the terminal (emulator) is using UTF-8, it
should be clear that such sequences are useable.

Or should we improve that part? We have had the infrastructure and the
standards for a very long time. If the client wants UTF-8, it just sends
the corresponding prefix, so that we know C0, C1, GL and *GR* are as
expected for UTF-8, and there is no more weird text because emulators
and programs don't agree on the encoding.


>> But so we see the advantage of having elements (and tags) written as clear text (as in HTML, LaTeX, etc.): if we do not understand one element we can google it. With ECMA-48 code: either is standard, or good luck to find some references.
> 
> But HTML etc. are, and will continue to be, complete non-starters for terminal emulators.
> 
> However, ECMA-48 styling can be used to enhance the styling of what is otherwise a plain text document, without getting entangled in a second-level interpretation of tags, typesetting commands, or similar, expressed in what would be "plain text".
> 

Yes, but do we really care so much about styling in terminal emulators?

In my experience, the best-formatted text in emulators was (and is) the
manual pages (bold, italic, good *dynamic* layout, etc.). But I (and, it
seems, most people) find them much more readable online (even as very
badly formatted HTML, which is unfortunately common).

(and I found Lynx and w3c bad).


But how do you input the formatting? You will use some sort of HTML or
Markdown, which will be translated into the new ECMA-48-style
formatting. So why not use such high-level formatting for interchange?
Was that not one of the old points of the proposal? Or is it just a
matter of local styling (so more at the level of a library such as
termcap/terminfo)?


>> ECMA-48-like syntax is bad, difficult to enhance without requiring updates on all programs (contrary to HTML: tags can be
>> just ignored, without consequences to next ones,
> 
> I don't see your point. I.e., I don't see that there is any difference in principle.


With CSI we have too many ways to "reset" styling (and also to set the
same property, e.g. red text, and to push/pop it, etc.). That makes it
difficult to enhance, and this is "by design". Do you remember 20 years
ago? Terminals supported colours, but many programs behaved weirdly,
because programs didn't implement all CSI sequences, and different
programs had different expectations.

My expectation is that most enhancements will bring us back to that
time, until emulator maintainers converge on a single behaviour.

And standardising is not a solution. ECMA-48 (and ECMA-35) are
standardised, but could you name an emulator which implements them? As
we discussed at the beginning, we have the CSI (and in general C1)
problem, but there are many other points that are not fully supported.
BTW, would it solve (partly) the "Teletext" problem?

But programs also have different expectations. Many programs don't
expect proportional fonts: they expect monospace fonts (possibly with
wide characters).

We lack full support for ECMA-35, ECMA-43, and ECMA-48, so my
expectations for defining new extensions are minimal.

> 
>> or previous *non-closed* one). Note: this fact is caused by a different reason, which modern mark-up languages shares: they are *structured* (which ECMA-48 is not,
> 
> Well, for the most part not, but tables (and Ruby) are actually structured in ECMA-48 as well. As are bidi controls. As are hyperlinks (OSC 8), proposed by others. And so are math expressions (in all three variants, separate proposal, but one variant is compatible with ECMA-48, another with HTML).

But for what purpose? For interchanging information (and for typing), we
will use a different format, so it is better to use a format which users
can understand (HTML, Markdown, TeX, or a new "UniFormat"). For display?
Is someone willing to program it? Many standards follow "usage first,
standardisation later".

> 
> And most (major exception: CSI 0m, which should only be used for terminal emulators) styling controls have a start-(change)*-end structure, but unrelated styling controls need not nest. You may consider that last bit a flaw, but one that cannot be fixed for compatibility reasons.

Yes. "Compatibility" is a curse. HTML was bolder: there were
deprecations (and removals of features). Implementers could support the
old behaviour, but there was a strong push to convert to more "modern"
constructs. ECMA-43 took the best part of ECMA-35 ("let's assume 8-bit,
so forget how to handle 8-bit and multichars in a 7-bit system"), but we
still keep the 7-bit compatibility form "ESC [".

And ECMA was not always compatible with previous versions. If you want
to do an improved ECMA-48, please break compatibility (and let termcap
handle the differences).


In addition, one of the problems was the lack of good documentation of
what each CSI code does (precisely, including its interaction with other
CSI codes and states). Thomas Dickey (xterm) does an excellent job of
documenting, but it is strictly xterm's interpretation. The other
emulators have limited documentation (mostly just on extensions), and we
lack a good centralised place. So it is difficult to have
"compatibility" when we lack well-defined current behaviour.

My hope for the proposal is that it refreshes the documentation and
summarizes expected behaviour (the good part of ECMA-48, the de-facto
standard enhancements, and maybe something new), but I see little value
in it for interchange of data. (And to display maths, we can draw
"anything" on terminals.)

	cate



From wjgo_10009 at btinternet.com  Tue Jan  9 07:46:11 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Tue, 9 Jan 2024 13:46:11 +0000 (GMT)
Subject: UniFormat (from Re: Ecma-48 proposed styling controls update
 updated etc ...)
In-Reply-To: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
References: <21021559-4e6b-42c2-a604-76904c741626@cateee.net>
 <FA5B39D3-F1CC-4CE7-9B8A-3AA94893C107@bahnhof.se>
 <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
Message-ID: <417e798.833e.18cee79391f.Webtop.95@btinternet.com>


Giacomo Catenazzi wrote:

> But for what purpose? For interchanging information (and for typing),
> we will use a different format, so it is better to use a format which
> users can understand (HTML, Markdown, TeX, or a new "UniFormat"). For
> display? Is someone willing to program it? Many standards follow
> "usage first, standardisation later".

Well, I put forward a proposal to use Variation Selector 14 to specify a 
request for an italics glyph. I had made a test font and it worked well. 
The proposal was not stateful and could, in my opinion, be a useful 
facility. Yet the proposal got rejected.

Maybe if UniFormat becomes designed and implemented, that suggestion 
might be considered for inclusion.

A PDF document could be a convenient way to gather a VS14 character to 
paste into a document to get UniFormat going as a VS14 character is not, 
as far as I am aware, a key on an existing keyboard.

If UniFormat were to become implemented then a USB UniFormat external 
keyboard could be manufactured.

If UniFormat came into use by many end users then such keyboards might 
well be available at supermarkets and deliverable as part of a grocery 
order, as USB external keyboards are now.

The big problem is getting started with UniFormat. There is a good 
chance that even discussion of it will be dismissed as not necessary 
today!

William



From kent.b.karlsson at bahnhof.se  Tue Jan  9 16:12:48 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Tue, 9 Jan 2024 23:12:48 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
References: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
Message-ID: <A468DDE2-297C-4EF4-A3AE-8B8A505892CD@bahnhof.se>


>>> there is also one single character in C1 (so still two bytes in UTF-8), but many terminal disregard this alternate (which it is also very old).
>> Because in a terminal (emulator) the character encoding may change without notice.
> 
> [...] In any case, if the terminal (emulator) is using UTF-8, it should be clear that such sequences are useable.

I repeat, the reason why terminal emulators by default ignore what might be C1 characters is that the character encoding may change without notice. (Excluding EBCDIC.) You may get "mojibake", but not worse than that.
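
A small Python sketch of why a bare 0x9B byte is ambiguous when the encoding is not known for certain. The example character U+021B is an arbitrary choice; any character whose UTF-8 form contains the byte 0x9B would do.

```python
data = "\u021b".encode("utf-8")   # U+021B LATIN SMALL LETTER T WITH COMMA BELOW
print(data)                        # b'\xc8\x9b'

# Read as UTF-8, the byte 0x9B is just the second half of one letter.
as_utf8 = data.decode("utf-8")
print(len(as_utf8))                # 1

# Read as ISO 8859-1, the same byte becomes U+009B, i.e. the C1 form of CSI.
as_latin1 = data.decode("latin-1")
print([f"U+{ord(c):04X}" for c in as_latin1])   # ['U+00C8', 'U+009B']
```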

Xterm, at least, allows turning C1 interpretation on.

Note that out of the ECMA standards you mentioned, only ECMA-48 is still viable, and very much so. All the others are defunct, and should be ignored.

> Yes, but do we really care so much about styling in terminal emulators?

Not sure what "we" you are referring to.
But some things are already implemented, like more colours, and more underline styles with separate colouring.

I do not attempt to change the fact that ECMA-48 is a smörgåsbord of things to choose from. And I am not sure I can persuade terminal emulator developers to implement tables, but it would surely be nice to have proper tables and not just "ASCII art" tables.

And, as I have mentioned, it would add some styling capability to otherwise "plain text" editors: not having to use a high-end document formatting tool if you don't really need one for some underlining, bold, bigger letters, or even tables. Not everything has to be super-high-end for styling text.

> In my experience, the best formatted text in emulators were (and are) the manual pages (bold, italic, good *dynamic* layout, etc.). But I (and it seems most of people) find much more readable to view them online (also on very bad html-formatted, which are unfortunately common).

No, I would not recommend using a terminal emulator for viewing web pages.

> (and I found Lynx and w3c bad).
> 
> 
> But how do you input the formatting?

For output to a terminal emulator from a program, the source program would have string constants for control sequences or parts thereof, just like done now.

For a styling enhanced plain text editor one should be able to select a text portion, and then use a menu or keyboard shortcut to select a styling, as it is done in just about any modern text editor. There is no need for an end user to see the styling codes. Using something like HTML syntax would have terrible consequences in that it is hard to tell content from controls. For HTML for instance one MUST use &lt; for <, so that it is not taken as start of a "tag". That is absolutely nothing you want to see for a terminal, nor for a styling enhanced plain text editor.
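
The escaping requirement can be seen with Python's standard `html.escape`; the sample string is made up for the illustration.

```python
import html

source_line = "if a < b && c > d"

# Content that happens to contain markup characters has to be rewritten
# before it can be embedded in HTML.
escaped = html.escape(source_line)
print(escaped)                  # if a &lt; b &amp;&amp; c &gt; d

# Unescaping gives the original content back.
print(html.unescape(escaped))   # if a < b && c > d
```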

/Kent K


From pgcon6 at msn.com  Wed Jan 10 10:47:59 2024
From: pgcon6 at msn.com (Peter Constable)
Date: Wed, 10 Jan 2024 16:47:59 +0000
Subject: UDHR in Unicode
In-Reply-To: <DS0PR12MB753509A716DB38E8E04868C78661A@DS0PR12MB7535.namprd12.prod.outlook.com>
References: <SJ0PR03MB65988A3534C3C0E29D468854CA90A@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <DS0PR12MB753583694E17EB691BD47E038697A@DS0PR12MB7535.namprd12.prod.outlook.com>
 <8c9926c0-265a-4114-b930-de22ed21902b@code2001.com>
 <DS0PR12MB753509A716DB38E8E04868C78661A@DS0PR12MB7535.namprd12.prod.outlook.com>
Message-ID: <DS0PR12MB75353F230F768D646D85B27A86692@DS0PR12MB7535.namprd12.prod.outlook.com>

The udhr repo is now here:
https://github.com/eric-muller/udhr


Peter



From wjgo_10009 at btinternet.com  Wed Jan 10 13:24:47 2024
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 10 Jan 2024 19:24:47 +0000 (GMT)
Subject: Teletext colour codes and Unicode (from Re: Ecma-48 proposed
 styling controls update updated etc ...)
In-Reply-To: <A468DDE2-297C-4EF4-A3AE-8B8A505892CD@bahnhof.se>
References: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
 <A468DDE2-297C-4EF4-A3AE-8B8A505892CD@bahnhof.se>
Message-ID: <6b6144aa.9634.18cf4d595e3.Webtop.95@btinternet.com>


A way forward with the method in Kent's document for teletext colour 
codes in a Unicode environment, with a good possibility of the method 
being taken up and applied in practice, would, in my opinion, be for 
there to be a Unicode Technical Report detailing the method that Kent 
has proposed.

How can people who are interested in there being such a Unicode 
Technical Report achieve this goal please?

William Overington

Wednesday 10 January 2024


From cate at cateee.net  Thu Jan 11 04:32:50 2024
From: cate at cateee.net (Giacomo Catenazzi)
Date: Thu, 11 Jan 2024 11:32:50 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <A468DDE2-297C-4EF4-A3AE-8B8A505892CD@bahnhof.se>
References: <12696fed-22ba-4184-aa70-79e9e538cca8@cateee.net>
 <A468DDE2-297C-4EF4-A3AE-8B8A505892CD@bahnhof.se>
Message-ID: <799bb267-606d-4f9c-9140-d1e25c862f51@cateee.net>

On 9 Jan 2024 23:12, Kent Karlsson wrote:
(...)

Let's skip a lot of *details*.

>>
>> But how do you input the formatting?
> 
> For output to a terminal emulator from a program, the source program would have string constants for control sequences or parts thereof, just like done now.
> 
> For a styling enhanced plain text editor one should be able to select a text portion, and then use a menu or keyboard shortcut to select a styling, as it is done in just about any modern text editor. There is no need for an end user to see the styling codes. Using something like HTML syntax would have terrible consequences in that it is hard to tell content from controls. For HTML for instance one MUST use &lt; for <, so that it is not taken as start of a "tag". That is absolutely nothing you want to see for a terminal, nor for a styling enhanced plain text editor.

I dislike this part, and I think it is the main problem.

Note: I'm actively fighting the use of "string constants" for CSI in
programs. (Maybe we are interpreting this differently.)

When targeting emulators, I want programs to use libraries, or at least
to check the terminal's capabilities before they issue formatting codes
(CSI sequences from ECMA-48, or from common usage, i.e. de-facto
standards).

Every terminal emulator is different, and users also want to use them
differently (by changing the settings). A programmer should not make
that choice for me.

Does a program want to print to the console? I'm OK with it writing some
warnings in colour (but programs often get this wrong: they assume a
background colour, and please, that is my choice!). If I want to write
to a log file: no CSI codes.
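
The behaviour asked for here (colour only on an interactive console,
none in a log file) can be sketched in Python as below. The function
name `warn` is hypothetical, `isatty()` is only a coarse check, and the
`NO_COLOR` environment variable is an informal convention, not part of
ECMA-48. A fuller program would query terminfo (for example through the
curses module) rather than hard-code the two sequences.

```python
import io
import os
import sys

def warn(message: str, stream=sys.stderr) -> None:
    """Write a warning, coloured only if the stream is an interactive terminal."""
    use_colour = stream.isatty() and "NO_COLOR" not in os.environ
    if use_colour:
        # 33 = yellow foreground, 39 = default foreground; the background is left alone.
        stream.write(f"\x1b[33mwarning: {message}\x1b[39m\n")
    else:
        stream.write(f"warning: {message}\n")

warn("disk almost full")      # coloured on a terminal, plain when redirected

log = io.StringIO()           # stands in for a log file
warn("disk almost full", log)
print(repr(log.getvalue()))   # 'warning: disk almost full\n'
```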

Hard-coded formatting codes are bad (and, BTW, HTML strongly discourages
them, for good reason: we learn from the past).

And now I stop with the first rant.


HTML (and LaTeX) can format text according to the medium, and HTML is
responsive. I find no good way to do that in the ECMA-48 style. We can
ask for the size of the screen, or get a signal when it changes, but
there is no real support in emulators: rendering is performed by
programs (e.g. using dialog, or directly with the curses library). Could
you find a good way to display tables sensibly at different terminal
widths (starting from 40 columns or fewer)? That is not code we want in
most (or any) terminal emulators.

The same holds in an editor... I feel that programmers would have to
transform the text into HTML/CSS, do the rendering with existing
libraries (which are huge), and render the result as text + CSI.


What problem are you solving? What is the real-world use case? The more
I look at the proposal, the more I think other tools are much easier and
simpler.

Note: over the years HTML has solved many problems (also considering
colour-blind people, printing, etc.). I mean HTML as a technology, not
what we got from the web. (So possibly you should implement your
proposal that way: just convert CSI to HTML (a DOM) and let it be
displayed; then we would have a real case to look at. There are already
libraries that do this, but without your proposed extensions.)

In any case, your proposal does not preserve plain text: CSI sequences
contain punctuation, letters and numbers. So there is not much
difference from text in elements and tags in HTML: a program or person
who wants the plain text, e.g. for copy/paste, must do a lot of work
removing the formatting. In modern HTML that is easy.
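
For reference, removing CSI control sequences to recover the plain text
can be sketched as below. The pattern covers only CSI sequences in the
ECMA-48 layout (not OSC or other control strings), and `strip_csi` is a
name invented for this example.

```python
import re

# CSI (ESC [ or U+009B), parameter bytes, intermediate bytes, one final byte.
CSI_SEQUENCE = re.compile(r"(?:\x1b\[|\x9b)[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]")

def strip_csi(text: str) -> str:
    """Remove CSI control sequences, keeping everything else."""
    return CSI_SEQUENCE.sub("", text)

styled = "\x1b[1mbold\x1b[22m and \x1b[4;31munderlined red\x1b[24;39m"
print(strip_csi(styled))   # bold and underlined red
```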


I would have found it a nice idea in the 1990s (as an alternative to
HTML), but now we have good designs, so let's not duplicate the huge
work HTML did in the past. From a practical point of view (if I needed
to implement it): it would be just a filter to a DOM engine (which in
the end would be a subset of existing HTML engines) plus a renderer
(where the trend is towards HTML-like formatting APIs for the different
graphical environments).


And in any case, you should start at a higher layer: show programs, and
if it is useful, emulators and editors will adopt it. Or do it like tmux
(a sort of filter); IIRC, in the past some *extensions*, e.g. UTF-8
support, were first done as a filter between the user and the terminal
emulator.

	cate

From marius.spix at web.de  Thu Jan 11 06:21:51 2024
From: marius.spix at web.de (Marius Spix)
Date: Thu, 11 Jan 2024 13:21:51 +0100
Subject: Aw: Re: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
Message-ID: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240111/e682a2d0/attachment.htm>

From marius.spix at web.de  Thu Jan 11 07:42:03 2024
From: marius.spix at web.de (Marius Spix)
Date: Thu, 11 Jan 2024 14:42:03 +0100
Subject: Aw: Re: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
Message-ID: <trinity-de7d6388-a2f4-46d7-a73a-0c99e1429184-1704980523412@3c-app-webde-bap27>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240111/3d424615/attachment.htm>

From kent.b.karlsson at bahnhof.se  Thu Jan 11 14:21:11 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Thu, 11 Jan 2024 21:21:11 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <trinity-de7d6388-a2f4-46d7-a73a-0c99e1429184-1704980523412@3c-app-webde-bap27>
References: <trinity-de7d6388-a2f4-46d7-a73a-0c99e1429184-1704980523412@3c-app-webde-bap27>
Message-ID: <6D761456-E42B-4B78-956A-84F9E68CCE4C@bahnhof.se>


> 11 jan. 2024 kl. 14:44 skrev Marius Spix via Unicode <unicode at corp.unicode.org>:
> 
> Here is an interesting article, how escape sequences can be used to hide malicious context in source code: https://www.infosecmatter.com/terminal-escape-injection/

Thanks for the reference. It is a bit ironic that an article about security is sprinkled with clickbait ads. At least it was for me. But it makes it impossible to include as a reference in any proposal document.

Yes, there are security concerns. I did include a security aspects section. But I did not mention presentation component editing. I do not plan to propose any changes or additions to presentation component editing controls. IIUC they are sufficient as they are. But I did include that uninterpreted control codes, control sequences, and control strings should be displayed (i.e. not be invisible), and that keyboard input control sequences as well as presentation component editing control sequences must be uninterpreted in a text editor. Regarding "cat" etc., I think they unfortunately are unsalvageable.

> This would not happen with human-readable markup like HTML

The problem discussed is unrelated to control sequences vs. tags. It is instead related to the presence of presentation component (read: display) editing control sequences in ECMA-48, which are not at all covered by my proposal.

/Kent K


From kent.b.karlsson at bahnhof.se  Thu Jan 11 14:22:54 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Thu, 11 Jan 2024 21:22:54 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <799bb267-606d-4f9c-9140-d1e25c862f51@cateee.net>
References: <799bb267-606d-4f9c-9140-d1e25c862f51@cateee.net>
Message-ID: <2C96F3F4-E443-47B5-8A76-161350564635@bahnhof.se>


Well, "curses" was a solution for last century's problems.
1)     Covers just very limited functionality, no parameters.
2)     Approximately the same functionality for all.
3)     But different manufacturers, or different models, had different controls to invoke each function. "Curses" did help in that context.
But with ECMA-48, the popularity of xterm (and derivatives), and the obsolescence of physical terminals by the use of terminal emulators, that solution is already obsolete. It is hard to see that it would have any future, since it is so limiting.

/Kent K

> 11 jan. 2024 kl. 11:35 skrev Giacomo Catenazzi <cate at cateee.net>:
> 
> On 9 Jan 2024 23:12, Kent Karlsson wrote:
> (...)
> 
> Let's skip a lot of *details*.
> 
>>> 
>>> But how do you input the formatting?
>> For output to a terminal emulator from a program, the source program would have string constants for control sequences or parts thereof, just like done now.
>> For a styling enhanced plain text editor one should be able to select a text portion, and then use a menu or keyboard shortcut to select a styling, as it is done in just about any modern text editor. There is no need for an end user to see the styling codes. Using something like HTML syntax would have terrible consequences in that it is hard to tell content from controls. For HTML for instance one MUST use &lt; for <, so that it is not taken as start of a "tag". That is absolutely nothing you want to see for a terminal, nor for a styling enhanced plain text editor.
> 
> I dislike this part, and I think it is the main problem.
> 
> Note: I'm actively fighting the use of "string constants" for CSI, in programs. Note: maybe we have a different interpretation.
> 
> For emulators I want that they uses libraries or at least they check terminal capabilities and they issues formatting codes (CSI, from ECMA-48 or common usage which are de-facto standards).
> 
> Every terminal emulator is different, and users want to use it also differently (so changing the settings). A programmer should not make a choice for me.
> 
> Do a program want to print on console? I'm ok that it may write some warnings in colours (but often they fails: they assume a background colour (and please: it is my choice!)), if I want to write to a log file, no CSI codes.
> 
> Hard coded formatting code are bad (and BTW html strongly discourage them, for reason: we learn from past).
> 
> And now I stop with the first rant.
> 
> 
> HTML (and LaTeX) can format text according the medium, and HTML is responsive. I find no good way to do it with ECMA-48 style. We can ask the size of the screen, or get a signal when it changes, but there is not real support on emulators: rendering is performed by programs (e.g. using dialog, or directly with curses library). Could you find a good way to display in a sensible way tables with different terminal widths (starting from 40 or less columns?). It is not code we want in most (or any) terminal emulator.
> 
> But also in an editor...I feel that programmers must transform it in html/css, do the rendering with existing libraries (which they are huge), and render it as text + CSI.
> 
> 
> What problem are you solving? Real case problem. The more I look the proposal, the more I think other tools are much easier and simpler.
> 
> Note: HTML with years solved many problems (also considering colour blind people, printing, etc.). Note: HTML as technology, not what we got from web (but so, possibly you should implement your proposal in that way: you just convert CSI to html (DOM), and lets' display it): so we have a real case to look. (and there are already libraries that do it, but without your extension proposal).
> 
> Your proposal is in any case doesn't maintain plain text: CSI sequences have punctuation, letters and numbers. So there is no much differences of text in elements and tags in HTML: a program/person which want the plain text, e.g. for copy/past, must do a lot of work removing formatting. In modern html is easy.
> 
> 
> I find it would have been  nice idea if we were in 1990s (and so an alternative of HTML), but now we have good designs, so do not let's to duplicate the huge work HTML did in past). For a practical point (if I need to implement it): just a filter to a DOM engine (which at the end would be a subset of existing HTML engines) and a rendering (which trend go in direction of HTML like formatting API for different graphical environment).
> 
> 
> And in any case, you should start at a higher layer: show programs, and if it is useful emulators and editors will adopt it. Or like tmux (so a sort of filter, and IIRC in past some *extensions*, e.g. the UTF-8 where done first as filter between user and terminal emulator.
> 
>    cate

From kent.b.karlsson at bahnhof.se  Thu Jan 11 14:49:30 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Thu, 11 Jan 2024 21:49:30 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
Message-ID: <88662E22-A179-4E27-9342-520A68064B0E@bahnhof.se>


> 11 jan. 2024 kl. 13:24 skrev Marius Spix via Unicode <unicode at corp.unicode.org>:
> 
> Question: How do you copy text preserving the styling?
> For example, you have the following text (in these examples I use ^ as escape character and visible characters instead of the proposed tagging characters.)
>  
> This is a ^[31mred Text^[0m ECMA-48 styling.
>  
> You now want to copy the word "text" and insert it to another document. The styling information gets lost.
> Then you copy the words "a ^[31mtext" and your whole document after these words becomes red until the text color is changed again. This is very confusing and unintuitive. ECMA-48 styling is stack-based and stateful, which makes it hard to select and copy text to another location.
>  
> Another question: How are you supposed to compare ECMA-48 styled texts? The strings
> "This is a ^[31mred^[0m Text" and "This is a ^[31mr^[0m^[31me^[0m^[31md^[0m" text look and behave exactly the same, but are technically different.

It is certainly a somewhat tricky issue. But it is solvable, just as in any editor that allows text styling and copy-paste, regardless of representation, internal or external. (E.g. MS Word; though it still has some bugs. Sorry for mentioning a specific product.)

> This opens up a wide range of attack vectors, e.g. on source code, file names, URIs,

I suggest nothing new w.r.t. those.

> legal documents etc. For example, a user could create two different versions of identically looking documents, which result in the same hash to spoof digital signatures. It also allows watermarking texts by inserting a detectable pattern to prevent copyright violations.

I suggest nothing new in regard to those either. (B.t.w., such hashes are usually based on the very lowest level of (external) representation, i.e. the byte values, without any interpretation of that.)
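
A minimal sketch of the byte-level point, using Marius's two example strings. It assumes only SGR sequences of the form ESC [ params m occur; the helper `styled_chars` and its style model (a tuple of parameter strings accumulated since the last reset) are hypothetical simplifications for illustration, not part of any proposal.

```python
import hashlib
import re

# Assumption: only SGR sequences (ESC [ params m) appear in the text.
SGR = re.compile(r"\x1b\[([0-9;]*)m")

def styled_chars(text):
    """Return (character, active_style) pairs for the visible characters.

    Hypothetical model: the style is the tuple of SGR parameter strings
    seen since the last reset (parameter 0 or empty).
    """
    out = []
    style = ()
    pos = 0
    for m in SGR.finditer(text):
        out.extend((ch, style) for ch in text[pos:m.start()])
        params = m.group(1)
        style = () if params in ("", "0") else style + (params,)
        pos = m.end()
    out.extend((ch, style) for ch in text[pos:])
    return out

a = "This is a \x1b[31mred\x1b[0m Text"
b = "This is a \x1b[31mr\x1b[0m\x1b[31me\x1b[0m\x1b[31md\x1b[0m Text"

# A hash over the raw bytes distinguishes the two strings...
same_bytes_hash = (hashlib.sha256(a.encode()).digest()
                   == hashlib.sha256(b.encode()).digest())
# ...while comparing per-character styling treats them as equal.
same_rendering = styled_chars(a) == styled_chars(b)
```

Here `same_bytes_hash` is False and `same_rendering` is True. A real comparison would need a fuller style model (e.g. a later colour overriding an earlier one), which this sketch does not attempt.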

/Kent K

> Regards,
>  
> Marius Spix
>  
>  
>  
> Gesendet: Donnerstag, 11. Januar 2024 um 11:32 Uhr
> Von: "Giacomo Catenazzi via Unicode" <unicode at corp.unicode.org>
> An: "Kent Karlsson" <kent.b.karlsson at bahnhof.se>
> Cc: unicode at corp.unicode.org
> Betreff: Re: Ecma-48 proposed styling controls update updated & math expression representation proposal update
> On 9 Jan 2024 23:12, Kent Karlsson wrote:
> (...)
> 
> Let's skip a lot of *details*.
> 
> >>
> >> But how do you input the formatting?
> >
> > For output to a terminal emulator from a program, the source program would have string constants for control sequences or parts thereof, just like done now.
> >
> > For a styling enhanced plain text editor one should be able to select a text portion, and then use a menu or keyboard shortcut to select a styling, as it is done in just about any modern text editor. There is no need for an end user to see the styling codes. Using something like HTML syntax would have terrible consequences in that it is hard to tell content from controls. For HTML for instance one MUST use &lt; for <, so that it is not taken as start of a "tag". That is absolutely nothing you want to see for a terminal, nor for a styling enhanced plain text editor.
> 
> I dislike this part, and I think it is the main problem.
> 
> Note: I'm actively fighting the use of "string constants" for CSI, in
> programs. Note: maybe we have a different interpretation.
> 
> For emulators I want that they uses libraries or at least they check
> terminal capabilities and they issues formatting codes (CSI, from
> ECMA-48 or common usage which are de-facto standards).
> 
> Every terminal emulator is different, and users want to use it also
> differently (so changing the settings). A programmer should not make a
> choice for me.
> 
> Do a program want to print on console? I'm ok that it may write some
> warnings in colours (but often they fails: they assume a background
> colour (and please: it is my choice!)), if I want to write to a log
> file, no CSI codes.
> 
> Hard coded formatting code are bad (and BTW html strongly discourage
> them, for reason: we learn from past).
> 
> And now I stop with the first rant.
> 
> 
> HTML (and LaTeX) can format text according the medium, and HTML is
> responsive. I find no good way to do it with ECMA-48 style. We can ask
> the size of the screen, or get a signal when it changes, but there is
> not real support on emulators: rendering is performed by programs (e.g.
> using dialog, or directly with curses library). Could you find a good
> way to display in a sensible way tables with different terminal widths
> (starting from 40 or less columns?). It is not code we want in most (or
> any) terminal emulator.
> 
> But also in an editor...I feel that programmers must transform it in
> html/css, do the rendering with existing libraries (which they are
> huge), and render it as text + CSI.
> 
> 
> What problem are you solving? Real case problem. The more I look the
> proposal, the more I think other tools are much easier and simpler.
> 
> Note: HTML with years solved many problems (also considering colour
> blind people, printing, etc.). Note: HTML as technology, not what we got
> from web (but so, possibly you should implement your proposal in that
> way: you just convert CSI to html (DOM), and lets' display it): so we
> have a real case to look. (and there are already libraries that do it,
> but without your extension proposal).
> 
> Your proposal is in any case doesn't maintain plain text: CSI sequences
> have punctuation, letters and numbers. So there is no much differences
> of text in elements and tags in HTML: a program/person which want the
> plain text, e.g. for copy/past, must do a lot of work removing
> formatting. In modern html is easy.
> 
> 
> I find it would have been nice idea if we were in 1990s (and so an
> alternative of HTML), but now we have good designs, so do not let's to
> duplicate the huge work HTML did in past). For a practical point (if I
> need to implement it): just a filter to a DOM engine (which at the end
> would be a subset of existing HTML engines) and a rendering (which trend
> go in direction of HTML like formatting API for different graphical
> environment).
> 
> 
> And in any case, you should start at a higher layer: show programs, and
> if it is useful emulators and editors will adopt it. Or like tmux (so a
> sort of filter, and IIRC in past some *extensions*, e.g. the UTF-8 where
> done first as filter between user and terminal emulator.
> 
> cate

From asmusf at ix.netcom.com  Thu Jan 11 22:24:03 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 11 Jan 2024 20:24:03 -0800
Subject: UDHR in Unicode
In-Reply-To: <DS0PR12MB75353F230F768D646D85B27A86692@DS0PR12MB7535.namprd12.prod.outlook.com>
References: <SJ0PR03MB65988A3534C3C0E29D468854CA90A@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <DS0PR12MB753583694E17EB691BD47E038697A@DS0PR12MB7535.namprd12.prod.outlook.com>
 <8c9926c0-265a-4114-b930-de22ed21902b@code2001.com>
 <DS0PR12MB753509A716DB38E8E04868C78661A@DS0PR12MB7535.namprd12.prod.outlook.com>
 <DS0PR12MB75353F230F768D646D85B27A86692@DS0PR12MB7535.namprd12.prod.outlook.com>
Message-ID: <59502d9a-d506-4dbe-bc27-05023e819a51@ix.netcom.com>

If somebody can find a good place for that info, we can put an FAQ item 
with the new location.

Let me have a reasonably specific proposal and I can roll it in.

A./

On 1/10/2024 8:47 AM, Peter Constable via Unicode wrote:
> The udhr repo is now here:
> https://github.com/eric-muller/udhr
>
>
> Peter
>
> -----Original Message-----
> From: Unicode<unicode-bounces at corp.unicode.org>  On Behalf Of Peter Constable via Unicode
> Sent: Tuesday, January 2, 2024 11:46 AM
> To: James Kass<jameskass at code2001.com>;unicode at corp.unicode.org
> Subject: RE: UDHR in Unicode
>
> Happy 2024!
>
> Some in this thread were jumping to unwarranted conclusions. (Nothing will be deleted.) The UDHR project will be taken over by Eric Muller, who was the one that started it. The Web content and git repo will be moved to a domain he owns.
>
>
> Peter Constable
>
>

From doug at ewellic.org  Fri Jan 12 11:35:37 2024
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 12 Jan 2024 17:35:37 +0000
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
Message-ID: <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>

Marius Spix wrote:

> Question: How do you copy text preserving the styling?
> For example, you have the following text (in these examples I use ^ as
> escape character and visible characters instead of the proposed
> tagging characters.)
>
> This is a ^[31mred Text^[0m ECMA-48 styling.
>
> You now want to copy the word "text" and insert it to another
> document. The styling information gets lost.
> Then you copy the words "a ^[31mtext" and your whole document after
> these words becomes red until the text color is changed again. This is
> very confusing and unintuitive.

How is this handled in Word, or in any other WYSIWYG editor?

What about in WordPerfect for DOS, where different foreground and background colors in text mode represented bold, italics, underlining, etc.?

Did we not have any way to manipulate styled text before HTML came along?

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org


From eliz at gnu.org  Fri Jan 12 12:19:23 2024
From: eliz at gnu.org (Eli Zaretskii)
Date: Fri, 12 Jan 2024 20:19:23 +0200
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 (message from Doug Ewell via Unicode on Fri, 12 Jan 2024 17:35:37
 +0000)
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <838r4uifdg.fsf@gnu.org>

> CC: "unicode at corp.unicode.org" <unicode at corp.unicode.org>
> Date: Fri, 12 Jan 2024 17:35:37 +0000
> From: Doug Ewell via Unicode <unicode at corp.unicode.org>
> 
> Marius Spix wrote:
> 
> > Question: How do you copy text preserving the styling?
> > For example, you have the following text (in these examples I use ^ as
> > escape character and visible characters instead of the proposed
> > tagging characters.)
> >
> > This is a ^[31mred Text^[0m ECMA-48 styling.
> >
> > You now want to copy the word "text" and insert it to another
> > document. The styling information gets lost.
> > Then you copy the words "a ^[31mtext" and your whole document after
> > these words becomes red until the text color is changed again. This is
> > very confusing and unintuitive.
> 
> How is this handled in Word, or in any other WYSIWYG editor?

They use specialized formats of the clipboard data, where the styles
and typefaces are preserved.  See

  https://learn.microsoft.com/en-us/windows/win32/dataxchg/clipboard-formats

> What about in WordPerfect for DOS, where different foreground and background colors in text mode represented bold, italics, underlining, etc.?

You mean, copying from some part of WordPerfect document to another
part of the same document?  Because DOS supported only one program at
a time, and didn't have a clipboard (or anything similar) at all.

> Did we not have any way to manipulate styled text before HTML came along?

Yes, of course.  RichText comes to mind.

From doug at ewellic.org  Fri Jan 12 14:03:37 2024
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 12 Jan 2024 20:03:37 +0000
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <838r4uifdg.fsf@gnu.org>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <838r4uifdg.fsf@gnu.org>
Message-ID: <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>

Eli Zaretskii wrote:

>> How is this handled in Word, or in any other WYSIWYG editor?
>
> They use specialized formats of the clipboard data, where the styles
> and typefaces are preserved.  See
>
> https://learn.microsoft.com/en-us/windows/win32/dataxchg/clipboard-formats

I know the internal format is different. Marius wasn't talking about that. He was talking about the user experience of copying and pasting styled text when the styling data is invisible, and appearing to claim this was an intractable problem.

Alternatively, why is the stated user-experience problem for ECMA-48 not a problem for Word?

>> What about in WordPerfect for DOS, where different foreground and
>> background colors in text mode represented bold, italics,
>> underlining, etc.?
>
> You mean, copying from some part of WordPerfect document to another
> part of the same document?  Because DOS supported only one program at
> a time, and didn't have a clipboard (or anything similar) at all.

OK, good point. What about WordPerfect for Windows, or WordPad, or [pick your favorite tool from a non-Windows environment] where the formatting isn't visible?

>> Did we not have any way to manipulate styled text before HTML came
>> along?
>
> Yes, of course.  RichText comes to mind.

And we were able to copy and paste styled text! Sure, sometimes the formatting gets screwed up, and still does. But it was not a hopeless and unsolved problem, fixed only when in-band formatting using ASCII characters came into vogue.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org


From eliz at gnu.org  Fri Jan 12 14:12:42 2024
From: eliz at gnu.org (Eli Zaretskii)
Date: Fri, 12 Jan 2024 22:12:42 +0200
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 (message from Doug Ewell via Unicode on Fri, 12 Jan 2024 20:03:37
 +0000)
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <838r4uifdg.fsf@gnu.org>
 <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <834jfiia4l.fsf@gnu.org>

> CC: "marius.spix at web.de" <marius.spix at web.de>,
>         "unicode at corp.unicode.org"
>  <unicode at corp.unicode.org>
> Date: Fri, 12 Jan 2024 20:03:37 +0000
> From: Doug Ewell via Unicode <unicode at corp.unicode.org>
> 
> Eli Zaretskii wrote:
> 
> >> How is this handled in Word, or in any other WYSIWYG editor?
> >
> > They use specialized formats of the clipboard data, where the styles
> > and typefaces are preserved.  See
> >
> > https://learn.microsoft.com/en-us/windows/win32/dataxchg/clipboard-formats
> 
> I know the internal format is different. Marius wasn't talking about that. He was talking about the user experience of copying and pasting styled text when the styling data is invisible, and appearing to claim this was an intractable problem.

Sorry, I'm probably missing something, because I don't see the
relevance.  My point is that copy/paste through the clipboard uses
formats that are not plain text, and encode the styles and typefaces
by using methods that are not compatible with plain text.

> Alternatively, why is the stated user-experience problem for ECMA-48 not a problem for Word?

I thought I answered that?  Or what do you mean by "user experience"?

> >> What about in WordPerfect for DOS, where different foreground and
> >> background colors in text mode represented bold, italics,
> >> underlining, etc.?
> >
> > You mean, copying from some part of WordPerfect document to another
> > part of the same document?  Because DOS supported only one program at
> > a time, and didn't have a clipboard (or anything similar) at all.
> 
> OK, good point. What about WordPerfect for Windows, or WordPad, or [pick your favorite tool from a non-Windows environment] where the formatting isn't visible?

If pasting between applications, the answer is again clipboard format
that is not plain text.  If you copy plain text, the formatting is
lost.

From asmusf at ix.netcom.com  Fri Jan 12 14:42:41 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 12 Jan 2024 12:42:41 -0800
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <1f2c27f9-eb03-4fd3-8e36-9a7d1aba39ce@ix.netcom.com>

On 1/12/2024 9:35 AM, Doug Ewell via Unicode wrote:
> Did we not have any way to manipulate styled text before HTML came along?

No, of course not. HTML is the original sin.

A./

From doug at ewellic.org  Fri Jan 12 16:08:19 2024
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 12 Jan 2024 22:08:19 +0000
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <834jfiia4l.fsf@gnu.org>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <838r4uifdg.fsf@gnu.org>
 <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <834jfiia4l.fsf@gnu.org>
Message-ID: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>

Eli Zaretskii wrote:

> Sorry, I'm probably missing something, because I don't see the
> relevance.  My point is that copy/paste through the clipboard uses
> formats that are not plain text, and encode the styles and typefaces
> by using methods that are not compatible with plain text.

I think Marius will have to address what he meant, as you and I are talking past each other.

If ECMA-48 markup is part of the plain-text stream, and it is copied from one app to another in a plain-text Clipboard, then all of the ECMA-48 sequences should survive the transit.

>> Alternatively, why is the stated user-experience problem for ECMA-48
>> not a problem for Word?
>
> I thought I answered that?  Or what do you mean by "user experience"?

That question was semi-rhetorical, and was for Marius, who again will need to respond. I thought he was talking about the human user trying to select text to be copied, and inadvertently failing to select a starting or ending ECMA-48 sequence because they are not human-visible.

> If pasting between applications, the answer is again clipboard format
> that is not plain text.  If you copy plain text, the formatting is
> lost.

Wait: are we saying that ECMA-48 sequences like CSI 31m are plain text, or that they are not?

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org


From sosipiuk at gmail.com  Fri Jan 12 17:06:59 2024
From: sosipiuk at gmail.com (=?UTF-8?Q?S=C5=82awomir_Osipiuk?=)
Date: Fri, 12 Jan 2024 23:06:59 +0000
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <1705100313111.3551241438.264195751@gmail.com>


On Friday, 12 January 2024, 17:08:19 (-05:00), Doug Ewell via Unicode 
wrote:
 >
 > That question was semi-rhetorical, and was for Marius, who again will
 > need to respond. I thought he was talking about the human user trying to
 > select text to be copied, and inadvertently failing to select a starting or
 > ending ECMA-48 sequence because they are not human-visible.

If the terminal is respecting ECMA-48 styling then "selecting" and 
"copying" should be performed in some sane enforced manner that doesn't 
result in orphaned or mismatched styling control codes.
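
One possible shape of such a "sane" copy, as a rough sketch only. It assumes the text contains nothing but SGR sequences (ESC [ params m), and the function `copy_visible` and its selection model (indices counted over visible characters) are hypothetical, not taken from ECMA-48 or any terminal.

```python
import re

# Assumption: only SGR sequences (ESC [ params m) appear in the text.
SGR = re.compile(r"\x1b\[([0-9;]*)m")

def copy_visible(text, start, end):
    """Copy visible characters [start, end) as a self-contained styled string."""
    pieces = []
    active = []    # SGR parameter strings in effect at the current position
    visible = 0    # number of visible characters passed so far
    pos = 0
    while pos < len(text) and visible < end:
        m = SGR.match(text, pos)
        if m:
            params = m.group(1)
            if params in ("", "0"):
                active = []              # reset clears all styling
            else:
                active.append(params)
            if visible > start:
                pieces.append(m.group(0))  # style change inside the selection
            pos = m.end()
            continue
        if visible >= start:
            if visible == start:
                # Re-establish the styling in effect where the selection begins.
                pieces.extend("\x1b[%sm" % p for p in active)
            pieces.append(text[pos])
        visible += 1
        pos += 1
    if pieces and active:
        pieces.append("\x1b[0m")         # close styling still open at the end
    return "".join(pieces)

sample = "This is a \x1b[31mred Text\x1b[0m ECMA-48 styling."
word = copy_visible(sample, 14, 18)      # the word "Text"
phrase = copy_visible(sample, 8, 13)     # the words "a red"
```

With Marius's example, copying "Text" yields the word wrapped in its own red/reset pair, and copying "a red" gets a reset appended, so pasted text does not turn the rest of the target document red.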

 >
 > Wait: are we saying that ECMA-48 sequences like CSI 31m are plain text, 
or that they are not?

IMO they're very much not, but I expect some will disagree with that.

From marius.spix at web.de  Fri Jan 12 17:26:18 2024
From: marius.spix at web.de (Marius Spix)
Date: Sat, 13 Jan 2024 00:26:18 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <838r4uifdg.fsf@gnu.org>
 <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <834jfiia4l.fsf@gnu.org>
 <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <20240113002618.575cb477@spixxi>

Applications like Word or web browsers are able to preserve formatting
by using rich text formats like HTML or RTF in the clipboard. ECMA-48
proposed styling controls work on the plaintext layer, independently
from the application, as long as the renderer (e. g. Uniscribe or HarfBuzz)
supports them. That would require the clipboard handler of the
operating system to be aware of these sequences.


Am Fri, 12 Jan 2024 22:08:19 +0000
schrieb Doug Ewell <doug at ewellic.org>:

> Eli Zaretskii wrote:
>
> > Sorry, I'm probably missing something, because I don't see the
> > relevance.  My point is that copy/paste through the clipboard uses
> > formats that are not plain text, and encode the styles and typefaces
> > by using methods that are not compatible with plain text.
>
> I think Marius will have to address what he meant, as you and I are
> talking past each other.
>
> If ECMA-48 markup is part of the plain-text stream, and it is copied
> from one app to another in a plain-text Clipboard, then all of the
> ECMA-48 sequences should survive the transit.
>
> >> Alternatively, why is the stated user-experience problem for
> >> ECMA-48 not a problem for Word?
> >
> > I thought I answered that?  Or what do you mean by "user
> > experience"?
>
> That question was semi-rhetorical, and was for Marius, who again will
> need to respond. I thought he was talking about the human user trying
> to select text to be copied, and inadvertently failing to select a
> starting or ending ECMA-48 sequence because they are not
> human-visible.
>
> > If pasting between applications, the answer is again clipboard
> > format that is not plain text.  If you copy plain text, the
> > formatting is lost.
>
> Wait: are we saying that ECMA-48 sequences like CSI 31m are plain
> text, or that they are not?
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>


From asmusf at ix.netcom.com  Fri Jan 12 17:58:48 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 12 Jan 2024 15:58:48 -0800
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <20240113002618.575cb477@spixxi>
References: <trinity-a1983efa-4355-4c5e-9b47-29cfb692613d-1704975711774@3c-app-webde-bs41>
 <SJ0PR03MB65983E04A791C9530417AAFDCA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <838r4uifdg.fsf@gnu.org>
 <SJ0PR03MB65988D31C226F04FC527D72ECA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <834jfiia4l.fsf@gnu.org>
 <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
 <20240113002618.575cb477@spixxi>
Message-ID: <7cdcaf91-f9e7-47ce-9dc6-7cd8c3d38f67@ix.netcom.com>

ECMA-48 is not plain text. It is a form of markup that uses syntax 
characters other than those from the printable ASCII range, but that's 
about the only distinction.

It's different from a true binary format as well, which would use things 
like addresses and lengths to mark the location of text runs and styling 
info. Instead, like any other markup, it uses character codes inserted 
into the data stream.

Now that we have that out of the way, let's look at the clipboard.

The clipboard contains both data and metadata. By telling a recipient 
that data is in HTML format it can be displayed as rich text, instead of 
as HTML source. The same is true for rtf or ECMA-48.

The same data can be present in multiple formats on the clipboard. 
That's what's behind the ability to paste "just the text" from a copied 
section, discarding the styling.

Logically, for that to work, either the sender or the recipient of the 
clipboard data must understand what the "just the text" part of the data 
represents and how to discard the styling. It's been too long, but from 
what I remember, it was the sender that had the option of offering 
multiple formats and the recipient could pick any that it understood.

That's the only logical approach, because only the sender can be assumed 
to know the format the data is in. The receiver could do post-processing 
only on data formats already known to it.

Your ECMA-48 terminal app would presumably want to offer both the 
ECMA-48 stream with suitable metadata defining it as such, as well as a 
plain-text stream, which discards the styling.
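
As a rough illustration of the sender offering both forms, here is a minimal
Python sketch. It is not any real clipboard API: the format label
"text/x-ecma48" and the function names are made up for the example, and only
CSI control sequences are stripped (other ECMA-48 controls are ignored).

```python
import re

# CSI control sequence: ESC [ (or the single C1 control U+009B), then
# parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, and one
# final byte 0x40-0x7E.
CSI_RE = re.compile(r"(?:\x1b\[|\x9b)[0-?]*[ -/]*[@-~]")


def strip_csi(stream):
    """Return the stream with every CSI sequence removed."""
    return CSI_RE.sub("", stream)


def offer_formats(stream):
    """Sender side: offer the styled stream and a plain-text rendition.

    "text/x-ecma48" is a hypothetical label, not a registered format name.
    """
    return {"text/x-ecma48": stream, "text/plain": strip_csi(stream)}


def pick_format(offered, understood):
    """Recipient side: take the first offered format it understands."""
    for name in understood:
        if name in offered:
            return offered[name]
    raise LookupError("no mutually understood clipboard format")
```

A recipient that only understands plain text would call
pick_format(offer_formats(data), ["text/plain"]) and never see the control
sequences; the stripping happens on the sender side, which is the only side
assumed to know the source format.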

For nested styling syntax I don't know whether sending applications 
would perform an "auto close" of any open styling commands when 
packaging up the selected text, or whether that would be done by the 
receiving app, assuming it understands the format. The problem of how to 
handle selection at the boundary of a style run when the style commands 
themselves are not visible to the user is the same for markup languages 
as for ECMA-48.
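
A sender-side "auto close" could look roughly like the Python sketch below.
It assumes that the only styling in play is SGR (CSI ... m) and that SGR 0
resets everything; styling opened before the start of the selection, which
is the harder half of the boundary problem, is not handled. The function
name is made up for the example.

```python
import re

# SGR sequences only: ESC [ <parameters> m
SGR_RE = re.compile(r"\x1b\[([0-9;:]*)m")
RESET = "\x1b[0m"


def auto_close(selection):
    """Append a reset if the last SGR sequence in the selection left a style open."""
    last = None
    for last in SGR_RE.finditer(selection):
        pass  # keep only the final match
    # No SGR at all, or the final SGR is already a reset ("" or "0").
    if last is None or last.group(1) in ("", "0"):
        return selection
    return selection + RESET
```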

Nothing new to see here, move right along.

A./

On 1/12/2024 3:26 PM, Marius Spix via Unicode wrote:
> Applications like Word or web browsers are able to preserve formatting
> by using rich text formats like HTML or RTF in the clipboard. ECMA-48
> proposed styling controls work on the plaintext layer, independently
> from the application, as long as the renderer (e.g. Uniscribe or HarfBuzz)
> supports them. That would require the clipboard handler of the
> operating system to be aware of these sequences.
>
>
> Am Fri, 12 Jan 2024 22:08:19 +0000
> schrieb Doug Ewell<doug at ewellic.org>:
>
>> Eli Zaretskii wrote:
>>
>>> Sorry, I'm probably missing something, because I don't see the
>>> relevance.  My point is that copy/paste through the clipboard uses
>>> formats that are not plain text, and encode the styles and typefaces
>>> by using methods that are not compatible with plain text.
>> I think Marius will have to address what he meant, as you and I are
>> talking past each other.
>>
>> If ECMA-48 markup is part of the plain-text stream, and it is copied
>> from one app to another in a plain-text Clipboard, then all of the
>> ECMA-48 sequences should survive the transit.
>>
>>>> Alternatively, why is the stated user-experience problem for
>>>> ECMA-48 not a problem for Word?
>>> I thought I answered that?  Or what do you mean by "user
>>> experience"?
>> That question was semi-rhetorical, and was for Marius, who again will
>> need to respond. I thought he was talking about the human user trying
>> to select text to be copied, and inadvertently failing to select a
>> starting or ending ECMA-48 sequence because they are not
>> human-visible.
>>
>>> If pasting between applications, the answer is again clipboard
>>> format that is not plain text.  If you copy plain text, the
>>> formatting is lost.
>> Wait: are we saying that ECMA-48 sequences like CSI 31m are plain
>> text, or that they are not?
>>
>> --
>> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>>
>

From kent.b.karlsson at bahnhof.se  Sat Jan 13 17:04:56 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sun, 14 Jan 2024 00:04:56 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
References: <SJ0PR03MB6598315FE8F462078D9B84D9CA6F2@SJ0PR03MB6598.namprd03.prod.outlook.com>
Message-ID: <7DA70628-D2D4-4FD4-8D3E-96F32A28808F@bahnhof.se>


> 12 jan. 2024 kl. 23:10 skrev Doug Ewell via Unicode <unicode at corp.unicode.org>:
> 
> Eli Zaretskii wrote:
> 
>> Sorry, I'm probably missing something, because I don't see the
>> relevance.  My point is that copy/paste through the clipboard uses
>> formats that are not plain text, and encode the styles and typefaces
>> by using methods that are not compatible with plain text.
> 
> I think Marius will have to address what he meant, as you and I are talking past each other.
> 
> If ECMA-48 markup is part of the plain-text stream, and it is copied from one app to another in a plain-text Clipboard, then all of the ECMA-48 sequences should survive the transit.

In section 13 I give a short general "rant" about cut-and-paste. It doesn't go into details about what "conversions" must be done in order to preserve the styling, since that is beyond the scope of the proposal. But it is nothing specific to using ECMA-48 as an external (file) representation. The issue (though not the details) is the same for all other (reasonable) ways of styling text.

/Kent K

>>> Alternatively, why is the stated user-experience problem for ECMA-48
>>> not a problem for Word?
>> 
>> I thought I answered that?  Or what do you mean by "user experience"?
> 
> That question was semi-rhetorical, and was for Marius, who again will need to respond. I thought he was talking about the human user trying to select text to be copied, and inadvertently failing to select a starting or ending ECMA-48 sequence because they are not human-visible.
> 
>> If pasting between applications, the answer is again clipboard format
>> that is not plain text.  If you copy plain text, the formatting is
>> lost.
> 
> Wait: are we saying that ECMA-48 sequences like CSI 31m are plain text, or that they are not?
> 
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
> 
> 


From kent.b.karlsson at bahnhof.se  Sat Jan 13 17:47:57 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sun, 14 Jan 2024 00:47:57 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
In-Reply-To: <7cdcaf91-f9e7-47ce-9dc6-7cd8c3d38f67@ix.netcom.com>
References: <7cdcaf91-f9e7-47ce-9dc6-7cd8c3d38f67@ix.netcom.com>
Message-ID: <C6BFADCC-39B6-45F7-8110-BAEC808869EA@bahnhof.se>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240114/881bf398/attachment-0001.htm>

From kent.b.karlsson at bahnhof.se  Sun Jan 14 18:37:02 2024
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 15 Jan 2024 01:37:02 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update 
Message-ID: <DE448A2C-C610-416B-A51E-675C51084310@bahnhof.se>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20240115/f81d8dd0/attachment.htm>

From cate at cateee.net  Mon Jan 15 01:58:57 2024
From: cate at cateee.net (Giacomo Catenazzi)
Date: Mon, 15 Jan 2024 08:58:57 +0100
Subject: Ecma-48 proposed styling controls update updated & math
 expression representation proposal update
In-Reply-To: <DE448A2C-C610-416B-A51E-675C51084310@bahnhof.se>
References: <DE448A2C-C610-416B-A51E-675C51084310@bahnhof.se>
Message-ID: <dab5dd93-2467-40b4-9455-041d959f6f91@cateee.net>

Hello,

Reading this discussion, I came to some conclusions:

The proposal should be split in 2 parts:

- Part I, point a):

We need to document (and "standardize") current usage of CSI (and other 
ECMA-48-like parts). Common usage, nothing new. At least one can 
recommend the best way in case of duplicates, or *deprecate* something not 
much used. I do not see how to go forward without such a foundation. And we 
may see that some parts were already implemented (remember: Unicode is 
not about designing something new: we have a lot of historical 
baggage). Note: on this part I could help.

- Part I, point b):

Then we can try to extend the commands, for **terminal emulators**. 
Part a) would help to create a solid base, and possibly to get some or many 
terminal emulator maintainers into the group. It is not good to 
standardize something without the support and help of the people who should 
use the standard. And ECMA-48 is used in exactly that case: *terminal emulators*.

- Part II):

Extend outside terminal emulators. I find this the most problematic 
part: we are creating a document "sub-standard" (so by design not good) 
for something for which we already have well-used standards (maybe just 
define a subset of HTML). And in any case Part I and Part II are on a 
different layer than common Unicode (apart from bidi and ruby, which are 
often mentioned, but which can also be standardized, and are, in a 
different layer). Again: it is a different layer, so we need a parser, 
and so whether it is HTML or ECMA-48 is irrelevant (but one is robust and 
widely used).

In any case I do not find it useful to discuss this second part if we do 
not have the first part done. We also need people who want to implement 
it in their own programs: we already have red text in ECMA-48, but which 
programs support copying while maintaining the formatting? Very few 
(disregarding those where just bytes are copied, i.e. copying at a different layer).


Why do we need a new standard when we already have a good one? Transcoding 
is bread-and-butter (and ECMA provides us a lot of *private* space if we 
need to encode formatting in such a way).


I see just vapour: no use case, no interest from programmers in adapting 
their own programs, so for now it will bloat technical documents without 
use. I dislike the *in future may be used* in the discussions. I want a 
reference implementation to check whether things can be useful (and whether 
they can be coded), and support from developers of some program in actual use.


Note: do not use the Unicode standard as an argument: we have a lot of awful 
parts (fortunately hidden and forgotten). IIRC also for simple 
text formatting. Bidi in Unicode is not a reason to do it again, or to 
implement such layers again in Unicode.


giacomo


On 15 Jan 2024 01:37, Kent Karlsson via Unicode wrote:
> (Second reply to same email)
> 
>> 13 jan. 2024 kl. 01:00 skrev Asmus Freytag via Unicode 
>> <unicode at corp.unicode.org>:
>> ECMA-48 is not plain text. It is a form of markup that uses syntax 
>> characters other than those from the printable ASCII range, but that's 
>> about the only distinction.
> 
> But that is a key distinction.
> 
> How do you think of Unicode bidi controls? Plain text or not? They are 
> at the same "level" as ECMA-48 controls!
> 
> Speaking of bidi, that has major security issues very similar to those 
> pointed out for ECMA-48 in a reference given in this thread. For source 
> code and math expressions it must be strongly restricted as pointed out 
> in my two proposals, if at all permitted.
> 
>> It's different from a true binary format as well, which would use 
>> things like addresses and lengths to mark the location of text runs 
>> and styling info. Instead, like any other markup, it uses character 
>> codes inserted into the data stream.
> 
> Yes, of course.
> 
> (While uncommon as an external representation, the Teletext protocol, in 
> higher implementation levels, does have an addressing based (i.e. 
> out-of-line) representation for some formatting extensions, like 
> additional colours and bold/italics/proportional.)
> 
>> Now that we have that out of the way, let's look at the clipboard.
>>
>> The clipboard contains both data and metadata. By telling a recipient 
>> that data is in HTML format it can be displayed as rich text, instead 
>> of as HTML source. The same is true for rtf or ECMA-48.
> 
> While I am not super-knowledgeable about clipboards, I gather that at 
> least one type uses a form of limited HTML as a passe-partout for 
> formatted text, regardless of source and target of copy-paste or the 
> file representations they might support. And that is fine.
> 
>> The same data can be present in multiple formats on the clipboard. 
>> That's what's behind the ability to paste "just the text" from a 
>> copied section, discarding the styling.
>>
>> Logically, for that to work, either the sender or the recipient of the 
>> clipboard data must understand what the "just the text" part of the 
>> data represents and how to discard the styling.
> 
> I gather that it is the sender that fills in some of the available 
> alternatives. For instance it can fill in the HTML slot and the "plain 
> text" slot.
> 
> I don't think an ECMA-48 slot would be helpful.
> 
> Still it should be, *and is already*, possible to copy-paste styled text 
> from a terminal emulator to (say) a Word document (neither of which use 
> HTML). (Barring bugs and other imperfections.)
> 
>> It's been too long, but from what I remember, it was the sender that 
>> had the option of offering multiple formats and the recipient could 
>> pick any that it understood.
> 
> Yes. Some applications allow the end user to pick which one.
> 
>> That's the only logical approach, because only the sender can be 
>> assumed to know the format the data is in. The receiver could do 
>> post-processing only on data formats already known to it.
>>
>> Your ECMA-48 terminal app 
> 
> "I" make/maintain no terminal emulator. I just use some (essentially 
> every work-day).
> 
>> would presumably want to offer both the ECMA-48 stream with suitable 
>> metadata defining it as such, as well as a plain-text stream, which 
>> discards the styling. 
> 
> HTML + plain text in the clip board. Many only provide plain text at 
> this time. But that may change.
> 
> /Kent K
> 
>> For nested styling syntax I don't know whether sending applications 
>> would perform an "auto close" of any open styling commands when 
>> packaging up the selected text, or whether that would be done by the 
>> receiving app, assuming it understands the format. The problem of how to 
>> handle selection at the boundary of a style run when the style 
>> commands themselves are not visible to the user is the same for markup 
>> languages as for ECMA-48.
>>
>> Nothing new to see here, move right along.
>>
>> A./
>>
>> On 1/12/2024 3:26 PM, Marius Spix via Unicode wrote:
>>> Applications like Word or web browsers are able to preserve formatting
>>> by using rich text formats like HTML or RTF in the clipboard. ECMA-48
>>> proposed styling controls work on the plaintext layer, independently
>>> from the application, as long as the renderer (e.g. Uniscribe or HarfBuzz)
>>> supports them. That would require the clipboard handler of the
>>> operating system to be aware of these sequences.
>>>
>>>
>>> Am Fri, 12 Jan 2024 22:08:19 +0000
>>> schrieb Doug Ewell<doug at ewellic.org>:
>>>
>>>> Eli Zaretskii wrote:
>>>>
>>>>> Sorry, I'm probably missing something, because I don't see the
>>>>> relevance.  My point is that copy/paste through the clipboard uses
>>>>> formats that are not plain text, and encode the styles and typefaces
>>>>> by using methods that are not compatible with plain text.
>>>> I think Marius will have to address what he meant, as you and I are
>>>> talking past each other.
>>>>
>>>> If ECMA-48 markup is part of the plain-text stream, and it is copied
>>>> from one app to another in a plain-text Clipboard, then all of the
>>>> ECMA-48 sequences should survive the transit.
>>>>
>>>>>> Alternatively, why is the stated user-experience problem for
>>>>>> ECMA-48 not a problem for Word?
>>>>> I thought I answered that?  Or what do you mean by "user
>>>>> experience"?
>>>> That question was semi-rhetorical, and was for Marius, who again will
>>>> need to respond. I thought he was talking about the human user trying
>>>> to select text to be copied, and inadvertently failing to select a
>>>> starting or ending ECMA-48 sequence because they are not
>>>> human-visible.
>>>>
>>>>> If pasting between applications, the answer is again clipboard
>>>>> format that is not plain text.  If you copy plain text, the
>>>>> formatting is lost.
>>>> Wait: are we saying that ECMA-48 sequences like CSI 31m are plain
>>>> text, or that they are not?
>>>>
>>>> --
>>>> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org
>>>>
>>

From richard.wordingham at ntlworld.com  Sun Jan 21 07:52:18 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 21 Jan 2024 13:52:18 +0000
Subject: Solution for Extended Tamil
Message-ID: <20240121135218.1919b62e@JRWUBU2>

The Unicode Consortium makes some forays into standardising the
encoding of text beyond the mere encoding of characters.  Is there yet a
standard encoding for the first blue word on page 3 of
https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf (Document
L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
circle, and is or closely relates to the Sanskrit word 'bh??vam'.  I
would not be surprised at context-sensitive rules for whether the
sequence should be ended with U+200C ZERO WIDTH NON-JOINER.

One possible solution would be for U+00B2, U+00B3 and U+2074 to be
treated as nuktas, but that invalidates or creates a confusable for the
current solution for sequences without a right matra, which is to use
the order <consonant, vowel, superscript digit>.

Another possible solution is to define a special visual rearrangement
for the sequences <consonant, (U+0BBE|U+0BCA|U+0BCB|U+0BCC|U+0BD7),
superscript digit> and their canonical equivalents.

Is it perhaps the case that the word I mentioned can only be encoded
using the PUA?

Richard.


From jameskass at code2001.com  Sun Jan 21 09:26:45 2024
From: jameskass at code2001.com (James Kass)
Date: Sun, 21 Jan 2024 15:26:45 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240121135218.1919b62e@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
Message-ID: <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>


On 2024-01-21 1:52 PM, Richard Wordingham via Unicode wrote:
> The Unicode Consortium makes some forays into standardising the
> encoding of text beyond the mere encoding of characters.  Is there yet a
> standard encoding for the first blue word on page 3 of
> https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf  (Document
> L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
> SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
> U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
> circle, and is or closely relates to the Sanskrit word 'bh??vam'.  I
> would not be surprised at context-sensitive rules for whether the
> sequence should be ended with U+200C ZERO WIDTH NON-JOINER.
>
> One possible solution would be for U+00B2, U+00B3 and U+2074 to be
> treated as nuktas, but that invalidates or creates a confusable for the
> current solution for sequences without a right matra, which is to use
> the order <consonant, vowel, superscript digit>.
Perhaps the simplest solution to this display issue would be to persuade 
the user community to place the superscript digit after the syllable it 
modifies and spell the word like பா⁴வம். In other words, expand the 
current solution for sequences without a right matra to all sequences 
<consonant, vowel, superscript digit>. That would eliminate the pesky 
dotted circle.

Failing that, either a specialty font with a zero-width zero-contour 
glyph mapped to the dotted circle character, or a cumbersome reworking 
of existing font display engines to accommodate this unusual construction.


From jameskass at code2001.com  Sun Jan 21 12:16:14 2024
From: jameskass at code2001.com (James Kass)
Date: Sun, 21 Jan 2024 18:16:14 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
Message-ID: <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>


On 2024-01-21 3:26 PM, James Kass via Unicode wrote:
> Perhaps the simplest solution to this display issue would be to 
> persuade the user community to place the superscript digit after the 
> syllable it modifies and spell the word like பா⁴வம்.

http://www.brahminsnet.com/forums/forum/religious/bhakthi-pooja-sthothrams/anmikam-bhakthi-pooja/9082-

It appears that the user community is already placing the superscript 
digit at the end of the syllable. The word "பா⁴வம்" appears in the page 
linked above, which is from 2014.

Likewise, the same word shows up in this page:
https://www.bible.com/bible/2102/EPH.1.santm

Indeed, a web search for "பா⁴வம்" finds plenty of hits, but a web search 
for "ப⁴ாவம்" finds nothing. So the original question seems moot.


From richard.wordingham at ntlworld.com  Sun Jan 21 19:19:18 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 22 Jan 2024 01:19:18 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
Message-ID: <20240122011918.5c0ef2cd@JRWUBU2>

On Sun, 21 Jan 2024 18:16:14 +0000
James Kass via Unicode <unicode at corp.unicode.org> wrote:

> On 2024-01-21 3:26 PM, James Kass via Unicode wrote:
> > Perhaps the simplest solution to this display issue would be to 
> > persuade the user community to place the superscript digit after
> > the syllable it modifies and spell the word like பா⁴வம்.  
> 
> http://www.brahminsnet.com/forums/forum/religious/bhakthi-pooja-sthothrams/anmikam-bhakthi-pooja/9082-
> 
> It appears that the user community is already placing the superscript 
> digit at the end of the syllable. The word "பா⁴வம்" appears in the
> page linked above, which is from 2014.
> 
> Likewise, the same word shows up in this page:
> https://www.bible.com/bible/2102/EPH.1.santm
> 
> Indeed, a web search for "பா⁴வம்" finds plenty of hits, but a web
> search for "ப⁴ாவம்" finds nothing. So the original question seems
> moot.

Or the user community is only using Unicode if they can stomach always
putting the digit last.  Note that Google failed to find "??????"
amongst the Unicode documents.  I've quoted and linked to your
observation at Wiktionary, best seen at
https://en.wiktionary.org/wiki/Module:sa-convert/testcases/Tamil.

Richard.


From samjnaa at gmail.com  Mon Jan 22 05:01:48 2024
From: samjnaa at gmail.com (Shriramana Sharma)
Date: Mon, 22 Jan 2024 16:31:48 +0530
Subject: Solution for Extended Tamil
In-Reply-To: <20240122011918.5c0ef2cd@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
Message-ID: <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>

Please see the original attestations. I have noted that they always put the
digit immediately after the consonant.

There is not much meaning IMO in quoting online attestations or search
results because when it doesn't display properly and throws a dotted
circle, they will adjust it so that it doesn't display such junk. Speaking
as one of the authors of a de facto Unicode-based transliteration scheme
from Devanagari to Tamil which seems to be widely used (but we can't get
assured statistics).

From jameskass at code2001.com  Mon Jan 22 12:23:03 2024
From: jameskass at code2001.com (James Kass)
Date: Mon, 22 Jan 2024 18:23:03 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
Message-ID: <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>


On 2024-01-22 11:01 AM, Shriramana Sharma via Unicode wrote:
> Please see the original attestations. I have noted that they always 
> put the digit immediately after the consonant.
>
> There is not much meaning IMO in quoting online attestations or search 
> results because when it doesn't display properly and throws a dotted 
> circle, they will adjust it so that it doesn't display such junk. 
> Speaking as one of the authors of a de facto Unicode-based 
> transliteration scheme from Devanagari to Tamil which seems to be 
> widely used (but we can't get assured statistics).

Quoting from 
https://en.wiktionary.org/wiki/Module:sa-convert/testcases/Tamil :

"in most forms of Extended Tamil (including the Gita book mentioned 
previously running to almost 420,000 copies) the diacritics are placed 
between the consonant and any vowel signs placed to the right".

Maybe not always, for example : "???????????" -- would the superscript 
digit be expected to break the ligature here?

As we know, when typing Tamil on a mechanical typewriter, for example, 
U+0BC6 TAMIL VOWEL SIGN E was always typed before the consonant. But in 
the standardized computer encoding for Tamil, U+0BC6 is always entered 
after the consonant. In both cases, the display properly shows the 
vowel sign on the left of the consonant.

The original question here was about a standardized encoding order for 
Extended Tamil, and the user community has apparently already chosen a 
/de facto/ standardization. And the results are legible.

Placing the superscript digits next to the consonants instead of at the 
end of the syllable appears to be a display issue. But superscript 
digits are "number, other" and "not reordered"; so the rendering system 
won't automatically treat the digits as marks. Encoding clones of the 
superscript digits to be treated as marks might not be practical. And, 
after all, the character identity of those superscript digits is that 
they are superscript digits.

Has any effort been made to use OpenType to get the desired display? 
Classifying the superscript digits as "marks" in the GDEF (glyph 
definition) table and then using GPOS (glyph positioning) for the 
desired placement? Or has the user community accepted the plain-text 
legibility of the /de facto/ standard encoding order and reconciled with 
the fact that not all published books can be exactly rendered in plain-text?


From richard.wordingham at ntlworld.com  Mon Jan 22 14:07:26 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 22 Jan 2024 20:07:26 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
 <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
Message-ID: <20240122200726.57f528de@JRWUBU2>

On Mon, 22 Jan 2024 18:23:03 +0000
James Kass via Unicode <unicode at corp.unicode.org> wrote:

> Has any effort been made to use OpenType to get the desired display? 
> Classifying the superscript digits as "marks" in the GDEF (glyph 
> definition) table and then using GPOS (glyph positioning) for the 
> desired placement? Or has the user community accepted the plain-text 
> legibility of the /de facto/ standard encoding order and reconciled
> with the fact that not all published books can be exactly rendered in
> plain-text?

I think the first big question is whether the font (envisioned as active
code) will be presented with letter, non-ligating right matra and digit
in the same glyph run.  If that happens, I would start with them as base
characters in the GDEF and treat it as an exercise in reordering at
the GSUB level.  It's no worse than the re-ordering I do to render Tai
Tham starting from the characters transcoded to Latin letters, even if
there be a mark between matra and digit. The digits need to be bases -
I can imagine a sequence <KA, AA, SUPERSCRIPT TWO, ZWNJ, SUPERSCRIPT
TWO> where the second SUPERSCRIPT TWO is referencing a footnote. The
other big question is whether the reordering is compliant with Unicode.

Using GPOS for re-ordering seems to be a nightmare.

Where do you get this 'non-reordering' property from?

Richard.


From asmusf at ix.netcom.com  Mon Jan 22 14:48:36 2024
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Mon, 22 Jan 2024 12:48:36 -0800
Subject: Earliest documented runes in DK ?
Message-ID: <56ee8013-abf8-4689-9b10-a9e63075924c@ix.netcom.com>

Thomas Milo alerted me to this:

https://www.dr.dk/nyheder/indland/arkaeologer-finder-danmarks-aeldste-runer-paa-fyn?fbclid=IwAR3BpE3Kfja_hzjG5EBzDAurJ0g19lBsySiRbr3Rd2-Ghypa05_gVngj3hw

From jameskass at code2001.com  Mon Jan 22 15:18:12 2024
From: jameskass at code2001.com (James Kass)
Date: Mon, 22 Jan 2024 21:18:12 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240122200726.57f528de@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
 <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
 <20240122200726.57f528de@JRWUBU2>
Message-ID: <672200b2-962e-4f89-8414-3ec68c7e3a30@code2001.com>


On 2024-01-22 8:07 PM, Richard Wordingham via Unicode wrote:

 > Using GPOS for re-ordering seems to be a nightmare.

Which is why I didn't try it here. You have more experience with modern 
typography tools than I do, so I would defer to your better judgment for 
possible approaches.

 > Where do you get this 'non-reordering' property from?

The canonical comb. class = 0, which means non-reordering.

Note that in the text quoted earlier from 
https://en.wiktionary.org/wiki/Module:sa-convert/testcases/Tamil it is 
said that "most forms of Extended Tamil...". This suggests that there 
are other conventions which place the superscript digits elsewhere. If, 
as suspected, an existing convention places the superscript digit at the 
end of the syllable, then the /de facto/ encoding sequence and default 
display might well be totally acceptable to the user community.


From richard.wordingham at ntlworld.com  Mon Jan 22 16:32:13 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 22 Jan 2024 22:32:13 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <672200b2-962e-4f89-8414-3ec68c7e3a30@code2001.com>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
 <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
 <20240122200726.57f528de@JRWUBU2>
 <672200b2-962e-4f89-8414-3ec68c7e3a30@code2001.com>
Message-ID: <20240122223213.32a88cdf@JRWUBU2>

On Mon, 22 Jan 2024 21:18:12 +0000
James Kass via Unicode <unicode at corp.unicode.org> wrote:

> On 2024-01-22 8:07 PM, Richard Wordingham via Unicode wrote:

>  > Where do you get this 'non-reordering' property from?  
> 
> The canonical comb. class = 0, which means non-reordering.

Most Indic VOWEL SIGNs E have ccc=0 and are left matras and are not
'logical order exceptions', so surely they are re-ordering!

The only canonical combining class connected with reordering at all is
ccc=Left, for which the only characters are Hangul dot tone marks.

> Note that in the text quoted earlier from 
> https://en.wiktionary.org/wiki/Module:sa-convert/testcases/Tamil it
> is said that "most forms of Extended Tamil...".  This suggests that
> there are other conventions which place the superscript digits
> elsewhere.  If, as suspected, an existing convention places the
> superscript digit at the end of the syllable, then the /de facto/
> encoding sequence and default display might well be totally
> acceptable to the user community.

I too misinterpreted it that way.  It was, however, referring to the
'V-I' system, where what appear to be Latin superscript and subscript
letters are suffixed to the CV-unit.  Critically, they're not digits.

Richard.


From kenwhistler at sonic.net  Mon Jan 22 18:00:13 2024
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 22 Jan 2024 16:00:13 -0800
Subject: Solution for Extended Tamil
In-Reply-To: <20240122223213.32a88cdf@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
 <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
 <20240122200726.57f528de@JRWUBU2>
 <672200b2-962e-4f89-8414-3ec68c7e3a30@code2001.com>
 <20240122223213.32a88cdf@JRWUBU2>
Message-ID: <7f95ec33-ac3b-49d2-b58c-f8a118fc5ba2@sonic.net>

ccc=0 means that a character does not reorder for the canonical ordering 
part of the Unicode normalization algorithm. (See Canonical Ordering 
Algorithm in Section 3.11 in the core spec. 
https://www.unicode.org/versions/Unicode15.0.0/ch03.pdf#G49591) That 
sense of non-reordering has nothing to do with reordering (or 
non-reordering) of glyphs left and right for rendering of Indic scripts. 
It sounds like folks are talking past each other on this.

--Ken

On 1/22/2024 2:32 PM, Richard Wordingham via Unicode wrote:
>> The canonical comb. class = 0, which means non-reordering.
> Most Indic VOWEL SIGNs E have ccc=0 and are left matras and are not
> 'logical order exceptions', so surely they are re-ordering!
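
The distinction drawn above can be checked directly. The following is a
minimal sketch, assuming Python's standard unicodedata module (whose
data reflects whatever Unicode version the interpreter bundles); the
sample characters and variable names are chosen here purely for
illustration. It prints the canonical combining class of a few
characters from this thread and shows that canonical ordering only
moves marks with non-zero ccc.

```python
import unicodedata

# Canonical_Combining_Class (ccc) values from Python's bundled Unicode data.
samples = {
    "TAMIL VOWEL SIGN AA": "\u0bbe",
    "TAMIL VOWEL SIGN E": "\u0bc6",           # a left matra, yet ccc=0
    "SUPERSCRIPT FOUR": "\u2074",
    "TAMIL SIGN VIRAMA": "\u0bcd",
    "HANGUL SINGLE DOT TONE MARK": "\u302e",  # ccc=224 (Left)
}
ccc = {name: unicodedata.combining(ch) for name, ch in samples.items()}
for name, value in ccc.items():
    print(f"ccc={value:3d}  {name}")

# Characters with ccc=0 are never moved by canonical ordering, so the two
# digit placements stay distinct under NFD (and likewise under NFC).
digit_last = "\u0baa\u0bbe\u2074"   # PA, VOWEL SIGN AA, SUPERSCRIPT FOUR
digit_first = "\u0baa\u2074\u0bbe"  # PA, SUPERSCRIPT FOUR, VOWEL SIGN AA
same_after_nfd = (unicodedata.normalize("NFD", digit_last)
                  == unicodedata.normalize("NFD", digit_first))
print("canonically equivalent:", same_after_nfd)

# Contrast: marks with non-zero ccc are sorted by canonical ordering.
# DOT ABOVE (ccc=230) typed before DOT BELOW (ccc=220) gets swapped.
reordered = unicodedata.normalize("NFD", "q\u0307\u0323")
print([f"U+{ord(c):04X}" for c in reordered])
```

None of this says anything about where a rendering engine draws the
glyphs; it only shows what ccc does and does not govern.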

From richard.wordingham at ntlworld.com  Tue Jan 23 22:50:58 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 24 Jan 2024 04:50:58 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240121135218.1919b62e@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
Message-ID: <20240124045058.3438fb87@JRWUBU2>

On Sun, 21 Jan 2024 13:52:18 +0000
Richard Wordingham via Unicode <unicode at corp.unicode.org> wrote:

> The Unicode Consortium makes some forays into standardising the
> encoding of text beyond the mere encoding of characters.  Is there
> yet a standard encoding for the first blue word on page 3 of
> https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf (Document
> L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
> SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
> U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
> circle, and is or closely relates to the Sanskrit word 'bh??vam'.  I
> would not be surprised at context-sensitive rules for whether the
> sequence should be ended with U+200C ZERO WIDTH NON-JOINER.
> 
> One possible solution would be for U+00B2, U+00B3 and U+2074 to be
> treated as nuktas, but that invalidates or creates a confusable for
> the current solution for sequences without a right matra, which is to
> use the order <consonant, vowel, superscript digit>.

There doesn't appear to be any Unicode progress beyond L2/10-440,
in which the South Asian subcommittee opined:

"Indic rendering engines, for example, will need to know that the
superscript numbers should be treated as diacritics (that is, in the
nukta class)."

There was a request for comments from those with implementations, of
which one response predating the report made it to the document
register, L2/10-435, from R. Radhakrishnan, Muthu Nedumaran, which
exhibited elegant rendering using AAT.  (I fear that that's no more
conclusive than finding a Graphite font that can render the sequences.)

Can we honestly claim that subcommittee report as a finding of fact by
the UTC?  If we can, that would declare that the correct placement of
the superscript digit is immediately after the consonant.

Richard.


> 
> Another possible solution is to define a special visual rearrangement
> for the sequences <consonant, (U+0BBE|U+0BCA|U+0BCB|U+0BCC|U+0BD7),
> superscript digit> and their canonical equivalents.
> 
> Is it perhaps the case that the word I mentioned can only be encoded
> using the PUA?
> 
> Richard.
> 
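
The "special visual rearrangement" idea quoted above can be illustrated
at the level of code point sequences. This is only a sketch under
stated assumptions: the helper name digit_after_consonant and the
character classes are hypothetical, the consonant range U+0B95..U+0BB9
is a simplification, and a real solution would live in the rendering
engine or font rather than in a text transform. It finds <consonant,
right matra, superscript digit> and emits the digit immediately after
the consonant, normalizing to NFC first so that the canonically
equivalent two-part vowel spellings are matched as well.

```python
import re
import unicodedata

# Hypothetical character classes for this illustration.
CONSONANT = "\u0b95-\u0bb9"                     # Tamil consonants KA..HA
RIGHT_MATRA = "\u0bbe\u0bca\u0bcb\u0bcc\u0bd7"  # the vowels listed in the message
SUPER_DIGIT = "\u00b2\u00b3\u2074"              # SUPERSCRIPT TWO, THREE, FOUR

PATTERN = re.compile(f"([{CONSONANT}])([{RIGHT_MATRA}])([{SUPER_DIGIT}])")


def digit_after_consonant(text: str) -> str:
    """Return text with each superscript digit moved next to its consonant."""
    # NFC composes <U+0BC6, U+0BBE> to U+0BCA and similar pairs, so the
    # canonical equivalents are covered by the same pattern.
    composed = unicodedata.normalize("NFC", text)
    return PATTERN.sub(r"\1\3\2", composed)


# <PA, AA, SUPERSCRIPT FOUR, VA, MA, VIRAMA>, i.e. the de facto order.
de_facto = "\u0baa\u0bbe\u2074\u0bb5\u0bae\u0bcd"
print([f"U+{ord(c):04X}" for c in digit_after_consonant(de_facto)])
```

The output order is the one the nukta-style treatment would use; the
sketch takes no position on which order is the correct encoding.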


From richard.wordingham at ntlworld.com  Tue Jan 23 23:50:49 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 24 Jan 2024 05:50:49 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240122200726.57f528de@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2>
 <1bb00d3a-d950-4778-8d12-4842cb2ac39d@code2001.com>
 <fd62e814-9e62-491b-8a70-c8a22cc3c6c2@code2001.com>
 <20240122011918.5c0ef2cd@JRWUBU2>
 <CAH-HCWXQAExoTkz0gQTivtA5+G6XQgBEqrCeH=eJKyneFR_Z6Q@mail.gmail.com>
 <80a0eee8-6221-426a-a79a-d09a7324a825@code2001.com>
 <20240122200726.57f528de@JRWUBU2>
Message-ID: <20240124055049.3afb48dc@JRWUBU2>

On Mon, 22 Jan 2024 20:07:26 +0000
Richard Wordingham via Unicode <unicode at corp.unicode.org> wrote:

> On Mon, 22 Jan 2024 18:23:03 +0000
> James Kass via Unicode <unicode at corp.unicode.org> wrote:
> 
> > Has any effort been made to use OpenType to get the desired
> > display? Classifying the superscript digits as "marks" in the GDEF
> > (glyph definition) table and then using GPOS (glyph positioning)
> > for the desired placement?  Or has the user community accepted the
> > plain-text legibility of the /de facto/ standard encoding order and
> > reconciled with the fact that not all published books can be
> > exactly rendered in plain-text?  
> 
> I think the first big question is whether the font (envisioned as
> active code) will be presented with letter, non-ligating right matra
> and digit in the same glyph run.

I suppose we should try it, but I have a suspicion that that won't
happen under OpenType.  It might be possible to circumvent the full
application of the script-dependent shaping engine in some renderers by
not supplying any OTL data for the Tamil script, but relying on
definitions for the default script.  While I know that trick for the USE
for Tai Tham will leave us with a feature by default, I don't know
whether that is so for the other rendering engines.  Additionally, the
run might still be restricted to one 'cluster', which would defeat the
font.

Richard.


From jameskass at code2001.com  Wed Jan 24 14:23:42 2024
From: jameskass at code2001.com (James Kass)
Date: Wed, 24 Jan 2024 20:23:42 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240124045058.3438fb87@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2> <20240124045058.3438fb87@JRWUBU2>
Message-ID: <4356fe44-ef94-4fd8-8c53-f308cdad6575@code2001.com>


On 2024-01-24 4:50 AM, Richard Wordingham via Unicode wrote:
> There doesn't appear to be any Unicode progress beyond L2/10-440,
> in which the South Asian subcommittee opined:
>
> "Indic rendering engines, for example, will need to know that the
> superscript numbers should be treated as diacritics (that is, in the
> nukta class)."
>
> There was a request for comments from those with implementations, of
> which one response predating the report made it to the document
> register, L2/10-435, from R. Radhakrishnan, Muthu Nedumaran, which
> exhibited elegant rendering using AAT.  (I fear that that's no more
> conclusive than finding a Graphite font that can render the sequences.)
>
> Can we honestly claim that subcommittee report as a finding of fact by
> the UTC?  If we can, that would declare that the correct placement of
> the superscript digit is immediately after the consonant.

Previous contact with Tamil information technology specialists has shown 
that they are intelligent, knowledgeable, resourceful, and practical.
So I wondered how the users were actually handling this situation.
Hence the web searches for the competing strings.

If the users considered that placing the superscript at the end of the 
syllable was incorrect and a temporary work-around, we'd expect to see 
this reflected in disclaimers on the various web pages.  If the
specialists in the user community considered syllable-final superscript
digits to be wrong and could have made an OpenType solution, we'd expect
to see notices on the web pages offering a downloadable font for
'correct' display.  Are there any such notices or disclaimers?

Regarding non-Unicode or PUA solutions, TSCII does not support 
superscript digits.  As for TACE16, I got both the TAU-Barathi and
TAC-Barathi Regular fonts from this web page:
https://www.tamilvu.org/ta/tkbd-index-341488
... there are no superscript digits in these fonts.  (TACE16 maps
precomposed Tamil syllables to the PUA.  Since TACE16 is visual order,
if its developers wanted to support Extended Tamil, they could.  Maybe
there are other TACE16 fonts which support superscript digits.)

?????? in TACE16:

U+E291
U+E1F2
U+2074
U+E2E1
U+E2A1
U+E1F0

??????
??????
?????

Neither DuckDuckGo nor Google search even find the string without the 
superscript digit.  Maybe these two search engines reject PUA strings,
or maybe I've done something wrong, like mentioning "ccc=0" in this thread.


From richard.wordingham at ntlworld.com  Thu Jan 25 07:02:46 2024
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 25 Jan 2024 13:02:46 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <4356fe44-ef94-4fd8-8c53-f308cdad6575@code2001.com>
References: <20240121135218.1919b62e@JRWUBU2> <20240124045058.3438fb87@JRWUBU2>
 <4356fe44-ef94-4fd8-8c53-f308cdad6575@code2001.com>
Message-ID: <20240125130246.3e548f21@JRWUBU2>

On Wed, 24 Jan 2024 20:23:42 +0000
James Kass via Unicode <unicode at corp.unicode.org> wrote:

> If the users considered that placing the superscript at the end of
> the syllable was incorrect and a temporary work-around, we'd expect
> to see this reflected in disclaimers on the various web pages.  If
> the specialists in the user community considered syllable-final
> superscript digits to be wrong and could have made an OpenType
> solution, we'd expect to see notices on the web pages offering a
> downloadable font for 'correct' display.  Are there any such notices
> or disclaimers?

That's probably a question for those who read Tamil - there may be
little point in putting up such notices in English.

Additionally, I'm not sure that a font that works on Internet Explorer
was possible.  I'm not even sure that an OpenType font can be made for
Chromium or Firefox.  Can one even kern Tamil syllables and superscript
digits using OpenType?

> Regarding non-Unicode or PUA solutions, TSCII does not support 
> superscript digits.  As for TACE16, I got both the TAU-Barathi and
> TAC-Barathi Regular fonts from this web page:
> https://www.tamilvu.org/ta/tkbd-index-341488
> ... there are no superscript digits in these fonts.  (TACE16 maps
> precomposed Tamil syllables to the PUA.  Since TACE16 is visual
> order, if its developers wanted to support Extended Tamil, they
> could.  Maybe there are other TACE16 fonts which support superscript
> digits.)
> 

TACE16 gets its speed advantages by having only 24 characters and then
being as simple as proportionally-spaced ASCII.  For one extension in
the same style, one would have to add 14 consonants (or 15 if it
supports Vedic Sanskrit) and about another consonant's worth of
oddments: spacing anusvara, anunasika, syllabic consonants, visarga,
nasalised dead consonants, and perhaps more.

I suspect its developers mostly loathe 'Extended Tamil'.  And you've
already noticed that those TACE16 fonts lack superscript digits.

Richard.


From jameskass at code2001.com  Thu Jan 25 19:29:45 2024
From: jameskass at code2001.com (James Kass)
Date: Fri, 26 Jan 2024 01:29:45 +0000
Subject: Solution for Extended Tamil
In-Reply-To: <20240125130246.3e548f21@JRWUBU2>
References: <20240121135218.1919b62e@JRWUBU2> <20240124045058.3438fb87@JRWUBU2>
 <4356fe44-ef94-4fd8-8c53-f308cdad6575@code2001.com>
 <20240125130246.3e548f21@JRWUBU2>
Message-ID: <5be27d54-4448-4f1c-88a8-7cf67f23d474@code2001.com>


On 2024-01-25 1:02 PM, Richard Wordingham via Unicode wrote:
>> downloadable font for 'correct' display.  Are there any such notices
>> or disclaimers?
> That's probably a question for those who read Tamil - there may be
> little point in putting up such notices in English.
Exactly.

The print era exhibits in the proposal documents all clearly show that 
the digit is placed next to the consonant, which seems to be the 
classical convention.  It should be considered important to find out
what the actual users think about this.  If the digital era practice of
placing the superscripts at the syllable final position is considered a
temporary work-around, that's one thing.  But if the users consider this
to be a new, digital era convention which supersedes the classical
convention, that's something else.  In which case the question becomes
should both conventions be representable in computer plain-text?  If
yes, can this be accomplished without changing the /de facto/ standard
encoding order?  I think it *should* be possible, but that doesn't mean
it is.  And perhaps the users (and the UTC) would prefer to represent
both conventions at the encoding level instead of handling it at the
display level.  I don't know and do not claim to speak for either the
user community or the UTC.  Just asking questions in an effort to
understand the issues involved.

> Additionally, I'm not sure that a font that works on Internet Explorer
> was possible.  I'm not even sure that an OpenType font can be made for
> Chromium or Firefox.  Can one even kern Tamil syllables and superscript
> digits using OpenType?
It should work, but I'm not set up to test it at the moment.  If it
didn't work, I might try "rlig" and make precomposed glyphs
accordingly.  Even if that meant scads of precomposed glyphs.

-----

(some background)
In the 2010 proposal, 
https://www.unicode.org/L2/L2010/10256r-extended-tamil.pdf , it was 
explained that the proposed characters would have considerable glyphic 
variation.  The code charts used the Tamil forms with western
superscript digits for reasons explained within the document.  The
objections to the proposal were essentially that the glyphs being
proposed as characters could already be represented as sequences and
they worked just fine.  (As long as one either liked dotted circles or
was comfortable with the digits appearing syllable final, but this part
was not mentioned in the objections.)  This was despite the proposal
explaining why the proposed characters should not have decompositions.
If the proposal had used the Grantha style glyphs in the charts for the 
proposed characters, the objections probably would have been that the 
proposed characters were already encoded in the Grantha range.

For example, IIUC, the Tamil form looks like "த³" and the Grantha form
looks like "𑌦" for the character proposed for U+xx10, TAMIL LETTER DA.

If Shriramana Sharma's 2010 proposal (and revisions) had been accepted, 
we would not be having this discussion.  But here we are.