From kenwhistler at sonic.net  Mon Feb  1 14:54:09 2021
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 1 Feb 2021 12:54:09 -0800
Subject: Re: Origins of ⌚ U+231A WATCH and ⌛ U+231B HOURGLASS
In-Reply-To: <CANzej37UaR+T9j3rh0sAk27mGVp9tax4GV6ABrZU8iKf-pNVow@mail.gmail.com>
References: <CANzej34uBKVCLSh0DrUNLS_Vb3nu+59YoT-GjoRjSH-oz0s=aw@mail.gmail.com>
 <ab8bb157-0303-451a-b76f-c6e79a8005c0@disroot.org>
 <e5eb715a-5800-f0b9-ab82-e4d58830315e@sonic.net>
 <CANzej37UaR+T9j3rh0sAk27mGVp9tax4GV6ABrZU8iKf-pNVow@mail.gmail.com>
Message-ID: <484ebd87-46aa-0056-87df-e37420a1774a@sonic.net>

Marcel,

Well, having dusted off the archives, I now have the definitive answer 
as to the origin story for these two in Unicode 1.0.

In Document UTC/1991-016, dating from January, 1991, there were 19 
distinct requests for additions of characters, submitted by Layne Cannon 
on behalf of WordPerfect Corporation, for consideration at UTC #44, 
February 1, 1991. Number 13 of those requests included a request to 
encode "Clock" at U+2677 and "Hourglass" at U+2678. Those two characters 
are what ended up encoded as U+231A WATCH and U+231B HOURGLASS.

The justification for them was that they are "Part of the WordPerfect 
iconic character set". And indeed, they can be found in the attached 
listing of the "WP Symbol Set 5" at 5,31 and 5,32. They are intermixed 
there with other characters that ended up in the Zapf dingbats set in 
Unicode.

BTW, that same request from WordPerfect was also the origin of U+2319 
TURNED NOT SIGN, which was submitted in the same request as "Inverted 
begining of line" [sic].

--Ken

On 12/30/2020 6:45 PM, M. Pauluk via Unicode wrote:
> Thanks Ken! I had already checked XCCS and IBM code pages too; ⌚
> U+231A WATCH and ⌛ U+231B HOURGLASS really couldn't have originated
> there.

From peroyomaslists at gmail.com  Sat Feb  6 16:32:29 2021
From: peroyomaslists at gmail.com (Andrés Sanhueza)
Date: Sat, 6 Feb 2021 19:32:29 -0300
Subject: Best character to use for the «bolaspa» sign in Spanish
Message-ID: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>

The RAE (the Royal Spanish Academy, an institution in Spain that
seeks to regulate the correct use of the Spanish language) uses a
punctuation sign called «bolaspa» (an X inside a circle, like ⊗) to precede
examples of incorrect usage, as in this example:

Some people don't like to use double negatives, so saying something like
> ⊗«we don't need no education» is seen as wrong. Using either «we don't need
> education» or «we need no education» is better.


The RAE also occasionally uses an asterisk (*) for the same purpose. Unlike the
asterisk, the bolaspa is not part of everyday Spanish usage, and I only remember
having seen it in the RAE's outreach texts about the language or similar
material. Unicode has several similar-looking characters, and at least
one of them is intended for mathematical use. While I know that
*most* characters are encoded according to their shape rather than their
meaning, I still don't know whether ANY similar-looking character would
serve that function intuitively, so I'm asking: which one do you think
is the best character to use as a punctuation sign in text?

From copypaste at kittens.ph  Sat Feb  6 16:45:21 2021
From: copypaste at kittens.ph (Fredrick Brennan)
Date: Sat, 06 Feb 2021 17:45:21 -0500
Subject: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>
References: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>
Message-ID: <1846209.PLLzzlfL6S@laptop>

Mr. Sanhueza:

Please upload an image of the /bolaspa/ in use on a printed page. Having never seen one before,
I can't really try to answer your question without one.

Best,
Fred Brennan


From jameskass at code2001.com  Sat Feb  6 16:50:51 2021
From: jameskass at code2001.com (James Kass)
Date: Sat, 6 Feb 2021 22:50:51 +0000
Subject: Re: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <1846209.PLLzzlfL6S@laptop>
References: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>
 <1846209.PLLzzlfL6S@laptop>
Message-ID: <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>


On 2021-02-06 10:45 PM, Fredrick Brennan via Unicode wrote:
> Mr. Sanhueza:
>
> Please upload an image of the /bolaspa/ in use on a printed page. I can't really try
> to answer your question without one having never seen one before.
>
> Best,
> Fred Brennan
>
Here are a couple of web page links:

https://spanish.stackexchange.com/questions/32080/tiene-nombre-el-signo-de-la-cruz-en-un-c%c3%adrculo-que-veo-a-veces-en-el-dpd/32081#32081

https://dle.rae.es/bolaspa

Both show the symbol as a superscript, above the baseline. The first 
page cites an article which refers to the bolaspa as a symbol of 
"medieval torture".

From duerst at it.aoyama.ac.jp  Sat Feb  6 17:57:39 2021
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Sun, 7 Feb 2021 08:57:39 +0900
Subject: Re: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>
References: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>
 <1846209.PLLzzlfL6S@laptop>
 <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>
Message-ID: <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>

On 07/02/2021 07:50, James Kass via Unicode wrote:

> Here's a couple of web page links:
> 
> https://spanish.stackexchange.com/questions/32080/tiene-nombre-el-signo-de-la-cruz-en-un-c%c3%adrculo-que-veo-a-veces-en-el-dpd/32081#32081 
> 
> 
> https://dle.rae.es/bolaspa
> 
> Both show the symbol as a superscript, above the baseline. The first 
> page cites an article which refers to the bolaspa as a symbol of 
> "medieval torture".

And it's easy to figure out that they use U+2297 CIRCLED TIMES for this 
purpose, and seem to be happy with it.

The entry for U+2297 also gives various alternatives:

2297 ⊗ CIRCLED TIMES
= tensor product
= vector pointing into page
→ 26D2 ⛒ circled crossing lanes
→ 2A02 ⨂ n-ary circled times operator
→ 2BBE ⮾ circled x
~ 2297 FE00 ⊗︀ with white rim

Not sure we need yet another character that looks almost the same. But 
maybe adding a comment such as
= Spanish bolaspa
might help.
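The look-alike code points and the variation sequence in the entry above can be inspected directly. Here is a minimal Python sketch using only the standard `unicodedata` module; the helper name `describe` is made up for this illustration, and the lookup of U+2BBE assumes a Python whose Unicode database is recent enough to include it (Python 3.7 or later).

```python
import unicodedata

# Code points named in the U+2297 names-list entry quoted above.
CANDIDATES = [0x2297, 0x26D2, 0x2A02, 0x2BBE]

# U+FE00 VARIATION SELECTOR-1 after U+2297 requests the "with white rim"
# glyph variant; the underlying character is still U+2297.
CIRCLED_TIMES_WHITE_RIM = "\u2297\ufe00"


def describe(cp: int) -> str:
    """Return 'U+XXXX <char> NAME' for one code point (hypothetical helper)."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    for cp in CANDIDATES:
        print(describe(cp))
    # The variation sequence is two code points, not a new character.
    print(" ".join(f"U+{ord(c):04X}" for c in CIRCLED_TIMES_WHITE_RIM))
```

Because a variation sequence keeps the base character, text using U+2297 for the bolaspa stays searchable as U+2297 whichever glyph variant is requested.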

Regards,   Martin.

From doug at ewellic.org  Sat Feb  6 18:05:32 2021
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 6 Feb 2021 17:05:32 -0700
Subject: No more RGI flag sequences
Message-ID: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>

I spotted this recently, by chance, in L2/21-014, "Emoji Subcommittee Report Q1, 2021":

"RGI Flag Criteria Update: Moving forward, proposals to add subdivision flags or continental regions as RGI will not be considered by the Emoji Subcommittee or the UTC."

That means neither Northern Ireland, nor the states of the U.S. or Mexico or Germany, nor the provinces of Canada, nor the regions of Italy (many of whose inhabitants have as much pride in their homeland as the English, Scottish, and Welsh) will have their flag sequences tagged as RGI. (Continental regions, of course, do not have flags.)

That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.

Some of us find this unfortunate.

--
Doug Ewell, CC, ALB | Thornton, CO, US 🏴󠁵󠁳󠁣󠁯󠁿 | ewellic.org


From copypaste at kittens.ph  Sat Feb  6 18:14:38 2021
From: copypaste at kittens.ph (Fredrick Brennan)
Date: Sat, 06 Feb 2021 19:14:38 -0500
Subject: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>
References: <CAPnRZcSjqp1iKmAvgz_8aeu93fSx86os6q=xRGFjybZn7Sna9w@mail.gmail.com>
 <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>
 <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>
Message-ID: <3432254.lxTiPNA053@laptop>

To argue the other side:

The examples given seem to suggest that the bolaspa, as opposed to ⊗, is always 
shown as a superscript; in Unicode chart terms, something like:

≈ <super> 2297 ⊗

So, if this is indeed done consistently, perhaps that can be a basis for an 
argument in a proposal.
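The line above borrows the code-chart notation for a compatibility decomposition. No superscript circled times exists, so as a hedged illustration an existing superscript stands in: Python's `unicodedata` reports the `<super>` tag for U+00B2 SUPERSCRIPT TWO, and NFKC normalization folds that character to its base. A hypothetical character with ≈ <super> 2297 would presumably fold to U+2297 the same way. `super_base` is an illustrative helper, not a library function.

```python
import unicodedata
from typing import Optional


def super_base(ch: str) -> Optional[str]:
    """Return the base text of a <super> compatibility decomposition, else None.

    unicodedata.decomposition() uses the same tag notation as the charts,
    e.g. '<super> 0032' for U+00B2 SUPERSCRIPT TWO.
    """
    tag, _, hex_points = unicodedata.decomposition(ch).partition(" ")
    if tag != "<super>":
        return None
    return "".join(chr(int(h, 16)) for h in hex_points.split())


if __name__ == "__main__":
    print(unicodedata.decomposition("\u00b2"))     # <super> 0032
    print(super_base("\u00b2"))                     # 2
    print(super_base("\u2297"))                     # None: U+2297 has no decomposition
    print(unicodedata.normalize("NFKC", "\u00b2"))  # 2
```

This is also the practical catch of such a proposal: any process that applies NFKC would erase the distinction between a superscript bolaspa and plain U+2297.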

Best,
Fred Brennan

On Saturday, February 6, 2021 6:57:39 PM EST Martin J. Dürst via Unicode 
wrote:
> [I]t's easy to figure out that they use U+2297 CIRCLED TIMES for this
> purpose, and seem to be happy with it.
> 
> Not sure we need yet another character that looks almost the same. But
> maybe adding a comment such as
> = Spanish bolaspa
> might help.
> 
> Regards,   Martin.


From mark at macchiato.com  Sat Feb  6 19:16:00 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 6 Feb 2021 17:16:00 -0800
Subject: No more RGI flag sequences
In-Reply-To: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
Message-ID: <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>

The reasoning behind that has been at
https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time. The
feasibility issues behind that reasoning would have to change substantially
before FR could be revised.

Note, however, that all the subdivision flags remain valid; they are just not
recommended for general interchange.
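The difference between "valid" and "RGI" is easier to see once you look at how a subdivision flag is spelled. Per UTS #51, it is U+1F3F4 WAVING BLACK FLAG, then one TAG character (U+E0000 plus the ASCII value) for each letter or digit of the lowercase, hyphen-free subdivision code, then U+E007F CANCEL TAG. A minimal Python sketch of that mapping follows; the function names are hypothetical, and it checks only well-formedness, not whether CLDR lists the code as valid or whether the sequence is RGI.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG: the tag base
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG: ends the tag spec
TAG_OFFSET = 0xE0000       # TAG LATIN SMALL LETTER A is U+E0061 = 0xE0000 + ord('a')


def subdivision_flag(code: str) -> str:
    """Build the emoji tag sequence for a subdivision code like 'GB-SCT'.

    Only well-formedness is handled here; CLDR validity and RGI status
    must be checked separately.
    """
    code = code.lower().replace("-", "")
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"not a subdivision code: {code!r}")
    return BLACK_FLAG + "".join(chr(TAG_OFFSET + ord(c)) for c in code) + CANCEL_TAG


def subdivision_code(seq: str) -> str:
    """Recover the lowercase code (e.g. 'gbsct') from a tag sequence."""
    if not (seq.startswith(BLACK_FLAG) and seq.endswith(CANCEL_TAG)):
        raise ValueError("not a flag tag sequence")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in seq[1:-1])


if __name__ == "__main__":
    for code in ["GB-SCT", "GB-NIR", "US-CO", "IQ-AR"]:
        seq = subdivision_flag(code)
        print(code, len(seq), " ".join(f"{ord(c):X}" for c in seq))
```

Because the tag characters are invisible, a system with no glyph for the whole sequence will typically show only the black flag, which is why a sequence can be valid yet still fail to interchange in practice.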

Mark

From christoph.paeper at crissov.de  Sun Feb  7 08:11:37 2021
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sun, 7 Feb 2021 15:11:37 +0100
Subject: No more RGI flag sequences
In-Reply-To: <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
References: <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
Message-ID: <D2F23BEB-E707-43CB-9637-EEF5D268075C@crissov.de>

Mark Davis ☕️ via Unicode:
> 
> The reasoning behind that has been at https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time. The feasibility issues behind that reasoning would have to change substantially before FR could be revised. 

The only “continental region” with a well-established, copyleft flag is 002, Africa: ????

> Note however that all the subdivision flags remain valid; just not recommended for general interchange.

“Just not” means everything here: chicken and egg. We should un-RGI emoji flags for dependent regions (like UM, which often uses the same graphic as US). 

In conclusion, WhatsApp was right to use user-assigned codes from ISO 3166-1, like XE for England, with regional indicator symbols (RIS), because there never was a realistic chance that we would move beyond two-letter codes any time soon. 
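The regional indicator encoding referred to above is purely mechanical: each letter A to Z maps to U+1F1E6 through U+1F1FF, and a flag is a pair of them. A minimal Python sketch (`region_flag` is a hypothetical name) shows why nothing in the encoding itself stops a vendor from spelling a user-assigned code such as XE. Validity and RGI status come from separate data; UTS #51 ties validity to CLDR's region data, so a user-assigned code is presumably not a valid flag sequence there, and rendering it depends on private agreement.

```python
RI_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def region_flag(code: str) -> str:
    """Spell a two-letter code (e.g. 'UM') as a pair of regional indicator symbols."""
    code = code.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"not a two-letter code: {code!r}")
    return "".join(chr(RI_A + ord(c) - ord("A")) for c in code)


if __name__ == "__main__":
    for code in ["US", "UM", "XE"]:
        print(code, " ".join(f"U+{ord(c):X}" for c in region_flag(code)))
```

The same mechanism explains the UM case: the pair is distinct from US at the code-point level even when fonts draw the same graphic for both.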

From otto.stolz at uni-konstanz.de  Sun Feb  7 08:59:34 2021
From: otto.stolz at uni-konstanz.de (Otto Stolz)
Date: Sun, 7 Feb 2021 15:59:34 +0100
Subject: No more RGI flag sequences
In-Reply-To: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
Message-ID: <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>

Hello,

what, on earth (and in Unicode jargon), does “RGI” mean?

The obvious place to look for Unicode-specific abbreviations
is <https://www.unicode.org/faq/alpha_soup.html>, but I cannot
find it there. Experts, please amend the FAQ with all common
Unicode acronyms.

Best wishes,
   Otto Stolz

From arthur at reutenauer.eu  Sun Feb  7 09:29:50 2021
From: arthur at reutenauer.eu (Arthur Reutenauer)
Date: Sun, 7 Feb 2021 16:29:50 +0100
Subject: No more RGI flag sequences
In-Reply-To: <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>
Message-ID: <20210207152950.xksmihrlbmpmtruk@phare.normalesup.org>

On Sun, Feb 07, 2021 at 03:59:34PM +0100, Otto Stolz via Unicode wrote:
> what, on earth (and in Unicode jargon), does “RGI” mean?

  “Recommended for general interchange”. See https://www.unicode.org/reports/tr51/#def_RGI

	Arthur

From harjitmoe at outlook.com  Sun Feb  7 10:15:02 2021
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 7 Feb 2021 16:15:02 +0000
Subject: Fwd: No more RGI flag sequences
In-Reply-To: <VI1PR07MB5712865DB7BA5C87BFAC5C76B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>,
 <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>,
 <VI1PR07MB5712865DB7BA5C87BFAC5C76B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
Message-ID: <VI1PR07MB5712902F814CB505C2CFA1D5B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>

Since I inadvertently sent the message below with Reply rather than Reply All, I'm forwarding it to the list for the record.

--Har.

________________________________
From: Harriet Riddle <harjitmoe at outlook.com>
Sent: Sunday, 7 February 2021, 15:14
To: Otto Stolz
Subject: Re: No more RGI flag sequences

RGI means Recommended for General Interchange, i.e. a grapheme cluster that emoji fonts are expected to include a glyph for as standard, as opposed to other sequences with U+200D or U+FE0F (or even code points without emoji status that are not marked with U+FE0F, e.g. on Samsung devices), which specific emoji fonts might decide to include.
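This definition can be made concrete: the RGI set is published as data, and for subdivision flags it consists of the lines typed RGI_Emoji_Tag_Sequence in emoji-sequences.txt. The Python sketch below parses text in that file's semicolon-separated layout. `SAMPLE` is a hand-copied excerpt rather than the file itself (in early 2021 the only RGI tag sequences were the England, Scotland, and Wales flags), and `rgi_tag_sequences` is a hypothetical name.

```python
from typing import Dict

# Hand-copied excerpt in the layout of emoji-sequences.txt
# ("code points ; type_field ; description # comment"). A real tool would
# read the full file for the Emoji version it targets.
SAMPLE = """\
1F1FA 1F1F8 ; RGI_Emoji_Flag_Sequence ; flag: United States
1F3F4 E0067 E0062 E0065 E006E E0067 E007F ; RGI_Emoji_Tag_Sequence ; flag: England
1F3F4 E0067 E0062 E0073 E0063 E0074 E007F ; RGI_Emoji_Tag_Sequence ; flag: Scotland
1F3F4 E0067 E0062 E0077 E006C E0073 E007F ; RGI_Emoji_Tag_Sequence ; flag: Wales
"""


def rgi_tag_sequences(text: str) -> Dict[str, str]:
    """Map each RGI_Emoji_Tag_Sequence (as a string) to its description."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        cps, type_field, desc = (field.strip() for field in line.split(";"))
        if type_field == "RGI_Emoji_Tag_Sequence":
            table["".join(chr(int(h, 16)) for h in cps.split())] = desc
    return table


if __name__ == "__main__":
    rgi = rgi_tag_sequences(SAMPLE)
    # Northern Ireland's well-formed, valid sequence, built by hand.
    nir = "\U0001F3F4" + "".join(chr(0xE0000 + ord(c)) for c in "gbnir") + "\U000E007F"
    print("gbnir is RGI:", nir in rgi)  # False with this excerpt
```

A font vendor working from this data would, in effect, treat membership in that table as the list of flag tag sequences it is expected to draw.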

--Har.


From doug at ewellic.org  Sun Feb  7 12:29:51 2021
From: doug at ewellic.org (Doug Ewell)
Date: Sun, 7 Feb 2021 11:29:51 -0700
Subject: No more RGI flag sequences
In-Reply-To: <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
Message-ID: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Mark Davis wrote:

> The reasoning behind that has been at
> https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.

Since I'm unlikely ever to submit a true emoji proposal (i.e. for a facial expression or animal or hand gesture), I probably wouldn't have thought to look at the "Submitting Emoji Proposals" page.

That said, this passage in that section:

> Adding further subdivision flags as RGI can also appear to play
> favorites unless similar subdivisions also get flags, which could mean
> “all other flags of that country” or “all subdivisions of greater or
> equal population in other countries”

doesn't seem to align with the decision to exclude Northern Ireland.

> The feasibility issues behind that reasoning would have to change
> substantially before FR could be revised. 
>
> Note however that all the subdivision flags remain valid; just not
> recommended for general interchange.

That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org


From joan at montane.cat  Sun Feb  7 13:10:26 2021
From: joan at montane.cat (Joan Montané)
Date: Sun, 7 Feb 2021 20:10:26 +0100
Subject: No more RGI flag sequences
In-Reply-To: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
Message-ID: <CAKaaSX9unrd3Oh4BjDAPTUghq5baXYh3EryaMVwUGV0ht09wVA@mail.gmail.com>

On Sun, 7 Feb 2021 at 19:34, Doug Ewell via Unicode <unicode at unicode.org>
wrote:
>
> Mark Davis wrote:
>
> > The reasoning behind that has been at
> > https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.
>
> Since I'm unlikely ever to submit a true emoji proposal (i.e. for a facial expression or animal or hand gesture), I probably wouldn't have thought to look at the "Submitting Emoji Proposals" page.
>
> That said, this passage in that section:
>
> > Adding further subdivision flags as RGI can also appear to play
> > favorites unless similar subdivisions also get flags, which could mean
> > “all other flags of that country” or “all subdivisions of greater or
> > equal population in other countries”
>
> doesn't seem to align with the decision to exclude Northern Ireland.
>
> > The feasibility issues behind that reasoning would have to change
> > substantially before FR could be revised.
> >
> > Note however that all the subdivision flags remain valid; just not
> > recommended for general interchange.
>
> That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.
>

So, years ago Unicode created a universal encoding mechanism to represent
flags of ISO subdivision territories, and now Unicode throws the key
to the bottom of the sea.

I can understand that it is hard to draw a line on which subdivision
territories merit RGI. But limiting RGI subdivision flags to the UK is really
English-focused.

Just my 2 cents.

Joan Montané


From mark at macchiato.com  Sun Feb  7 13:33:08 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 11:33:08 -0800
Subject: No more RGI flag sequences
In-Reply-To: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
Message-ID: <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>

The main issue with making N. Ireland RGI was the lack of an
official flag. It is valid (🏴󠁧󠁢󠁮󠁩󠁲󠁿).

Mark

From mark at macchiato.com  Sun Feb  7 13:58:23 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 11:58:23 -0800
Subject: No more RGI flag sequences
In-Reply-To: <CAKaaSX9unrd3Oh4BjDAPTUghq5baXYh3EryaMVwUGV0ht09wVA@mail.gmail.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAKaaSX9unrd3Oh4BjDAPTUghq5baXYh3EryaMVwUGV0ht09wVA@mail.gmail.com>
Message-ID: <CAJ2xs_HK8t1yMiSvehCcmioKwfAF+Q-bvchDWAM1at_uA7zq7Q@mail.gmail.com>

Part of the reason for providing a general mechanism that would make
all subdivision flags valid was to provide an interchangeable way for
some platforms to supply additional subdivision flags. Evidence of
popularity on those platforms could then provide a strong signal for making a
particular subdivision flag RGI. As it turned out,
(a) the frequency of usage of subdivision flags turned out to be quite low.
(The category of flags in general is already not stellar:
https://home.unicode.org/emoji/emoji-frequency/)
(b) adding more subdivision flags turned out to be a long, slippery slope,
and full of geopolitical landmines.

Mark

From everson at evertype.com  Sun Feb  7 14:19:19 2021
From: everson at evertype.com (Michael Everson)
Date: Sun, 7 Feb 2021 20:19:19 +0000
Subject: No more RGI flag sequences
In-Reply-To: <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>
Message-ID: <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>

On 7 Feb 2021, at 19:33, Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:
> 
> The main issue for making N. Ireland be RGI was the lack of an official flag. It is valid (🏴󠁧󠁢󠁮󠁩󠁲󠁿).

This is the mistake that the Consortium made. There is a flag which is widely and publicly in use. What is “official” or “unofficial” about it is a decision taken without due consideration of the realities of the political settlement in Britain and Ireland.

Who uses flags and why? Nationalists in the North may prefer to use the Irish tricolour 🇮🇪. Unionists may wish to use the Union flag 🇬🇧. Who cares? That’s for people who want to refer to a national flag. The fact of the matter is that the United Kingdom is composed of three countries and one province. And in reality, FOUR flags are used, particularly in sport. 

“The Ulster Banner was carried by the Northern Ireland team in the Commonwealth Games. It is also regularly displayed by supporters of the Northern Ireland national football team and is displayed by FIFA as the flag of Northern Ireland.” https://en.wikipedia.org/wiki/Flag_of_Northern_Ireland

The decision to refuse to include the Ulster Banner for Northern Ireland was a really dumb decision. No good was served by it. Instead of using common sense, “the lack of an official flag” was used as an excuse. It doesn’t make the Consortium look good. 

Michael Everson


From harjitmoe at outlook.com  Sun Feb  7 15:11:04 2021
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 7 Feb 2021 21:11:04 +0000
Subject: No more RGI flag sequences
In-Reply-To: <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>,
 <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>
Message-ID: <VI1PR07MB57125E1838269A62358C0FB1B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>

Some rambling targeted at noöne in particular…

What should be RGI for flags is a bit confusing, even when subregions are not considered.

For instance, UM is a country code for the United States Minor Outlying Islands. This has no permanent population and, as such, no flag (official or not) besides that of the United States. Hence, the inclusion of it is a largely pointless duplicate encoding of the flag of the United States. However, it is widely supported across vendors.

Meanwhile, the subregional code iqar corresponds to Erbil Governorate, Erbil being the capital of Iraq's autonomous Kurdistan Region. If a flag emoji encoding can show the flag of a larger region in the absence of a more specific flag (as with the UM example), then I'd deduce that the subregional code iqar may be a perfectly reasonable encoding for the Kurdish flag.

So does Unicode *really* exclude the Kurdish flag, as some who would kick up a stink might claim? There is no clean yes-or-no answer, much as there is no clean answer for Northern Ireland. The code is valid, but if it's not RGI, will any vendor try to support it, given that, besides some legacy kept around by Samsung, what's RGI tends to determine which new emoji "exist"?

All of that being said, I doubt vendors would want to *remove* the flag of e.g. Scotland, though, since that would send a message in itself.

-- Har.

From mark at macchiato.com  Sun Feb  7 16:53:00 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 14:53:00 -0800
Subject: No more RGI flag sequences
In-Reply-To: <VI1PR07MB57125E1838269A62358C0FB1B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>
 <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>
 <VI1PR07MB57125E1838269A62358C0FB1B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
Message-ID: <CAJ2xs_F1=Ke-DLfwzLfZK=7EM_L8HOfTi1OpgM5Lr5wcWi0cvQ@mail.gmail.com>

Part of this whole story is simply history. The flags were encoded well
before we had developed the definition of RGI and of subdivision flags
(2017, https://www.unicode.org/reports/tr51/tr51-12.html).

On Sun, Feb 7, 2021 at 1:13 PM Harriet Riddle via Unicode <
unicode at unicode.org> wrote:

> Some rambling targeted at no-one in particular…
>
> What should be RGI for flags is a bit confusing, even when subregions are
> not considered.
>
> For instance, UM is a country code for the United States Minor Outlying
> Islands. This has no permanent population and, as such, no flag (official
> or not) besides that of the United States. Hence, the inclusion of it is a
> largely pointless duplicate encoding of the flag of the United States.
> However, it is widely supported across vendors.
>
> Meanwhile, the subregional code iqar corresponds to Erbil Governorate,
> Erbil being the capital of Iraq's autonomous Kurdistan Region. If a flag
> emoji encoding can show the flag of a larger region in absence of a more
> specific flag (like with the UM example), then I'd deduce that the
> subregional code iqar may be a perfectly reasonable encoding for the
> Kurdish flag.
>
> So does Unicode *really* exclude the Kurdish flag, as some who would kick
> up a stink might claim? There is no clean yes or no answer, much as there
> is no clean answer for Northern Ireland. The code is valid, but if it's not
> RGI, will any vendor try to support it, given that besides some legacy kept
> around by Samsung, what's RGI might tend to determine what new emoji
> "exist"?
>
> All of that being said, I doubt vendors would want to *remove* the flag of
> e.g. Scotland, though, since that would send a message in itself.
>
> ? Har.

From everson at evertype.com  Sun Feb  7 17:43:12 2021
From: everson at evertype.com (Michael Everson)
Date: Sun, 7 Feb 2021 23:43:12 +0000
Subject: No more RGI flag sequences
In-Reply-To: <CAJ2xs_F1=Ke-DLfwzLfZK=7EM_L8HOfTi1OpgM5Lr5wcWi0cvQ@mail.gmail.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>
 <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>
 <VI1PR07MB57125E1838269A62358C0FB1B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
 <CAJ2xs_F1=Ke-DLfwzLfZK=7EM_L8HOfTi1OpgM5Lr5wcWi0cvQ@mail.gmail.com>
Message-ID: <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com>

The clean answer is “use the de-facto Ulster Banner glyph until such time as an ‘official’ flag is adopted”. Instead we have one of the constituent parts of the UK treated differently from the others, which is not very satisfactory. 

> On 7 Feb 2021, at 22:53, Mark Davis ☕️ via Unicode <unicode at unicode.org> wrote:
> 
> There is no clean yes or no answer, much as there is no clean answer for Northern Ireland. 


From mark at macchiato.com  Sun Feb  7 19:59:41 2021
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Sun, 7 Feb 2021 17:59:41 -0800
Subject: No more RGI flag sequences
In-Reply-To: <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
 <CAJ2xs_HkTkH70RC-454-QXcy+phCXQsiq4qktooBqm7WjChrKw@mail.gmail.com>
 <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
 <CAJ2xs_ErpGHnohm2P0yG47Zddmn57wP6wTX2XUhXziHAhE87zA@mail.gmail.com>
 <F28F0943-CC44-4998-B9E1-74B09F28848A@evertype.com>
 <VI1PR07MB57125E1838269A62358C0FB1B7B09@VI1PR07MB5712.eurprd07.prod.outlook.com>
 <CAJ2xs_F1=Ke-DLfwzLfZK=7EM_L8HOfTi1OpgM5Lr5wcWi0cvQ@mail.gmail.com>
 <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com>
Message-ID: <CAJ2xs_Gq1OZd93qaXGX_nonNG3zOdVvdi==XD+zWv4USR5cS0g@mail.gmail.com>

Flags of Northern Ireland have a complicated history. Given that
the political parties in Northern Ireland have considered the issue and
were unable to come to a conclusion, there is no agreement in Unicode that
it should be RGI.

It is a valid Emoji, so if any party wants to supply a font with a glyph
design of their choice, they are free to do so. Just as any party can
supply a font with Phaistos disk symbols or other Unicode characters.
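To make the "valid but not RGI" distinction concrete: a subdivision flag is an emoji tag sequence (UTS #51), namely U+1F3F4 WAVING BLACK FLAG, one tag character per letter of the lowercase subdivision code, then U+E007F CANCEL TAG. The sketch below only constructs such sequences; it is an illustration, not Unicode's validity or RGI check, which depends on CLDR subdivision data and the emoji data files. The helper names are hypothetical.

```python
# Sketch: build the emoji tag sequence for a subdivision flag, following the
# UTS #51 pattern: tag base, tag characters, CANCEL TAG.
# This does NOT check validity or RGI status; it only builds the sequence.

BLACK_FLAG = 0x1F3F4   # U+1F3F4 WAVING BLACK FLAG (the tag base)
CANCEL_TAG = 0xE007F   # U+E007F CANCEL TAG (terminates the tag spec)
TAG_OFFSET = 0xE0000   # tag characters mirror ASCII: 'g' (0x67) -> U+E0067


def subdivision_flag(code: str) -> str:
    """Return the flag tag sequence for a subdivision code such as 'gbnir'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"unexpected subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return chr(BLACK_FLAG) + tags + chr(CANCEL_TAG)


def describe(seq: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


if __name__ == "__main__":
    # England, Scotland and Wales are RGI; Northern Ireland is valid but not RGI.
    for code in ("gbeng", "gbsct", "gbwls", "gbnir"):
        seq = subdivision_flag(code)
        print(code, len(seq), describe(seq))
```

All four sequences are built the same way; whether a vendor ships a glyph for the last one is exactly the point under discussion.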

Mark



From christoph.paeper at crissov.de  Mon Feb  8 02:34:36 2021
From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=)
Date: Mon, 8 Feb 2021 09:34:36 +0100
Subject: No more RGI flag sequences
In-Reply-To: <CAJ2xs_Gq1OZd93qaXGX_nonNG3zOdVvdi==XD+zWv4USR5cS0g@mail.gmail.com>
References: <CAJ2xs_Gq1OZd93qaXGX_nonNG3zOdVvdi==XD+zWv4USR5cS0g@mail.gmail.com>
Message-ID: <C08A0C48-4D36-425D-92DB-6B64A73F9AB2@crissov.de>

Mark Davis ☕️ via Unicode:
> 
> Flags of Northern Ireland have a complicated history. Given that the political parties in Northern Ireland have considered the issue and were unable to come to a conclusion, there is no agreement in Unicode that it should be RGI.

For all other (sub-)national flag emojis, Unicode does not suggest a particular design, nor does it require one to exist for the RGI label. Vendors would be free to display the flag of TW the same as that of CN, for instance, but they decided not to show it at all on devices for the mainland Chinese market. 

There are really just two reasonable options:

1. Treat GBNIR the same as GBENG, GBSCT and GBWLS, i.e. either recommend its flag emoji for general interchange or deprecate them all. 
2. Stop recommending all RIS emoji flags that similarly have no clearly defined design distinct from their parent region, e.g. UM, or simply all that are marked as “dependent” (equivalent to “not independent”) in ISO 3166. 
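For comparison with the subdivision tag sequences, the country-level "RIS" flags mentioned above are just pairs of Regional Indicator Symbols, one per letter of the two-letter region code. A minimal sketch (the function name is hypothetical; RGI status is a separate lookup in the emoji data files and is not modelled here):

```python
# Sketch: a two-letter region code maps to a pair of Regional Indicator
# Symbols (U+1F1E6..U+1F1FF), one per letter. Whether a given pair is
# RGI is a separate question answered by the emoji data files.

REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def region_flag(region: str) -> str:
    """Return the regional-indicator pair for a region code such as 'UM'."""
    region = region.upper()
    if len(region) != 2 or not all("A" <= c <= "Z" for c in region):
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region)


if __name__ == "__main__":
    # UM and US are distinct sequences even if a font draws the same flag for both.
    for code in ("US", "UM", "CN", "TW"):
        print(code, " ".join(f"U+{ord(ch):04X}" for ch in region_flag(code)))
```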

> It is a valid Emoji, so if any party wants to supply a font with a glyph design of their choice, they are free to do so. Just as any party can supply a font with Phaistos disk symbols or other Unicode characters.

That is a non-sequitur comparison. It’s more like all font vendors supporting combining diacritics on Roman letters only where the combination also exists as a precomposed character, and maintainers of open source fonts rejecting any contributions for other combinations, citing as the reason the fact that Unicode does not actively require support for them. (See several issues and PRs in the GitHub repositories of Noto Color Emoji and Twemoji.)

From wjgo_10009 at btinternet.com  Mon Feb  8 04:19:53 2021
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 8 Feb 2021 10:19:53 +0000 (GMT)
Subject: Stickers
Message-ID: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com>

In https://www.unicode.org/emoji/proposals.html there is the following, 
near the start.

> For proposals that may not have all the information required we 
> encourage you to use other mechanisms such as stickers, gifs, etc. to 
> share with the world.

What exactly is a sticker please?

For example, if someone produces and publishes an OpenType font with a 
colourful glyph that is not mapped to a Unicode code point and the 
publisher declares that the glyph can become displayed by an application 
program by entering the sequence %9217 whereupon glyph substitution will 
take place, substituting the colourful glyph for the five glyphs of the 
sequence, provided that the application program has the ability to act 
upon the liga table that is in the font and ligature substitution is 
switched on, is that a sticker in Unicode parlance? Or is it something 
else, and if so, what is it please?

If people start using such sequences for glyphs then the result could be 
as potentially ambiguous as using Private Use Area encodings.

However, I remember the way that new groups were added to the Usenet alt 
hierarchy, using a process of making a proposal in the alt.config group 
and discussion taking place for around a week and then starting of the 
new group then usually proceeding. This avoided name clashes, helped 
structure and was a generally helpful process. So these days, a mailing 
list or a wiki could be used for an informal, non-obligatory, helpful 
forum for such folk encoding so as to try to avoid duplication of 
sequences and possibly to try to keep some sort of structure.

Encoding of glyphs in regular Unicode is good, yet for glyphs that do 
not get encoded this could be a useful technique.

William Overington

Monday 8 February 2021

From doug at ewellic.org  Mon Feb  8 12:13:45 2021
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 8 Feb 2021 11:13:45 -0700
Subject: No more RGI flag sequences
In-Reply-To: <C08A0C48-4D36-425D-92DB-6B64A73F9AB2@crissov.de>
References: <CAJ2xs_Gq1OZd93qaXGX_nonNG3zOdVvdi==XD+zWv4USR5cS0g@mail.gmail.com>
 <C08A0C48-4D36-425D-92DB-6B64A73F9AB2@crissov.de>
Message-ID: <000001d6fe46$24117ce0$6c3476a0$@ewellic.org>

Christoph Päper wrote:

> For all other (sub-)national flag emojis, Unicode does not suggest a
> particular design, nor does it require one to exist for the RGI label.
> Vendors would be free to display the flag of TW the same as that of CN,
> for instance, but they decided not to show it at all on devices for the
> mainland Chinese market.

I agree with Christoph here. Having two (or more) flag designs to choose from when rendering a flag image is a matter of glyph design, just like having single-story and double-story glyph variants of the letters 'a' and 'g'. And no vendor or font designer is ever required to include glyphs for every Unicode character or sequence, "recommended" or not.

 --
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org


From kent.b.karlsson at bahnhof.se  Mon Feb  8 14:48:24 2021
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Mon, 8 Feb 2021 21:48:24 +0100
Subject: No more RGI flag sequences
In-Reply-To: <000001d6fe46$24117ce0$6c3476a0$@ewellic.org>
References: <CAJ2xs_Gq1OZd93qaXGX_nonNG3zOdVvdi==XD+zWv4USR5cS0g@mail.gmail.com>
 <C08A0C48-4D36-425D-92DB-6B64A73F9AB2@crissov.de>
 <000001d6fe46$24117ce0$6c3476a0$@ewellic.org>
Message-ID: <F200E6B8-8960-4592-A0C0-D9421A3D939B@bahnhof.se>


> On 8 Feb 2021, at 19:13, Doug Ewell via Unicode <unicode at unicode.org> wrote:
> 
> Christoph Päper wrote:
> 
>> For all other (sub-)national flag emojis, Unicode does not suggest a
>> particular design, nor does it require one to exist for the RGI label.
>> Vendors would be free to display the flag of TW the same as that of CN,
>> for instance, but they decided not to show it at all on devices for the
>> mainland Chinese market.
> 
> I agree with Christoph here. Having two (or more) flag designs to choose from when rendering a flag image is a matter of glyph design, just like having single-story and double-story glyph variants of the letters 'a' and 'g'.

Hmm, while those examples have a high degree of “free variation” between “single-story” and “double-story” (as well as other variations), the situation with flags is not quite the same.

While one may “freely” vary between “flat” and “faking wavy” designs for flag glyphs, and even distort some proportions (but not too much), many other differences are either wrong or time-dependent.

And that is a major flaw in the denotational design for flags in Unicode: nations sometimes change their flags, sometimes a little, sometimes radically, and then “we” are in trouble. I would say that changing glyphs from the design used in one era to the design used in another (for the same, or nearly the same, territory) would be the same as changing the identity of a “normal” coded character, like suddenly displaying the code for A as a B.

Even though different flags may “denote” the same territory over time, using the wrong one in a “timed” document would be an error. In some cases one design or the other may even be offensive (which has obviously happened historically, and still does in some places). Of course one can use images for flags, assuming the document format allows embedding images, rather than Unicode denotations for flags. That would solve such problems, but here we are discussing the Unicode way of denoting flags.

> And no vendor or font designer is ever required to include glyphs for every Unicode character or sequence, "recommended" or not.

Agree with that (in principle?).

/Kent K



From christoph.paeper at crissov.de  Tue Feb  9 02:18:14 2021
From: christoph.paeper at crissov.de (=?utf-8?Q?Christoph_P=C3=A4per?=)
Date: Tue, 9 Feb 2021 09:18:14 +0100
Subject: Stickers
In-Reply-To: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com>
References: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com>
Message-ID: <11BEE956-D529-45A4-8263-B202705577E8@crissov.de>

William_J_G Overington via Unicode <unicode at unicode.org>:
> 
> What exactly is a sticker please?

A sticker in this context is used only within instant messaging (IM) platforms and apps. It is a vector drawing (mostly SVG) or raster image (PNG or WebP or, rarely, JPEG) that is often part of a larger set of visually or thematically related graphics. These might be, but usually are not, packaged inside a font file. Each sticker exemplifies an emotion, reaction or concept, which may be associated with one or more Unicode emojis as a kind of tag or keyword to facilitate simple search or suggestions for substitution. 

Stickers are considered more personalized than standardized emoji because, at least in principle, each user could design and share their own. Apple Memoji and similar solutions by other vendors can be considered personalized dynamic sticker (and avatar) generators, in this case sharing graphic base models with the emoji font used by that vendor. 

Gifs are short animated image sequences or video clips without an audio track. They historically used CompuServe’s 8-bit Graphics Interchange Format (GIF89a), but nowadays often use APNG, WebM/MKV+VPx or MP4/H.26x. A gif is usually not part of a set, but it is often shared through public services. Gifs are used to visualize emotions or to emphasize reactions, and sometimes purely for decoration. 

Memes may be gifs, but are more often still images (mostly comics, photographs or captured video frames, often in JPEG format), frequently including textual overlays. They reference short-lived phenomena from popular culture to exemplify reactions. Based on a single consistent visual component, memes usually have countless variations, but otherwise are not part of a set. 

What makes all of these related to emojis is that they visually augment or even replace written performative acts in a primarily text-based communication medium, often in 1:1 or m:n scenarios like chat and social media and less often in 1:n prose. 


From richard.wordingham at ntlworld.com  Tue Feb 16 23:17:41 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 17 Feb 2021 05:17:41 +0000
Subject: Zawgyi Tonemarks in Latin Script
Message-ID: <20210217051741.47de04dd@JRWUBU2>

I've been cleaning up some mojibake and I'm stumped on how to clean up
what are intended to be U+1037 MYANMAR SIGN DOT BELOW (the original
text hijacked U+1095 as best fitting U+1E45 LATIN SMALL LETTER N WITH
DOT ABOVE) and U+1038 MYANMAR SIGN VISARGA in Burmese text
transliterated to the Roman script. While U+0325 COMBINING RING BELOW
works for the first sign, what should I use for the second sign? I want
to preserve the Romanisation, not retransliterate.  I presume the
Zawgyi-encoded pseudo-Unicode worked fine in the original Zawgyi-attuned
rendering system. 

If I apply the Myanmar script signs to the Latin letters, the renderer
punishes me with dotted circles.
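Shapers such as HarfBuzz display U+25CC DOTTED CIRCLE when a combining mark cannot be attached to an acceptable base in the script's cluster model. The sketch below does not reproduce any shaper's rules. It only flags the cross-script pairing described here (a Myanmar mark after a Latin base), under a stated assumption: because Python's `unicodedata` does not expose the Script property, the basic Myanmar block U+1000..U+109F stands in for it. The function names are hypothetical.

```python
import unicodedata

# Assumption: the basic Myanmar block (U+1000..U+109F) stands in for the
# Script property, which Python's unicodedata does not expose. Myanmar
# Extended-A/B are ignored for brevity.
MYANMAR_BLOCK = range(0x1000, 0x10A0)


def is_myanmar(ch: str) -> bool:
    return ord(ch) in MYANMAR_BLOCK


def foreign_base_marks(text: str):
    """Yield (index, mark, base) for each Myanmar mark whose base is not Myanmar.

    The base is taken to be the most recent non-mark character, so a run of
    marks after one base all attach to that base. Marks with no base at all
    (a defective sequence of a different kind) are skipped.
    """
    base = None
    for i, ch in enumerate(text):
        if unicodedata.category(ch).startswith("M"):  # Mn, Mc or Me
            if base is not None and is_myanmar(ch) and not is_myanmar(base):
                yield i, ch, base
        else:
            base = ch


if __name__ == "__main__":
    samples = {
        "k + VISARGA": "k\u1038",
        "n-dot-above + DOT BELOW": "\u1e45\u1037",
        "n-dot-above + RING BELOW": "\u1e45\u0325",
        "KA + VISARGA": "\u1000\u1038",
    }
    for label, text in samples.items():
        hits = [(i, unicodedata.name(m), unicodedata.category(m))
                for i, m, _ in foreign_base_marks(text)]
        print(label, hits)
```

Note that U+1038 MYANMAR SIGN VISARGA is a spacing mark (gc=Mc), while U+1037 MYANMAR SIGN DOT BELOW and U+0325 COMBINING RING BELOW are nonspacing (gc=Mn); only the Latin mark passes this check on a Latin base.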

Richard.

From jameskass at code2001.com  Tue Feb 16 23:40:54 2021
From: jameskass at code2001.com (James Kass)
Date: Wed, 17 Feb 2021 05:40:54 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <20210217051741.47de04dd@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
Message-ID: <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>


On 2021-02-17 5:17 AM, Richard Wordingham via Unicode wrote:
> If I apply the Myanmar script signs to the Latin letters, the renderer
> punishes me with dotted circles.
>
> Richard.
Unable to repro this here.  The string "kး" does not display with the 
dotted circle.  Tried this on Windows 7 with both BabelPad and 
LibreOffice.  (And now in the compose panel of Mozilla Thunderbird.)

Maybe file a bug with the renderer developer?

From abrahamgross at disroot.org  Wed Feb 17 09:44:08 2021
From: abrahamgross at disroot.org (abrahamgross at disroot.org)
Date: Wed, 17 Feb 2021 15:44:08 +0000 (UTC)
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
Message-ID: <341e16d6-f589-4226-9c9f-1f2e7cafcee4@disroot.org>

I see the dotted circle on Android.


From markus.icu at gmail.com  Wed Feb 17 10:51:35 2021
From: markus.icu at gmail.com (Markus Scherer)
Date: Wed, 17 Feb 2021 08:51:35 -0800
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
Message-ID: <CAN49p6puOX8_Nq_F92=iC6kXOeoSrVdxtgU=h4oWx=ASRvneEg@mail.gmail.com>

On Tue, Feb 16, 2021 at 9:43 PM James Kass via Unicode <unicode at unicode.org>
wrote:

> Unable to repro this here.  The string "kး" does not display with the
> dotted circle.


Dotted circle on a Chromebook (displayed in Gmail).

markus

From asmusf at ix.netcom.com  Wed Feb 17 11:17:35 2021
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Wed, 17 Feb 2021 09:17:35 -0800
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <CAN49p6puOX8_Nq_F92=iC6kXOeoSrVdxtgU=h4oWx=ASRvneEg@mail.gmail.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <CAN49p6puOX8_Nq_F92=iC6kXOeoSrVdxtgU=h4oWx=ASRvneEg@mail.gmail.com>
Message-ID: <c55acc26-b589-aef8-73e8-4c012aafb586@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20210217/17884260/attachment.htm>

From richard.wordingham at ntlworld.com  Wed Feb 17 11:43:35 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 17 Feb 2021 17:43:35 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
Message-ID: <20210217174335.719eaadf@JRWUBU2>

On Wed, 17 Feb 2021 05:40:54 +0000
James Kass via Unicode <unicode at unicode.org> wrote:

> Unable to repro this here.  The string "kး" does not display with the 
> dotted circle.  Tried this on Windows 7 with both BabelPad and 
> LibreOffice.  (And now in the compose panel of Mozilla Thunderbird.)

That is curious.  Which font were you using?

In Word on Windows 10, using the font Myanmar Text for the whole
string, in LibreOffice and Firefox on Ubuntu 16.04 (so at least one of
them falls back to HarfBuzz Version 1.2.7), and with the Padauk font
using HarfBuzz Version 2.7.2, I get a dotted circle even for an ASCII
letter plus U+1038 MYANMAR SIGN VISARGA.

Of course, there's no problem with HarfBuzz if one uses the Zawgyi-One
font, which is one of the few to support the sequence <U+1E45,
U+1038>.

> Maybe file a bug with the renderer developer?

They could argue that it's not the sort of sequence that they will
support.  (Am I right in thinking that a Unicode-compliant renderer may
deliberately misrender unsupported sequences?) Unfortunately, the
Unicode technical annexes support the principle of separating a base
character from its marks when the extended script property doesn't
support their combination. (I've already complained to Mark Davis about
this.)  After all, if you want a candrabindu on the Latin letter 'l',
or 'v', or 'y', you use U+0310 COMBINING CANDRABINDU.

Richard.


From jameskass at code2001.com  Wed Feb 17 12:01:22 2021
From: jameskass at code2001.com (James Kass)
Date: Wed, 17 Feb 2021 18:01:22 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <20210217174335.719eaadf@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
Message-ID: <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>


On 2021-02-17 5:43 PM, Richard Wordingham via Unicode wrote:
> Which font were you using?

Tried both Code2000 and Myanmar1 [Myanmar1:Version 0.55 from Myanmar 
NLP].  Both fonts have dotted circle glyphs properly mapped.

One possible workaround for font developers who aren't especially fond 
of the dotted circles might be to map a zero-width, no-contour glyph to 
U+25CC DOTTED CIRCLE, although I haven't tried this.

From jukkakk at gmail.com  Wed Feb 17 13:48:38 2021
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Wed, 17 Feb 2021 21:48:38 +0200
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
Message-ID: <CAGHxYa72OXvU+L0H5gp1Kif4Xhfkz1mLPiASaVuRv98Wh5Z-zw@mail.gmail.com>

James Kass via Unicode (unicode at unicode.org) kirjoitti:

>
> Unable to repro this here.  The string "kး" does not display with the
> dotted circle.  Tried this on Windows 7 with both BabelPad and
> LibreOffice.  (And now in the compose panel of Mozilla Thunderbird.)


This seems to depend on the font. On Win 10, I get a rendering with a
dotted circle between the Latin letter and the mark when using the Myanmar
Text font, but without it when using the Code2000 font. This happens e.g.
in Word 365 and in BabelPad, and even in NotePad.

Perhaps more surprisingly, the Google font Padauk, when tested via
https://fonts.google.com/specimen/Padauk?subset=myanmar&preview.text=k%E1%80%B8&preview.text_type=custom
(tested on Chrome) shows the dotted circle when using regular (weight 400)
font but not when using bold (700) font.
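The `preview.text` parameter in the URL above is percent-encoded UTF-8. Decoding it with Python's standard `urllib.parse` shows that it is the same two-character test string discussed in this thread, LATIN SMALL LETTER K followed by U+1038 MYANMAR SIGN VISARGA:

```python
from urllib.parse import quote, unquote

# LATIN SMALL LETTER K followed by U+1038 MYANMAR SIGN VISARGA
sample = "k\u1038"

# quote() percent-encodes non-ASCII characters as UTF-8 bytes by default.
encoded = quote(sample)
print(encoded)                               # k%E1%80%B8

# Decoding the parameter value from the Google Fonts URL gives the code points back.
decoded = unquote("k%E1%80%B8")
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+006B', 'U+1038']
```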

I don’t quite understand the original problem. If you Romanize text, why
would you use marks of the original script? I think Romanization schemes
typically map marks to some combining marks commonly used for Latin letters
or some punctuation or special characters.

Jukka


From richard.wordingham at ntlworld.com  Wed Feb 17 13:52:41 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 17 Feb 2021 19:52:41 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
Message-ID: <20210217195241.14941fa7@JRWUBU2>

On Wed, 17 Feb 2021 18:01:22 +0000
James Kass via Unicode <unicode at unicode.org> wrote:

> On 2021-02-17 5:43 PM, Richard Wordingham via Unicode wrote:
> > Which font were you using?  
> 
> Tried both Code2000 and Myanmar1 [Myanmar1:Version 0.55 from Myanmar 
> NLP].  Both fonts have dotted circle glyphs properly mapped.

I can confirm that behaviour for HarfBuzz Version 2.7.4.  However, I
used versions 1.15 and 1.171 of Code2000, which have the invalid script
tag "myan" instead of "mymr" or "mym2", and for which Indic
rearrangement and subscript consonant formation do not occur either.

Using Myanmar1 Version 0.55 with HarfBuzz Version 2.7.4 achieves
rearrangement and subscript formation, and does not insert the
dotted circle even for defective sequences starting with a non-spacing
mark. It has glyph substitution lookups for the script "mymr" only (not
even for the default script).

The Padauk font I used has lookups for both "mymr" and "mym2".  That
might be significant; the former would have been ignored in favour of
the latter.
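The tag-selection effect described above can be sketched as a simple preference order: "mym2" before "mymr", with the old "myan" tag not recognised at all. This is a simplified model under that assumption, not HarfBuzz's actual code; real shapers have further fallbacks, and the font tag lists below contain only the Myanmar-related tags mentioned in this thread.

```python
# Preference order taken from the discussion: a font with both tags is
# shaped with its 'mym2' lookups and the 'mymr' ones are ignored.
# Real shapers have more fallbacks (and HarfBuzz may also switch shaper
# depending on the tag); this sketch only models that ordering.
MYANMAR_TAG_PREFERENCE = ("mym2", "mymr")


def choose_myanmar_script_tag(font_script_tags):
    """Return the GSUB script tag assumed to be used for Myanmar text, or None."""
    tags = set(font_script_tags)
    for tag in MYANMAR_TAG_PREFERENCE:
        if tag in tags:
            return tag
    # Fall back to the default script if the font provides one.
    return "DFLT" if "DFLT" in tags else None


if __name__ == "__main__":
    # Only the Myanmar-related tags mentioned in the thread; other tags omitted.
    fonts = {
        "Padauk (as described)": ["mymr", "mym2"],
        "Myanmar1 0.55 (as described)": ["mymr"],
        "Code2000 1.172 (as described)": ["myan"],
    }
    for name, tags in fonts.items():
        print(name, choose_myanmar_script_tag(tags))
```

With fontTools installed, the actual script tags of a font can be listed from `TTFont(path)["GSUB"].table.ScriptList.ScriptRecord`, where each record has a `ScriptTag` attribute.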

Richard.


From vinodh.vinodh at gmail.com  Wed Feb 17 15:12:37 2021
From: vinodh.vinodh at gmail.com (Vinodh Rajan)
Date: Wed, 17 Feb 2021 22:12:37 +0100
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <20210217174335.719eaadf@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
Message-ID: <CANwwB-KkvWRZZ9tThA_qsaOYhoxb3gbDtJSEdch0mDUBPC8T_A@mail.gmail.com>

>
>  If you Romanize text, why would you use marks of the original script? I
> think Romanization schemes typically map marks to some combining
> marks commonly used for Latin letters or some punctuation or special
> characters.
>

I was composing a document just yesterday that required me to do
this.

[image: image.png]
I had to choose a font that does not contain the dotted circle to
circumvent the rendering engines.

Thai has three viramas (sort of), and it makes sense to use the original
marks in the romanization to retain the differentiation.  I could of course
invent three new diacritic marks that work with Latin letters, but it is a
one-off thing. It doesn't make sense to include a note explaining my ad-hoc
conventions just for that one word; it's just too laborious.

Vinodh


-- 
http://www.virtualvinodh.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 103982 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20210217/a3811644/attachment-0001.png>

From richard.wordingham at ntlworld.com  Wed Feb 17 16:31:01 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Wed, 17 Feb 2021 22:31:01 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <CAGHxYa72OXvU+L0H5gp1Kif4Xhfkz1mLPiASaVuRv98Wh5Z-zw@mail.gmail.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <CAGHxYa72OXvU+L0H5gp1Kif4Xhfkz1mLPiASaVuRv98Wh5Z-zw@mail.gmail.com>
Message-ID: <20210217223101.4069ad6b@JRWUBU2>

On Wed, 17 Feb 2021 21:48:38 +0200
"Jukka K. Korpela via Unicode" <unicode at unicode.org> wrote:

> I don’t quite understand the original problem. If you Romanize text,
> why would you use marks of the original script? I think Romanization
> schemes typically map marks to some combining marks commonly used for
> Latin letters or some punctuation or special characters.

It tends to happen when there isn't an obvious transliteration, or the
scheme just doesn't match.  For example, it is not uncommon to find
Sanskrit in the Roman script using danda and double danda as
punctuation.  The consonant nasalisation mark, candrabindu, has been
borrowed for writing Sanskrit in the Roman script, which is why we have
U+0310 COMBINING CANDRABINDU.  I have seen this 'Latin' candrabindu in
print outside Sanskrit text books, I think in the journal 'Word'.

I couldn't find many examples on-line, but one can be found in the Pali
Text Society 2019 publication "The Catalogue of Manuscript in the U Pho
Thi Library, Thaton, Myanmar" (ISBN-13 9780 86013 081 9) - an
extract is accessible at
<https://www.cari.ne.jp/MyanmarPJ/1109G/UPT%20Catalogue%20Oct%2028Web.pdf>
The quotation I was cleaning up is a quotation of one of the authors of
that catalogue. There is some vacillation between using the Burmese
marks and a full stop and colon, but the Roman punctuation marks are
avoided when they might be misinterpreted as punctuation.

Richard.


From jameskass at code2001.com  Wed Feb 17 19:49:59 2021
From: jameskass at code2001.com (James Kass)
Date: Thu, 18 Feb 2021 01:49:59 +0000
Subject: Zawgyi Tonemarks in Latin Script
In-Reply-To: <20210217195241.14941fa7@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
Message-ID: <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>


On 2021-02-17 7:52 PM, Richard Wordingham via Unicode wrote:
> I can confirm that behaviour for HarfBuzz Version 2.7.4.  However, I
> used versions 1.15 and 1.171 of Code2000, which have the invalid script
> tag "myan" instead of "mymr" or "mym2", and for which Indic
> rearrangement and subscript consonant formation do not occur either.

I used Code2000 Version 1.172, but it also has the older script tag 
"myan".  Font Validator shows the "myan" tag as valid, but thanks to 
your pointer I checked the OpenType specs and will be changing the tag 
to "mymr" for the next release.


From Andrew.Glass at microsoft.com  Wed Feb 17 21:12:02 2021
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Thu, 18 Feb 2021 03:12:02 +0000
Subject: [EXTERNAL] Re: Zawgyi Tonemarks in Latin Script
In-Reply-To: <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
Message-ID: <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>

Hi James,

If you want the glyph rearrangement to occur, please use the mym2 tag. The mymr tag is a legacy tag for pre-shaping Myanmar Unicode fonts such as Myanmar 3 which do their own reordering.
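To check which of these tags a font actually carries, a small Python sketch follows. It assumes the third-party fontTools package (`TTFont` and the GSUB `ScriptList.ScriptRecord[...].ScriptTag` attributes). The helper `myanmar_model` and its return values are hypothetical and only restate the advice above: "mym2" means the shaping engine reorders, "mymr" means a legacy font that reorders itself, and anything else (such as "myan") is not a registered Myanmar tag.

```python
from typing import Iterable, Optional


def myanmar_model(script_tags: Iterable[str]) -> Optional[str]:
    """Classify a font's OpenType script tags for Myanmar (hypothetical helper).

    Returns "mym2" if the shaper is expected to reorder, "mymr" for a legacy
    font that does its own reordering, or None if neither tag is present.
    mym2 is checked first, following the advice to use it for reordering.
    """
    tags = set(script_tags)
    if "mym2" in tags:
        return "mym2"
    if "mymr" in tags:
        return "mymr"
    return None


def gsub_script_tags(path: str) -> list:
    """Read the GSUB script tags of a font file (requires fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path)
    if "GSUB" not in font:
        return []
    return [rec.ScriptTag for rec in font["GSUB"].table.ScriptList.ScriptRecord]


# Example with literal tag lists instead of a real font file:
print(myanmar_model(["DFLT", "myan"]))  # None: "myan" is not a registered tag
print(myanmar_model(["mymr"]))          # mymr
print(myanmar_model(["mymr", "mym2"]))  # mym2
```

With a real font, `myanmar_model(gsub_script_tags("SomeFont.ttf"))` would combine the two; the file name is a placeholder.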

Cheers,

Andrew



From richard.wordingham at ntlworld.com  Thu Feb 18 03:04:48 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 18 Feb 2021 09:04:48 +0000
Subject: Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script)
In-Reply-To: <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
 <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
Message-ID: <20210218090448.59bc2324@JRWUBU2>

On Thu, 18 Feb 2021 03:12:02 +0000
Andrew Glass via Unicode <unicode at unicode.org> wrote:

> If you want the glyph rearrangement to occur, please use the mym2
> tag. The mymr tag is a legacy tag for pre-shaping Myanmar Unicode
> fonts such as Myanmar 3 which do their own reordering.

Would mymr be the suitable tag for fonts supporting legacy languages
such as Sanskrit?  Syllable-initial subscript WA seems not to be
supported by the modern system; one has to approximate it by
U+103D MYANMAR CONSONANT SIGN MEDIAL WA.

Richard.

From Andrew.Glass at microsoft.com  Thu Feb 18 13:36:37 2021
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Thu, 18 Feb 2021 19:36:37 +0000
Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin
 Script)
In-Reply-To: <20210218090448.59bc2324@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
 <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
 <20210218090448.59bc2324@JRWUBU2>
Message-ID: <MN2PR00MB0893E0CA50D56DEA495E93DE8E859@MN2PR00MB0893.namprd00.prod.outlook.com>

Great question Richard, can you provide some examples? Do we have an agreed encoding mechanism for this?
In principle, I would not recommend using mymr because the reordering requirements for Myanmar are complex and it would be inefficient for a font to try to handle them. That said, all kinds of things are possible with OpenType, so it may be a practical workaround in the short term. However, if we can understand the requirement, updating the Myanmar cluster validation and reordering logic to support it would be the preferred option here - if it isn't already possible.

Cheers,

Andrew



From richard.wordingham at ntlworld.com  Thu Feb 18 18:16:20 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 19 Feb 2021 00:16:20 +0000
Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin
 Script)
In-Reply-To: <MN2PR00MB0893E0CA50D56DEA495E93DE8E859@MN2PR00MB0893.namprd00.prod.outlook.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
 <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
 <20210218090448.59bc2324@JRWUBU2>
 <MN2PR00MB0893E0CA50D56DEA495E93DE8E859@MN2PR00MB0893.namprd00.prod.outlook.com>
Message-ID: <20210219001620.7f0dd007@JRWUBU2>

On Thu, 18 Feb 2021 19:36:37 +0000
Andrew Glass via Unicode <unicode at unicode.org> wrote:

The lack isn't where I thought it was - it turns out that the shaper
specification already supports the non-medial subscript WA!  I tweaked
the OpenType lookup in Padauk to generate the glyph for <U+1039,
U+101D> to check where the problem lay, but didn't realise that the
HarfBuzz test program hb-view would by default use Graphite
shaping!  When I selected OpenType rendering, I got the correct
rendering from the tweaked font.

The problem is that *fonts* seem not to be including the subscript
WA, because it isn't required for *Modern Burmese*.  It so happens
that the major fonts' rendering of MEDIAL WA is suitable for
<VIRAMA, WA> - the pain of overlapping glyph ranges!
 
> Great question Richard, can you provide some examples? Do we have an
> agreed encoding mechanism for this?

I'll give a detailed answer, though the renderers already have the
solution.

The need for a distinction was put forward by Michael Everson et al. in
at least the following:

L2/06-029
L2/06-077 p2 (a.k.a. WG2 N3043)
L2/06-213


L2/06-077 p3 states, "Note that kwa with MEDIAL WA may take a teardrop
or triangular WA shape, which is never the case with true subjoined WA
(which is rare, though it occurs in Sanskrit)."

Martin Hosken put forward other arguments, but I'm not sure that they
were found convincing.

As to examples, just look at the absolutives in
https://www.alamy.com/stock-photo-burmese-writing-pali-canon-buddhist-canon-tripitaka-library-of-stone-21244784.html .
I'd been going to say look for -itvā for both Pali and Sanskrit, but
this form seems commoner in word lists than in actual text.

TUS 13.0 Section 16.3 p647 says, "In Pali and Sanskrit texts written in
the Myanmar script, as well as in older orthographies of Burmese, the
consonants ya, ra, wa, and ha are sometimes rendered in subjoined form.
In those cases, U+1039 ္ myanmar sign virama and the regular form of
the consonant are used."

Thus, examples abound, and the encoding is defined.  The codechart
currently shows a teardrop shape for U+103D MYANMAR CONSONANT SIGN
MEDIAL WA - that would not be suitable for <VIRAMA, WA>.
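The two encodings can be made concrete with a short Python sketch (an illustration, not part of the thread). It builds 'kwa' with the true subjoined sequence <U+1039, U+101D> described in TUS and with U+103D MEDIAL WA, and checks that normalization treats them as different text, so any visual distinction is left to fonts. KA is an arbitrary choice of base consonant.

```python
import unicodedata

KA = "\u1000"         # MYANMAR LETTER KA
VIRAMA = "\u1039"     # MYANMAR SIGN VIRAMA, used to form subjoined consonants
WA = "\u101D"         # MYANMAR LETTER WA
MEDIAL_WA = "\u103D"  # MYANMAR CONSONANT SIGN MEDIAL WA

subjoined_kwa = KA + VIRAMA + WA  # true subjoined WA (Pali, Sanskrit, older Burmese)
medial_kwa = KA + MEDIAL_WA       # medial WA as used in modern Burmese


def code_points(text: str) -> str:
    """Format text as a <U+XXXX, ...> sequence."""
    return "<" + ", ".join(f"U+{ord(ch):04X}" for ch in text) + ">"


for label, text in (("subjoined", subjoined_kwa), ("medial", medial_kwa)):
    names = "; ".join(unicodedata.name(ch) for ch in text)
    print(f"{label:9} {code_points(text)}  {names}")

# Neither NFC nor NFKC maps one spelling onto the other.
for form in ("NFC", "NFKC"):
    same = unicodedata.normalize(form, subjoined_kwa) == unicodedata.normalize(form, medial_kwa)
    print(form, same)
```

Since the sequences never normalize together, a font that draws both with the same teardrop glyph loses a distinction the encoding preserves.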

Richard.


From markus.icu at gmail.com  Thu Feb 18 19:22:25 2021
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 18 Feb 2021 17:22:25 -0800
Subject: new ISO 15924 script codes 2021q1
Message-ID: <CAN49p6o4_g2xWK9XpVzcSOAhVoiVRE5nr20ZSiW-sQqChHOUaQ@mail.gmail.com>

Dear Unicoders,

FYI: seven new script codes were registered last month and this month:
https://www.unicode.org/iso15924/codechanges.html

Code  N°   English Name     Nom français       Alias  Age  Date        Key
Tnsa  275  Tangsa           tangsa                         2021-02-17  Add
Vith  228  Vithkuqi         vithkuqi                       2021-02-17  Add
Ougr  143  Old Uyghur       ancien ouïgour                 2021-01-25  Add
Pcun  015  Proto-Cuneiform  proto-cunéiforme               2021-01-25  Add
Pelm  016  Proto-Elamite    proto-élamite                  2021-01-25  Add
Psin  103  Proto-Sinaitic   proto-sinaïtique               2021-01-25  Add
Ranj  303  Ranjana          ranjana                        2021-01-25  Add

(Alias column source:
<http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt>)
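The numeric codes in the table follow ISO 15924's convention of grouping scripts into hundreds by broad type. The Python sketch below classifies the new codes that way. The range labels are a summary of the standard's numbering scheme, not taken from this message, and should be checked against the registry; treat them as broad groupings rather than a strict typology.

```python
# Broad numeric ranges of ISO 15924 (summarised; verify against the registry).
RANGES = [
    (0, 99, "hieroglyphic and cuneiform"),
    (100, 199, "right-to-left alphabetic"),
    (200, 299, "left-to-right alphabetic"),
    (300, 399, "alphasyllabic"),
    (400, 499, "syllabic"),
    (500, 599, "ideographic"),
    (600, 699, "undeciphered"),
    (700, 799, "shorthands and other notations"),
    (900, 999, "private use, aliases and special codes"),
]

# The seven codes from the announcement: code -> (N°, English name, date)
NEW_CODES = {
    "Tnsa": ("275", "Tangsa", "2021-02-17"),
    "Vith": ("228", "Vithkuqi", "2021-02-17"),
    "Ougr": ("143", "Old Uyghur", "2021-01-25"),
    "Pcun": ("015", "Proto-Cuneiform", "2021-01-25"),
    "Pelm": ("016", "Proto-Elamite", "2021-01-25"),
    "Psin": ("103", "Proto-Sinaitic", "2021-01-25"),
    "Ranj": ("303", "Ranjana", "2021-01-25"),
}


def script_group(numeric: str) -> str:
    """Map a three-digit ISO 15924 numeric code to its broad range label."""
    n = int(numeric)  # "015" parses as 15
    for low, high, label in RANGES:
        if low <= n <= high:
            return label
    return "unassigned range"


# Print the new codes in numeric order with their range label.
for code, (num, name, date) in sorted(NEW_CODES.items(), key=lambda item: item[1][0]):
    print(f"{code} {num} {name:<16} {script_group(num)}")
```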

Best regards,
markus
ISO 15924 script code registrar

From wjgo_10009 at btinternet.com  Fri Feb 19 03:39:46 2021
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 19 Feb 2021 09:39:46 +0000 (GMT)
Subject: ISO 15924 codes for unwritten documents and for inherited script
 (from Re: new ISO 15924 script codes 2021q1)
In-Reply-To: <CAN49p6o4_g2xWK9XpVzcSOAhVoiVRE5nr20ZSiW-sQqChHOUaQ@mail.gmail.com>
References: <CAN49p6o4_g2xWK9XpVzcSOAhVoiVRE5nr20ZSiW-sQqChHOUaQ@mail.gmail.com>
Message-ID: <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>


Hi

Thank you for posting.

I looked through the linked list.

Could you say where, how and why the following codes would be used 
please?

>> Zxxx 997 Code for unwritten documents code pour les documents non 
>> écrits

>> Zinh 994 Code for inherited script code pour écriture héritée

Best regards,

William Overington

Friday 19 February 2021



From asmusf at ix.netcom.com  Fri Feb 19 10:32:23 2021
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 19 Feb 2021 08:32:23 -0800
Subject: ISO 15924 codes for unwritten documents and for inherited script
 (from Re: new ISO 15924 script codes 2021q1)
In-Reply-To: <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>
References: <CAN49p6o4_g2xWK9XpVzcSOAhVoiVRE5nr20ZSiW-sQqChHOUaQ@mail.gmail.com>
 <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>
Message-ID: <866cf06c-af9f-68b0-c0af-ef2fda875b9b@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20210219/4607ff27/attachment.htm>

From Andrew.Glass at microsoft.com  Fri Feb 19 16:49:12 2021
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Fri, 19 Feb 2021 22:49:12 +0000
Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin
 Script)
In-Reply-To: <20210219001620.7f0dd007@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
 <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
 <20210218090448.59bc2324@JRWUBU2>
 <MN2PR00MB0893E0CA50D56DEA495E93DE8E859@MN2PR00MB0893.namprd00.prod.outlook.com>
 <20210219001620.7f0dd007@JRWUBU2>
Message-ID: <DM6PR00MB0891856E8DD280E811D3BE118E849@DM6PR00MB0891.namprd00.prod.outlook.com>

Thank you for the nice examples, Richard.
Indeed this is up to fonts to enable. Fonts could add a locl feature for Sanskrit to enable this example. That would depend on software to pass in the OT language tag appropriately. Or fonts could simply optimize for Sanskrit by default.
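A minimal Python sketch of this opt-in approach, modelling only the decision a font's locl lookup would encode: the subjoined form of <VIRAMA, WA> is used only when the application passes the Sanskrit language system tag. "SAN " is assumed here to be the OpenType language system tag for Sanskrit (check the OpenType language tag registry). The function name is hypothetical.

```python
from typing import Optional

SANSKRIT = "SAN "  # assumed OpenType language system tag (four characters, space-padded)


def subjoined_wa_opt_in(lang_tag: Optional[str]) -> bool:
    """Opt-in policy (hypothetical): subjoined WA only for text tagged as Sanskrit.

    Untagged text (None or the default language system "dflt") gets the
    font's default glyph, so the result depends on software passing the tag.
    """
    return lang_tag == SANSKRIT


print(subjoined_wa_opt_in("SAN "))  # True
print(subjoined_wa_opt_in("dflt"))  # False
print(subjoined_wa_opt_in(None))    # False: no language information
```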

Cheers,

Andrew



From richard.wordingham at ntlworld.com  Sat Feb 20 05:13:20 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 20 Feb 2021 11:13:20 +0000
Subject: Subscript Manual WA
In-Reply-To: <DM6PR00MB0891856E8DD280E811D3BE118E849@DM6PR00MB0891.namprd00.prod.outlook.com>
References: <20210217051741.47de04dd@JRWUBU2>
 <c169ad77-9b5f-d48f-b02e-ea081e7e3133@code2001.com>
 <20210217174335.719eaadf@JRWUBU2>
 <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com>
 <20210217195241.14941fa7@JRWUBU2>
 <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com>
 <BYAPR00MB088678A318534BF9102FDE268E859@BYAPR00MB0886.namprd00.prod.outlook.com>
 <20210218090448.59bc2324@JRWUBU2>
 <MN2PR00MB0893E0CA50D56DEA495E93DE8E859@MN2PR00MB0893.namprd00.prod.outlook.com>
 <20210219001620.7f0dd007@JRWUBU2>
 <DM6PR00MB0891856E8DD280E811D3BE118E849@DM6PR00MB0891.namprd00.prod.outlook.com>
Message-ID: <20210220111320.4fa1487b@JRWUBU2>

On Fri, 19 Feb 2021 22:49:12 +0000
Andrew Glass via Unicode <unicode at unicode.org> wrote:

> Thank you for the nice examples, Richard.
> Indeed this is up to fonts to enable. Fonts could add a locl feature
> for Sanskrit to enable this example. That would depend on software to
> pass in the OT language tag appropriately. Or fonts could simply
> optimize for Sanskrit by default.

The winning argument for this script has been that fonts should
be able to produce something that is not outrageously wrong even in the
absence of language information.  So, the logic should rather be to
disable the subjoined form of <VIRAMA, WA> only for languages that don't have
it. If I've interpreted the reports correctly, it may turn up in Old
Burmese, but won't turn up in Modern Burmese.

So the logic would be to disable the subscripting of WA if the text
were tagged as being in Modern Burmese, as opposed to Old Burmese, Pali
(TBC) or Sanskrit.  I don't know how Pali in the Shan variant of the
Myanmar script should currently map to OpenType language tags - there
might not even be a BCP 47 tag for it.  I think there are similar
questions for other local variants of Pali in the script.
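The alternative above inverts the default. As a hypothetical Python sketch of that decision, subjoined WA stays enabled unless the text is tagged as Modern Burmese. It assumes Modern Burmese arrives with the OpenType language system tag "BRM " and that Old Burmese, Pali and Sanskrit do not. Whether existing tags can actually make that split (for example for Shan-variant Pali) remains the open question raised in the paragraph above.

```python
from typing import Optional

MODERN_BURMESE = "BRM "  # assumed OpenType language system tag for (Modern) Burmese


def subjoined_wa_default_on(lang_tag: Optional[str]) -> bool:
    """Default-on policy (hypothetical): only Modern Burmese opts out.

    Untagged text (None or "dflt") keeps the subjoined form, so the font
    is not outrageously wrong for Pali or Sanskrit when no language is given.
    """
    return lang_tag != MODERN_BURMESE


for tag in (None, "dflt", "SAN ", "PAL ", "BRM "):
    print(repr(tag), subjoined_wa_default_on(tag))
```

Compared with an opt-in locl feature, this places the burden on the one language known not to use the subjoined form, which matches the argument that untagged text should still render acceptably.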

There is research to be done into the spelling of the Pali and Sanskrit
clusters with WA as a second element.  I would not be surprised to find
different spellings word/phrase initially and finally.  (I've only seen
Pali 'kv' as the result of sandhi, e.g. of 'ko attho' to 'kvattho'.  It
still needs its own artwork for Pali in the Sinhala script!)

Richard.

From jameskass at code2001.com  Sat Feb 27 00:11:43 2021
From: jameskass at code2001.com (James Kass)
Date: Sat, 27 Feb 2021 06:11:43 +0000
Subject: Unicode 14.0 Alpha Review
In-Reply-To: <mailman.621.1614360400.1709.announcements@unicode.org>
References: <mailman.621.1614360400.1709.announcements@unicode.org>
Message-ID: <79bb4f68-c841-5cd1-129b-0d2a2489d581@code2001.com>


https://www.unicode.org/charts/PDF/Unicode-14.0/U140-2A700.pdf

Is the Unicode 14.0 provisional CJK character slated for U+2B736
a duplicate of existing character U+3B3F?

Note that Chinese radical #130 (肉) often takes the shape of Chinese 
radical #74 (月).