From kenwhistler at sonic.net Mon Feb 1 14:54:09 2021
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 1 Feb 2021 12:54:09 -0800
Subject: Re: Origins of ⌚ U+231A WATCH and ⌛ U+231B HOURGLASS
Message-ID: <484ebd87-46aa-0056-87df-e37420a1774a@sonic.net>

Marcel,

Well, having dusted off the archives, I now have the definitive answer as to the origin story for these two in Unicode 1.0.

In Document UTC/1991-016, dating from January, 1991, there were 19 distinct requests for additions of characters, submitted by Layne Cannon on behalf of WordPerfect Corporation, for consideration at UTC #44, February 1, 1991. Number 13 of those requests included a request to encode "Clock" at U+2677 and "Hourglass" at U+2678. Those two characters are what ended up encoded as U+231A WATCH and U+231B HOURGLASS.

The justification for them was that they are "Part of the WordPerfect iconic character set". And indeed, they can be found in the attached listing of the "WP Symbol Set 5" at 5,31 and 5,32. They are intermixed there with other characters that ended up in the Zapf dingbats set in Unicode.

BTW, that same request from WordPerfect was also the origin of U+2319 TURNED NOT SIGN, which was submitted in the same request as "Inverted begining of line" [sic].

--Ken

On 12/30/2020 6:45 PM, M. Pauluk via Unicode wrote:
> Thanks Ken! I had already checked XCCS and IBM code pages too, ⌚
> U+231A WATCH and ⌛ U+231B HOURGLASS really couldn't have originated
> there.

From peroyomaslists at gmail.com Sat Feb 6 16:32:29 2021
From: peroyomaslists at gmail.com (Andrés Sanhueza)
Date: Sat, 6 Feb 2021 19:32:29 -0300
Subject: Best character to use for the «bolaspa» sign in Spanish

The RAE (The Royal Spanish Academy of the Language, an entity from Spain that tries to regulate the correct use of the Spanish language) uses a punctuation sign named «bolaspa»
(an X inside a circle, like ⊗) to precede examples of an incorrect expression in the language, something like, for example:

> Some people don't like to use double negatives, so saying something like
> ⊗“we don't need no education” is seen as wrong. Using either “we don't need
> education” or “we need no education” is better.

The RAE also uses an asterisk (*) for the same purpose on occasion. Unlike the asterisk, the bolaspa is not really regular Spanish and I only remember having seen it on dissemination texts about the language that the RAE makes or similar. There are various similar characters in Unicode and at least one of them is intended for mathematical usage, and while I know that *most* of the characters are encoded in relation for the shape itself rather than the meaning or glyph, I still don't know if ANY similar looking character will accomplish that function instinctively, so I ask which do you think is the best character to use as a punctuation sign in text.

From copypaste at kittens.ph Sat Feb 6 16:45:21 2021
From: copypaste at kittens.ph (Fredrick Brennan)
Date: Sat, 06 Feb 2021 17:45:21 -0500
Subject: Best character to use for the «bolaspa» sign in Spanish
Message-ID: <1846209.PLLzzlfL6S@laptop>

Mr. Sanhueza:

Please upload an image of the /bolaspa/ in use on a printed page. I can't really try to answer your question without one, having never seen one before.

Best,
Fred Brennan

On Saturday, February 6, 2021 5:32:29 PM EST Andrés Sanhueza via Unicode wrote:
> The RAE (The Royal Spanish Academy of the Language, an entity from Spain
> that tries to regulate the correct use of the Spanish language) uses a
> punctuation sign named «bolaspa» (an X inside a circle, like ⊗)
> to precede
> examples of an incorrect expression in the language, something like, for
> example:
>
> > Some people don't like to use double negatives, so saying something like
> > ⊗“we don't need no education” is seen as wrong. Using either “we don't
> > need education” or “we need no education” is better.
>
> The RAE also uses an asterisk (*) for the same on occasion. Unlike the
> asterisk, the bolaspa is not really regular Spanish and I only remember
> having seen it on dissemination texts about the language that the RAE makes
> or similar. There are various similar characters on unicode and at least
> one of them are intended for mathematical usage, and while I know that
> *most* of the characters are encoded in relation for the shape itself
> rather than the meaning or glyph, I still don't know if ANY similar looking
> character will accomplish that function instinctively, so ask which do you
> think is the best character to use as a punctuation sign in text.

From jameskass at code2001.com Sat Feb 6 16:50:51 2021
From: jameskass at code2001.com (James Kass)
Date: Sat, 6 Feb 2021 22:50:51 +0000
Subject: Re: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <1846209.PLLzzlfL6S@laptop>
References: <1846209.PLLzzlfL6S@laptop>
Message-ID: <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>

On 2021-02-06 10:45 PM, Fredrick Brennan via Unicode wrote:
> Mr. Sanhuza:
>
> Please upload an image of the /bolaspa/ in use on a printed page. I can't really try
> to answer your question without one having never seen one before.
>
> Best,
> Fred Brennan

Here's a couple of web page links:

https://spanish.stackexchange.com/questions/32080/tiene-nombre-el-signo-de-la-cruz-en-un-c%c3%adrculo-que-veo-a-veces-en-el-dpd/32081#32081

https://dle.rae.es/bolaspa

Both show the symbol as a superscript, above the baseline.
The first page cites an article which refers to the bolaspa as a symbol of "medieval torture".

From duerst at it.aoyama.ac.jp Sat Feb 6 17:57:39 2021
From: duerst at it.aoyama.ac.jp (Martin J. Dürst)
Date: Sun, 7 Feb 2021 08:57:39 +0900
Subject: Re: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>
References: <1846209.PLLzzlfL6S@laptop> <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com>
Message-ID: <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>

On 07/02/2021 07:50, James Kass via Unicode wrote:
> Here's a couple of web page links:
>
> https://spanish.stackexchange.com/questions/32080/tiene-nombre-el-signo-de-la-cruz-en-un-c%c3%adrculo-que-veo-a-veces-en-el-dpd/32081#32081
>
> https://dle.rae.es/bolaspa
>
> Both show the symbol as a superscript, above the baseline. The first
> page cites an article which refers to the bolaspa as a symbol of
> "medieval torture".

And it's easy to figure out that they use U+2297 CIRCLED TIMES for this purpose, and seem to be happy with it. The entry for U+2297 also gives various alternatives:

2297 ⊗ CIRCLED TIMES
     = tensor product
     = vector pointing into page
     → 26D2 ⛒ circled crossing lanes
     → 2A02 ⨂ n-ary circled times operator
     → 2BBE ⮾ circled x
     ~ 2297 FE00 ⊗ with white rim

Not sure we need yet another character that looks almost the same. But maybe adding a comment such as
     = Spanish bolaspa
might help.

Regards, Martin.

From doug at ewellic.org Sat Feb 6 18:05:32 2021
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 6 Feb 2021 17:05:32 -0700
Subject: No more RGI flag sequences
Message-ID: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>

I spotted this recently, by chance, in L2/21-014, "Emoji Subcommittee Report Q1, 2021":

"RGI Flag Criteria Update: Moving forward, proposals to add subdivision flags or continental regions as RGI will not be considered by the Emoji Subcommittee or the UTC."
That means neither Northern Ireland, nor the states of the U.S. or Mexico or Germany, nor the provinces of Canada, nor the regions of Italy, many of whose inhabitants have as much pride in their homeland as the English, Scottish, and Welsh, will have their flag sequences tagged as RGI. (Continental regions, of course, do not have flags.)

That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.

Some of us find this unfortunate.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From copypaste at kittens.ph Sat Feb 6 18:14:38 2021
From: copypaste at kittens.ph (Fredrick Brennan)
Date: Sat, 06 Feb 2021 19:14:38 -0500
Subject: Best character to use for the «bolaspa» sign in Spanish
In-Reply-To: <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>
References: <03bd6811-4d4d-8fb9-53be-0dd50065dec3@code2001.com> <73306699-7972-931b-462d-d19f4c871e70@it.aoyama.ac.jp>
Message-ID: <3432254.lxTiPNA053@laptop>

To argue the other side:

The examples given seem to suggest that bolaspa, as opposed to ⊗, is always shown in superscript, in Unicode chart terms, something like:

     ≈ <super> 2297 ⊗

So, if this is indeed done consistently, perhaps that can be a basis for an argument in a proposal.

Best,
Fred Brennan

On Saturday, February 6, 2021 6:57:39 PM EST Martin J. Dürst via Unicode wrote:
> [I]t's easy to figure out that they use U+2297 CIRCLED TIMES for this
> purpose, and seem to be happy with it.
>
> Not sure we need yet another character that looks almost the same. But
> maybe adding a comment such as
> = Spanish bolaspa
> might help.
>
> Regards, Martin.
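The look-alike candidates named in the U+2297 chart entry quoted in this subthread can be inspected programmatically. The following is only a minimal sketch using Python's standard `unicodedata` module; the list `CANDIDATES` and the helper `describe` are illustrative names, not anything proposed on the list, and the output depends on the Unicode version bundled with the Python build.

```python
import unicodedata

# Code points mentioned in the U+2297 chart entry quoted in this thread.
CANDIDATES = ["\u2297", "\u26D2", "\u2A02", "\u2BBE"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category Xx)' for a single character."""
    return (
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"(category {unicodedata.category(ch)})"
    )


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

Note that none of this says anything about superscript presentation; in plain text that would be left to markup or styling rather than to the character itself.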
From mark at macchiato.com Sat Feb 6 19:16:00 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sat, 6 Feb 2021 17:16:00 -0800
Subject: No more RGI flag sequences
In-Reply-To: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>

The reasoning behind that has been at https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time. The feasibility issues behind that reasoning would have to change substantially before FR could be revised.

Note however that all the subdivision flags remain valid; just not recommended for general interchange.

Mark

On Sat, Feb 6, 2021 at 4:07 PM Doug Ewell via Unicode wrote:
> I spotted this recently, by chance, in L2/21-014, "Emoji Subcommittee
> Report Q1, 2021":
>
> "RGI Flag Criteria Update: Moving forward, proposals to add subdivision
> flags or continental regions as RGI will not be considered by the Emoji
> Subcommittee or the UTC."
>
> That means neither Northern Ireland, nor the states of the U.S. or Mexico
> or Germany, nor the provinces of Canada, nor the regions of Italy, many of
> whose inhabitants have as much pride in their homeland as the English,
> Scottish, and Welsh, will have their flag sequences tagged as RGI.
> (Continental regions, of course, do not have flags.)
>
> That basically means no vendor will support flag images for these places,
> and they will not be interchangeable in any medium that uses Unicode.
>
> Some of us find this unfortunate.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From christoph.paeper at crissov.de Sun Feb 7 08:11:37 2021
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Sun, 7 Feb 2021 15:11:37 +0100
Subject: No more RGI flag sequences

Mark Davis ☕️
via Unicode:
>
> The reasoning behind that has been at https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time. The feasibility issues behind that reasoning would have to change substantially before FR could be revised.

The only “continental region” with a well-established, copyleft flag is 002, Africa.

> Note however that all the subdivision flags remain valid; just not recommended for general interchange.

“just not” means everything here, chicken and egg.

We should un-RGI emoji flags for dependent regions (like UM which often uses the same graphic as US).

In conclusion, WhatsApp was right to use private-use codes from ISO 3166-1, like XE for England, employing regional indicator symbols (RIS), because there never was a realistic chance that we would get out of two-letter codes any time soon.

From otto.stolz at uni-konstanz.de Sun Feb 7 08:59:34 2021
From: otto.stolz at uni-konstanz.de (Otto Stolz)
Date: Sun, 7 Feb 2021 15:59:34 +0100
Subject: No more RGI flag sequences
In-Reply-To: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
Message-ID: <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>

Hello,

what, on earth (and in Unicode jargon), does “RGI” mean?

The obvious place to look for Unicode-specific abbreviations is , but I cannot find it there. Experts, please amend the FAQ with all common Unicode acronyms.

Best wishes,
Otto Stolz

From arthur at reutenauer.eu Sun Feb 7 09:29:50 2021
From: arthur at reutenauer.eu (Arthur Reutenauer)
Date: Sun, 7 Feb 2021 16:29:50 +0100
Subject: No more RGI flag sequences
In-Reply-To: <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>
Message-ID: <20210207152950.xksmihrlbmpmtruk@phare.normalesup.org>

On Sun, Feb 07, 2021 at 03:59:34PM +0100, Otto Stolz via Unicode wrote:
> what, on earth (and in Unicode jargon), does “RGI” mean?

“Recommended for general interchange”. See https://www.unicode.org/reports/tr51/#def_RGI

Arthur

From harjitmoe at outlook.com Sun Feb 7 10:15:02 2021
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 7 Feb 2021 16:15:02 +0000
Subject: Fwd: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>, <58870b04-dc34-0b97-0428-366189626bdd@uni-konstanz.de>

Since I inadvertently sent the below as Reply, as opposed to Reply All, I'm forwarding it to the list for the record.

--Har.

________________________________
From: Harriet Riddle
Sent: Sunday, 7 February 2021, 15:14
To: Otto Stolz
Subject: Re: No more RGI flag sequences

RGI means Recommended for General Interchange, i.e. a grapheme cluster that emoji fonts are expected to include a glyph for as standard, as opposed to other sequences with U+200D or U+FE0F (or even codepoints without emoji status which are not marked with U+FE0F, e.g. on Samsung devices) which specific emoji fonts might decide to include.

--Har.

________________________________
From: Unicode on behalf of Otto Stolz via Unicode
Sent: Sunday, February 7, 2021 2:59:34 PM
To: unicode at unicode.org
Subject: Re: No more RGI flag sequences

Hello,

what, on earth (and in Unicode jargon), does “RGI” mean?

The obvious place to look for Unicode specific abbreviations is , but I cannot find it there.
Experts, please amend the FAQ with all common Unicode acronyms.

Best wishes,
Otto Stolz

From doug at ewellic.org Sun Feb 7 12:29:51 2021
From: doug at ewellic.org (Doug Ewell)
Date: Sun, 7 Feb 2021 11:29:51 -0700
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org>
Message-ID: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Mark Davis wrote:

> The reasoning behind that has been at
> https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.

Since I'm unlikely ever to submit a true emoji proposal (i.e. for a facial expression or animal or hand gesture), I probably wouldn't have thought to look at the "Submitting Emoji Proposals" page.

That said, this passage in that section:

> Adding further subdivision flags as RGI can also appear to play
> favorites unless similar subdivisions also get flags, which could mean
> “all other flags of that country” or “all subdivisions of greater or
> equal population in other countries”

doesn't seem to align with the decision to exclude Northern Ireland.

> The feasibility issues behind that reasoning would have to change
> substantially before FR could be revised.
>
> Note however that all the subdivision flags remain valid; just not
> recommended for general interchange.

That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.

--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From joan at montane.cat Sun Feb 7 13:10:26 2021
From: joan at montane.cat (Joan Montané)
Date: Sun, 7 Feb 2021 20:10:26 +0100
Subject: No more RGI flag sequences
In-Reply-To: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Missatge de Doug Ewell via Unicode del dia dg., 7 de febr.
2021 a les 19:34:
>
> Mark Davis wrote:
>
> > The reasoning behind that has been at
> > https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.
>
> Since I'm unlikely ever to submit a true emoji proposal (i.e. for a facial expression or animal or hand gesture), I probably wouldn't have thought to look at the "Submitting Emoji Proposals" page.
>
> That said, this passage in that section:
>
> > Adding further subdivision flags as RGI can also appear to play
> > favorites unless similar subdivisions also get flags, which could mean
> > “all other flags of that country” or “all subdivisions of greater or
> > equal population in other countries”
>
> doesn't seem to align with the decision to exclude Northern Ireland.
>
> > The feasibility issues behind that reasoning would have to change
> > substantially before FR could be revised.
> >
> > Note however that all the subdivision flags remain valid; just not
> > recommended for general interchange.
>
> That basically means no vendor will support flag images for these places, and they will not be interchangeable in any medium that uses Unicode.
>

So, Unicode created a universal encoding mechanism to represent flags of ISO subdivision territories years ago, and now Unicode throws the key to the bottom of the sea.

I can understand that it is hard to draw a line for which subdivision territories merit RGI. But closing RGI to UK is really English-focused.

Just my 2 ct.

Joan Montané

From mark at macchiato.com Sun Feb 7 13:33:08 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 11:33:08 -0800
Subject: No more RGI flag sequences
In-Reply-To: <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

The main issue for making N. Ireland be RGI was the lack of an official flag. It is valid (???????).
Mark

On Sun, Feb 7, 2021 at 10:29 AM Doug Ewell wrote:
> Mark Davis wrote:
>
> > The reasoning behind that has been at
> > https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.
>
> Since I'm unlikely ever to submit a true emoji proposal (i.e. for a facial
> expression or animal or hand gesture), I probably wouldn't have thought to
> look at the "Submitting Emoji Proposals" page.
>
> That said, this passage in that section:
>
> > Adding further subdivision flags as RGI can also appear to play
> > favorites unless similar subdivisions also get flags, which could mean
> > “all other flags of that country” or “all subdivisions of greater or
> > equal population in other countries”
>
> doesn't seem to align with the decision to exclude Northern Ireland.
>
> > The feasibility issues behind that reasoning would have to change
> > substantially before FR could be revised.
> >
> > Note however that all the subdivision flags remain valid; just not
> > recommended for general interchange.
>
> That basically means no vendor will support flag images for these places,
> and they will not be interchangeable in any medium that uses Unicode.
>
> --
> Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org

From mark at macchiato.com Sun Feb 7 13:58:23 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 11:58:23 -0800
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Part of the reason for providing a general mechanism that would make all subdivision flags be valid was to provide an interchangeable way for some platforms to supply additional subdivision flags. Then evidence of popularity on those platforms could provide a strong signal for making a particular subdivision flag be RGI.
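For readers unfamiliar with the general mechanism mentioned above, the following is a minimal sketch of how an emoji tag sequence for a subdivision flag is assembled: U+1F3F4 WAVING BLACK FLAG, then one tag character (U+E0000 plus the ASCII value) per letter or digit of the subdivision code, then U+E007F CANCEL TAG. The function names `subdivision_flag` and `code_points` are illustrative only, and the sketch checks just the shape of the code; whether a code is actually valid depends on the CLDR subdivision list, which is not consulted here.

```python
BLACK_FLAG = "\U0001F3F4"  # U+1F3F4 WAVING BLACK FLAG, the base character
CANCEL_TAG = "\U000E007F"  # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000       # tag character = TAG_OFFSET + ASCII code point


def subdivision_flag(code: str) -> str:
    """Build the tag sequence for a code such as 'gbnir' or 'GB-NIR'.

    Assumption: only the shape of the code is checked (ASCII letters
    and digits); membership in the CLDR subdivision list is not.
    """
    key = code.replace("-", "").lower()
    if not key or not (key.isascii() and key.isalnum()):
        raise ValueError(f"not a plausible subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in key)
    return BLACK_FLAG + tags + CANCEL_TAG


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


if __name__ == "__main__":
    # Northern Ireland: a base flag, five tag letters, and a terminator.
    print(code_points(subdivision_flag("GB-NIR")))
```

A platform without a glyph for a given sequence would typically fall back to showing the plain black flag, which is one reason RGI status matters for interchange.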
As it turned out,

(a) the frequency of usage of subdivision flags was quite low. (The category of flags in general is already not stellar: https://home.unicode.org/emoji/emoji-frequency/)

(b) adding more subdivision flags proved to be a long, slippery slope, and full of geopolitical landmines.

Mark

On Sun, Feb 7, 2021 at 11:10 AM Joan Montané wrote:
> Missatge de Doug Ewell via Unicode del dia dg.,
> 7 de febr. 2021 a les 19:34:
> >
> > Mark Davis wrote:
> >
> > > The reasoning behind that has been at
> > > https://www.unicode.org/emoji/proposals.html#Flags (F2) for some time.
> >
> > Since I'm unlikely ever to submit a true emoji proposal (i.e. for a
> facial expression or animal or hand gesture), I probably wouldn't have
> thought to look at the "Submitting Emoji Proposals" page.
> >
> > That said, this passage in that section:
> >
> > > Adding further subdivision flags as RGI can also appear to play
> > > favorites unless similar subdivisions also get flags, which could mean
> > > “all other flags of that country” or “all subdivisions of greater or
> > > equal population in other countries”
> >
> > doesn't seem to align with the decision to exclude Northern Ireland.
> >
> > > The feasibility issues behind that reasoning would have to change
> > > substantially before FR could be revised.
> > >
> > > Note however that all the subdivision flags remain valid; just not
> > > recommended for general interchange.
> >
> > That basically means no vendor will support flag images for these
> places, and they will not be interchangeable in any medium that uses
> Unicode.
> >
>
> So, Unicode creates a universal encoding mechanism to represent flags
> from subdivision ISO territories years ago, and Unicode throws the key
> to the bottom of the sea now.
>
> I can understand that it is hard to put a line for which subdivision
> territories merit RGI. But closing RGI to UK is really
> English-focused.
>
> Just my 2 ct.
>
> Joan Montané

From everson at evertype.com Sun Feb 7 14:19:19 2021
From: everson at evertype.com (Michael Everson)
Date: Sun, 7 Feb 2021 20:19:19 +0000
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

On 7 Feb 2021, at 19:33, Mark Davis ☕️ via Unicode wrote:
>
> The main issue for making N. Ireland be RGI was the lack of an official flag. It is valid (???????).

This is the mistake that the Consortium made. There is a flag which is widely and publicly in use. What is “official” or “unofficial” about it is a decision taken without due consideration to the realities of the political settlement in Britain and Ireland.

Who uses flags and why? Nationalists in the North may prefer to use the Irish tricolour 🇮🇪. Unionists may wish to use the Union flag 🇬🇧. Who cares? That’s for people who want to refer to a national flag. The fact of the matter is that the United Kingdom is composed of three countries and one province. And in reality, FOUR flags are used particularly in sport.

"The Ulster Banner was carried by the Northern Ireland team in the Commonwealth Games. It is also regularly displayed by supporters of the Northern Ireland national football team and is displayed by FIFA as the flag of Northern Ireland."
https://en.wikipedia.org/wiki/Flag_of_Northern_Ireland

The decision to refuse to include the Ulster Banner for Northern Ireland was a really dumb decision. No good was served by it. Instead of using common sense, “the lack of an official flag” was used as an excuse. It doesn’t make the Consortium look good.
Michael Everson

From harjitmoe at outlook.com Sun Feb 7 15:11:04 2021
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 7 Feb 2021 21:11:04 +0000
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Some rambling targeted at no one in particular...

What should be RGI for flags is a bit confusing, even when subregions are not considered.

For instance, UM is a country code for the United States Minor Outlying Islands. This has no permanent population and, as such, no flag (official or not) besides that of the United States. Hence, the inclusion of it is a largely pointless duplicate encoding of the flag of the United States. However, it is widely supported across vendors.

Meanwhile, the subregional code iqar corresponds to Erbil Governorate, Erbil being the capital of Iraq's autonomous Kurdistan Region. If a flag emoji encoding can show the flag of a larger region in absence of a more specific flag (like with the UM example), then I'd deduce that the subregional code iqar may be a perfectly reasonable encoding for the Kurdish flag.

So does Unicode *really* exclude the Kurdish flag, as some who would kick up a stink might claim? There is no clean yes or no answer, much as there is no clean answer for Northern Ireland. The code is valid, but if it's not RGI, will any vendor try to support it, given that besides some legacy kept around by Samsung, what's RGI might tend to determine what new emoji "exist"?

All of that being said, I doubt vendors would want to *remove* the flag of e.g. Scotland, though, since that would send a message in itself.

--Har.

________________________________
From: Unicode on behalf of Michael Everson via Unicode
Sent: Sunday, February 7, 2021 8:19:19 PM
To: Unicode@
Subject: Re: No more RGI flag sequences

On 7 Feb 2021, at 19:33, Mark Davis ☕️ via Unicode wrote:
>
> The main issue for making N.
> Ireland be RGI was the lack of an official flag. It is valid (???????).

This is the mistake that the Consortium made. There is a flag which is widely and publicly in use. What is “official” or “unofficial” about it is a decision taken without due consideration to the realities of the political settlement in Britain and Ireland.

Who uses flags and why? Nationalists in the North may prefer to use the Irish tricolour 🇮🇪. Unionists may wish to use the Union flag 🇬🇧. Who cares? That’s for people who want to refer to a national flag. The fact of the matter is that the United Kingdom is composed of three countries and one province. And in reality, FOUR flags are used particularly in sport.

"The Ulster Banner was carried by the Northern Ireland team in the Commonwealth Games. It is also regularly displayed by supporters of the Northern Ireland national football team and is displayed by FIFA as the flag of Northern Ireland."
https://en.wikipedia.org/wiki/Flag_of_Northern_Ireland

The decision to refuse to include the Ulster Banner for Northern Ireland was a really dumb decision. No good was served by it. Instead of using common sense, “the lack of an official flag” was used as an excuse. It doesn’t make the Consortium look good.

Michael Everson

From mark at macchiato.com Sun Feb 7 16:53:00 2021
From: mark at macchiato.com (Mark Davis ☕️)
Date: Sun, 7 Feb 2021 14:53:00 -0800
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>

Part of this whole story is simply history. The flags were encoded well before we had developed the definition of RGI and of subdivision flags (2017, https://www.unicode.org/reports/tr51/tr51-12.html).

On Sun, Feb 7, 2021 at 1:13 PM Harriet Riddle via Unicode <unicode at unicode.org> wrote:
> Some rambling targeted at no one in particular...
>
> What should be RGI for flags is a bit confusing, even when subregions are
> not considered.
>
> For instance, UM is a country code for the United States Minor Outlying
> Islands. This has no permanent population and, as such, no flag (official
> or not) besides that of the United States. Hence, the inclusion of it is a
> largely pointless duplicate encoding of the flag of the United States.
> However, it is widely supported across vendors.
>
> Meanwhile, the subregional code iqar corresponds to Erbil Governorate,
> Erbil being the capital of Iraq's autonomous Kurdistan Region. If a flag
> emoji encoding can show the flag of a larger region in absence of a more
> specific flag (like with the UM example), then I'd deduce that the
> subregional code iqar may be a perfectly reasonable encoding for the
> Kurdish flag.
>
> So does Unicode *really* exclude the Kurdish flag, as some who would kick
> up a stink might claim? There is no clean yes or no answer, much as there
> is no clean answer for Northern Ireland. The code is valid, but if it's not
> RGI, will any vendor try to support it, given that besides some legacy kept
> around by Samsung, what's RGI might tend to determine what new emoji
> "exist"?
>
> All of that being said, I doubt vendors would want to *remove* the flag of
> e.g. Scotland, though, since that would send a message in itself.
>
> --Har.
> ------------------------------
> *From:* Unicode on behalf of Michael
> Everson via Unicode
> *Sent:* Sunday, February 7, 2021 8:19:19 PM
> *To:* Unicode@
> *Subject:* Re: No more RGI flag sequences
>
> On 7 Feb 2021, at 19:33, Mark Davis ☕️ via Unicode
> wrote:
> >
> > The main issue for making N. Ireland be RGI was the lack of an official
> flag. It is valid (???????).
>
> This is the mistake that the Consortium made. There is a flag which is
> widely and publicly in use. What is “official” or “unofficial”
> about it is
> a decision taken without due consideration to the realities of the
> political settlement in Britain and Ireland.
>
> Who uses flags and why? Nationalists in the North may prefer to use the
> Irish tricolour 🇮🇪. Unionists may wish to use the Union flag 🇬🇧. Who
> cares? That’s for people who want to refer to a national flag. The fact of
> the matter is that the United Kingdom is composed of three countries and
> one province. And in reality, FOUR flags are used particularly in sport.
>
> "The Ulster Banner was carried by the Northern Ireland team in the
> Commonwealth Games. It is also regularly displayed by supporters of the
> Northern Ireland national football team and is displayed by FIFA as the
> flag of Northern Ireland."
> https://en.wikipedia.org/wiki/Flag_of_Northern_Ireland
>
> The decision to refuse to include the Ulster Banner for Northern Ireland
> was a really dumb decision. No good was served by it. Instead of using
> common sense, “the lack of an official flag” was used as an excuse. It
> doesn’t make the Consortium look good.
>
> Michael Everson

From everson at evertype.com Sun Feb 7 17:43:12 2021
From: everson at evertype.com (Michael Everson)
Date: Sun, 7 Feb 2021 23:43:12 +0000
Subject: No more RGI flag sequences
References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org>
Message-ID: <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com>

The clean answer is “use the de-facto Ulster Banner glyph until such time as an ‘official’ flag is adopted”. Instead we have one of the constituent parts of the UK treated differently from the others, which is not very satisfactory.

> On 7 Feb 2021, at 22:53, Mark Davis ☕️ via Unicode wrote:
>
> There is no clean yes or no answer, much as there is no clean answer for Northern Ireland.
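To make the UM versus US point and the "valid but not RGI" distinction from this subthread concrete, here is a small sketch. It assumes, as the thread itself indicates, that England, Scotland and Wales were the only subdivision flags with RGI status at the time of writing; the names `region_flag`, `RGI_SUBDIVISIONS` and `is_rgi_subdivision` are hypothetical and exist only for this illustration.

```python
RIS_BASE = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A

# Assumption taken from the thread: the only RGI subdivision flags
# at the time were England, Scotland and Wales.
RGI_SUBDIVISIONS = frozenset({"gbeng", "gbsct", "gbwls"})


def region_flag(alpha2: str) -> str:
    """Encode a two-letter region code as a pair of regional indicators."""
    code = alpha2.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in code)


def is_rgi_subdivision(code: str) -> bool:
    """True if the subdivision code is in the assumed RGI set above."""
    return code.replace("-", "").lower() in RGI_SUBDIVISIONS


if __name__ == "__main__":
    # UM and US are distinct sequences even where vendors draw the same image.
    for code in ("US", "UM"):
        print(code, [f"U+{ord(c):04X}" for c in region_flag(code)])
    for code in ("GB-SCT", "GB-NIR", "iqar"):
        print(code, "RGI" if is_rgi_subdivision(code) else "not RGI")
```

The sketch encodes only the mechanics; which glyph, if any, a vendor draws for a given sequence is exactly the policy question being argued in these messages.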
From mark at macchiato.com Sun Feb 7 19:59:41 2021 From: mark at macchiato.com (Mark Davis ☕️) Date: Sun, 7 Feb 2021 17:59:41 -0800 Subject: No more RGI flag sequences In-Reply-To: <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com> References: <001c01d6fce4$f3f54990$dbdfdcb0$@ewellic.org> <003601d6fd7f$39614c00$ac23e400$@ewellic.org> <565B759A-2DD8-4091-9A51-EEB328439BFD@evertype.com> Message-ID: Flags of Northern Ireland have a complicated history. Given that the political parties in Northern Ireland have considered the issue and were unable to come to a conclusion, there is no agreement in Unicode that it should be RGI. It is a valid Emoji, so if any party wants to supply a font with a glyph design of their choice, they are free to do so. Just as any party can supply a font with Phaistos disk symbols or other Unicode characters. Mark On Sun, Feb 7, 2021 at 3:44 PM Michael Everson via Unicode < unicode at unicode.org> wrote: > The clean answer is “use the de-facto Ulster Banner glyph until such time > as an “official” flag is adopted”. Instead we have one of the constituent > parts of the UK treated differently from the others, which is not very > satisfactory. > > > On 7 Feb 2021, at 22:53, Mark Davis ☕️ via Unicode > wrote: > > > > There is no clean yes or no answer, much as there is no clean answer for > Northern Ireland. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From christoph.paeper at crissov.de Mon Feb 8 02:34:36 2021 From: christoph.paeper at crissov.de (Christoph Päper) Date: Mon, 8 Feb 2021 09:34:36 +0100 Subject: No more RGI flag sequences In-Reply-To: References: Message-ID: Mark Davis ☕️ via Unicode: > > Flags of Northern Ireland have a complicated history. Given that the political parties in Northern Ireland have considered the issue and were unable to come to a conclusion, there is no agreement in Unicode that it should be RGI.
For all other (sub-)national flag emojis, Unicode does not suggest a particular design, nor does it use its existence as a criterion for the RGI label. Vendors would be free to display the one of TW the same as CN, for instance, but they decided to not show it at all in devices for the mainland Chinese market. There are really just two reasonable options: 1. Treat GBNIR the same as GBENG, GBSCT and GBWLS, i.e. either recommend its flag emoji for general interchange or deprecate them all. 2. Unrecommend all RIS emoji flags that similarly have no clearly defined design distinct from their parent region, e.g. UM or simply all that are marked as “dependent” (which is equivalent to “not independent”) in ISO 3166. > It is a valid Emoji, so if any party wants to supply a font with a glyph design of their choice, they are free to do so. Just as any party can supply a font with Phaistos disk symbols or other Unicode characters. That is a non-sequitur comparison. It’s more like all font vendors only supporting combining diacritics for roman letters if the combination also exists as a precomposed character, and maintainers of open source fonts rejecting any contributions for other combinations, stating the fact that Unicode does not actively require support for them as the reason. (See several issues and PRs in the GitHub repositories of Noto Color Emoji and Twemoji.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From wjgo_10009 at btinternet.com Mon Feb 8 04:19:53 2021 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 8 Feb 2021 10:19:53 +0000 (GMT) Subject: Stickers Message-ID: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com> In https://www.unicode.org/emoji/proposals.html there is the following, near the start. > For proposals that may not have all the information required we > encourage you to use other mechanisms such as stickers, gifs, etc. to > share with the world. What exactly is a sticker please?
For example, if someone produces and publishes an OpenType font with a colourful glyph that is not mapped to a Unicode code point and the publisher declares that the glyph can become displayed by an application program by entering the sequence %9217 whereupon glyph substitution will take place, substituting the colourful glyph for the five glyphs of the sequence, provided that the application program has the ability to act upon the liga table that is in the font and ligature substitution is switched on, is that a sticker in Unicode parlance? Or is it something else, and if so, what is it please? If people start using such sequences for glyphs then the result could be as potentially ambiguous as using Private Use Area encodings. However, I remember the way that new groups were added to the Usenet alt hierarchy, using a process of making a proposal in the alt.config group and discussion taking place for around a week and then starting of the new group then usually proceeding. This avoided name clashes, helped structure and was a generally helpful process. So these days, a mailing list or a wiki could be used for an informal, non-obligatory, helpful forum for such folk encoding so as to try to avoid duplication of sequences and possibly to try to keep some sort of structure. Encoding of glyphs in regular Unicode is good, yet for glyphs that do not get encoded this could be a useful technique. William Overington Monday 8 February 2021 From doug at ewellic.org Mon Feb 8 12:13:45 2021 From: doug at ewellic.org (Doug Ewell) Date: Mon, 8 Feb 2021 11:13:45 -0700 Subject: No more RGI flag sequences In-Reply-To: References: Message-ID: <000001d6fe46$24117ce0$6c3476a0$@ewellic.org> Christoph Päper wrote: > For all other (sub-)national flag emojis, Unicode does not suggest a > particular design, nor does it use its existence as a criterion for the RGI label.
Vendors > would be free to display the one of TW the same as CN, for instance, > but they decided to not show it at all in devices for the mainland > Chinese market. I agree with Christoph here. Having two (or more) flag designs to choose from when rendering a flag image is a matter of glyph design, just like having single-story and double-story glyph variants of the letters 'a' and 'g'. And no vendor or font designer is ever required to include glyphs for every Unicode character or sequence, "recommended" or not. -- Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org From kent.b.karlsson at bahnhof.se Mon Feb 8 14:48:24 2021 From: kent.b.karlsson at bahnhof.se (Kent Karlsson) Date: Mon, 8 Feb 2021 21:48:24 +0100 Subject: No more RGI flag sequences In-Reply-To: <000001d6fe46$24117ce0$6c3476a0$@ewellic.org> References: <000001d6fe46$24117ce0$6c3476a0$@ewellic.org> Message-ID: > On 8 Feb 2021, at 19:13, Doug Ewell via Unicode wrote: > > Christoph Päper wrote: > >> For all other (sub-)national flag emojis, Unicode does not suggest a >> particular design, nor does it use its existence as a criterion for the RGI label. Vendors >> would be free to display the one of TW the same as CN, for instance, >> but they decided to not show it at all in devices for the mainland >> Chinese market. > > I agree with Christoph here. Having two (or more) flag designs to choose from when rendering a flag image is a matter of glyph design, just like having single-story and double-story glyph variants of the letters 'a' and 'g'. Hmm, while those examples have a high degree of “free variation” between “single-story” and “double-story” (as well as other variations), the situation with flags is not quite the same. While one may “freely” vary between “flat” and “faking wavy” designs for flag glyphs, and even distort some proportions (but not too much), many other differences are either wrong or time dependent. And that is a major flaw in the denotation design for flags in Unicode.
But sometimes nations change flags, sometimes a little bit, sometimes radically. And then “we” are in trouble. I would say changing glyphs from the design used in one era to another used in another era (for the same, or nearly the same territory) would be the same as changing character identity for a “normal” coded character, like changing the code for A to suddenly be displayed as a B. Even though different flags may “denote” the same territory over time, using the wrong one in a “timed” document would be an error. In some cases one or the other design may even be offensive (which has obviously happened historically, and even now in some places). Of course one can use images for flags, assuming the document format allows for embedding images, rather than Unicode denotations for flags. That would solve such problems, but now we are discussing the Unicode way of denoting flags. > And no vendor or font designer is ever required to include glyphs for every Unicode character or sequence, "recommended" or not. Agree with that (in principle…). /Kent K > > -- > Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org > > > From christoph.paeper at crissov.de Tue Feb 9 02:18:14 2021 From: christoph.paeper at crissov.de (Christoph Päper) Date: Tue, 9 Feb 2021 09:18:14 +0100 Subject: Stickers In-Reply-To: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com> References: <1fbb0ea4.e9.1778127a034.Webtop.229@btinternet.com> Message-ID: <11BEE956-D529-45A4-8263-B202705577E8@crissov.de> William_J_G Overington via Unicode : > > What exactly is a sticker please? A sticker in this context is only used within instant messaging (IM) platforms and apps. It is a vector drawing (mostly SVG) or raster image (PNG or WebP or, rarely, JPEG) that is often part of a larger set of visually or thematically related graphics. Those might be, but usually are not, packaged inside a font format file.
Each sticker exemplifies an emotion, reaction or concept, which may be associated with one or more Unicode emojis as a kind of tag or keyword to facilitate simple search or suggestions for substitution. Stickers are considered more personalized than standardized emoji, because, at least in principle, each user could design and share their own. Apple Memoji and similar solutions by other vendors can be considered personalized dynamic sticker (and avatar) generators, in this case sharing graphic base models with the emoji font used by that vendor. Gifs are short animated image sequences or video clips without audio track that historically used CompuServe’s 8-bit graphics interchange file format (GIF89a), but nowadays APNG, WebM/MKV+VPx or MP4/H.26x. A gif is usually not part of a set, but it is often shared through public services. They are used to visualize emotions or to emphasize reactions and sometimes purely for decoration. Memes may be gifs, but are more often still images (mostly comics, photographs or captured video frames, often in JPEG format), frequently including textual overlays. They reference short-lived phenomena from popular culture to exemplify reactions. Based on a single consistent visual component, memes usually have countless variations, but otherwise are not part of a set. What makes all of these related to emojis is that they visually augment or even replace written performative acts in a primarily text-based communication medium, often in 1:1 or m:n scenarios like chat and social media and less often in 1:n prose.
From richard.wordingham at ntlworld.com Tue Feb 16 23:17:41 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 17 Feb 2021 05:17:41 +0000 Subject: Zawgyi Tonemarks in Latin Script Message-ID: <20210217051741.47de04dd@JRWUBU2> I've been cleaning up some mojibake and I'm stumped on how to clean up what are intended to be U+1037 MYANMAR SIGN DOT BELOW (the original text hijacked U+1095 as best fitting U+1E45 LATIN SMALL LETTER N WITH DOT ABOVE) and U+1038 MYANMAR SIGN VISARGA in Burmese text transliterated to the Roman script. While U+0325 COMBINING RING BELOW works for the first sign, what should I use for the second sign? I want to preserve the Romanisation, not retransliterate. I presume the Zawgyi-encoded pseudo-Unicode worked fine in the original Zawgyi-attuned rendering system. If I apply the Myanmar script signs to the Latin letters, the renderer punishes me with dotted circles. Richard. From jameskass at code2001.com Tue Feb 16 23:40:54 2021 From: jameskass at code2001.com (James Kass) Date: Wed, 17 Feb 2021 05:40:54 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: <20210217051741.47de04dd@JRWUBU2> References: <20210217051741.47de04dd@JRWUBU2> Message-ID: On 2021-02-17 5:17 AM, Richard Wordingham via Unicode wrote: > If I apply the Myanmar script signs to the Latin letters, the renderer > punishes me with dotted circles. > > Richard. Unable to repro this here. The string "kး" does not display with the dotted circle. Tried this on Windows 7 with both BabelPad and LibreOffice. (And now in the compose panel of Mozilla Thunderbird.) Maybe file a bug with the renderer developer? From abrahamgross at disroot.org Wed Feb 17 09:44:08 2021 From: abrahamgross at disroot.org (abrahamgross at disroot.org) Date: Wed, 17 Feb 2021 15:44:08 +0000 (UTC) Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: <341e16d6-f589-4226-9c9f-1f2e7cafcee4@disroot.org> I see the
dotted circle on Android From markus.icu at gmail.com Wed Feb 17 10:51:35 2021 From: markus.icu at gmail.com (Markus Scherer) Date: Wed, 17 Feb 2021 08:51:35 -0800 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: On Tue, Feb 16, 2021 at 9:43 PM James Kass via Unicode wrote: > Unable to repro this here. The string "kး" does not display with the > dotted circle. Dotted circle on a Chromebook (displayed in Gmail). markus -------------- next part -------------- An HTML attachment was scrubbed... URL: From asmusf at ix.netcom.com Wed Feb 17 11:17:35 2021 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Wed, 17 Feb 2021 09:17:35 -0800 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Wed Feb 17 11:43:35 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 17 Feb 2021 17:43:35 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: <20210217174335.719eaadf@JRWUBU2> On Wed, 17 Feb 2021 05:40:54 +0000 James Kass via Unicode wrote: > Unable to repro this here. The string "kး" does not display with the > dotted circle. Tried this on Windows 7 with both BabelPad and > LibreOffice. (And now in the compose panel of Mozilla Thunderbird.) That is curious. Which font were you using? In Word on Windows 10, using the font Myanmar Text for the whole string, in LibreOffice and Firefox on Ubuntu 16.04 (so at least one of them falls back to HarfBuzz Version 1.2.7), and with the Padauk font using HarfBuzz Version 2.7.2, I get a dotted circle even for an ASCII letter plus U+1038 MYANMAR SIGN VISARGA. Of course, there's no problem with HarfBuzz if one uses the Zawgyi-One font, which is one of the few to support the sequence . > Maybe file a bug with the renderer developer?
They could argue that it's not the sort of sequence that they will support. (Am I right in thinking that a Unicode-compliant renderer may deliberately misrender unsupported sequences?) Unfortunately, the Unicode technical annexes support the principle of separating a base character from its marks when the extended script property doesn't support their combination. (I've already complained to Mark Davis about this.) After all, if you want a candrabindu on the Latin letter 'l', or 'v', or 'y', you use U+0310 COMBINING CANDRABINDU. Richard. From jameskass at code2001.com Wed Feb 17 12:01:22 2021 From: jameskass at code2001.com (James Kass) Date: Wed, 17 Feb 2021 18:01:22 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: <20210217174335.719eaadf@JRWUBU2> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> Message-ID: <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> On 2021-02-17 5:43 PM, Richard Wordingham via Unicode wrote: > Which font were you using? Tried both Code2000 and Myanmar1 [Myanmar1:Version 0.55 from Myanmar NLP]. Both fonts have dotted circle glyphs properly mapped. One possible workaround for font developers who aren't especially fond of the dotted circles might be to map a zero-width no contour glyph to the DOTTED CIRCLE character, although I haven't tried this. From jukkakk at gmail.com Wed Feb 17 13:48:38 2021 From: jukkakk at gmail.com (Jukka K. Korpela) Date: Wed, 17 Feb 2021 21:48:38 +0200 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: James Kass via Unicode (unicode at unicode.org) wrote: > > Unable to repro this here. The string "kး" does not display with the > dotted circle. Tried this on Windows 7 with both BabelPad and > LibreOffice. (And now in the compose panel of Mozilla Thunderbird.) This seems to depend on the font. On Win 10.
I get a rendering with a dotted circle between the Latin letter and the mark when using the Myanmar Text font, but without it when using the Code2000 font. This happens e.g. in Word 365 and in BabelPad, and even in Notepad. Perhaps more surprisingly, the Google font Padauk, when tested via https://fonts.google.com/specimen/Padauk?subset=myanmar&preview.text=k%E1%80%B8&preview.text_type=custom (tested on Chrome) shows the dotted circle when using regular (weight 400) font but not when using bold (700) font. I don’t quite understand the original problem. If you Romanize text, why would you use marks of the original script? I think Romanization schemes typically map marks to some combining marks commonly used for Latin letters or some punctuation or special characters. Jukka > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Wed Feb 17 13:52:41 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 17 Feb 2021 19:52:41 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> Message-ID: <20210217195241.14941fa7@JRWUBU2> On Wed, 17 Feb 2021 18:01:22 +0000 James Kass via Unicode wrote: > On 2021-02-17 5:43 PM, Richard Wordingham via Unicode wrote: > > Which font were you using? > > Tried both Code2000 and Myanmar1 [Myanmar1:Version 0.55 from Myanmar > NLP]. Both fonts have dotted circle glyphs properly mapped. I can confirm that behaviour for HarfBuzz Version 2.7.4. However, I used Versions 1.15 and 1.171 of Code2000, which have the invalid script tag "myan" instead of "mymr" or "mym2", and for which Indic rearrangement and subscript consonant formation do not occur either.
Using Myanmar1 Version 0.55 with HarfBuzz Version 2.7.4 achieves rearrangement and subscript formation, and does not insert the dotted circle even for defective sequences starting with a non-spacing mark. It has glyph substitution lookups for the script "mymr" only (not even for the default script). The Padauk font I used has lookups for both "mymr" and "mym2". That might be significant; the former would have been ignored in favour of the latter. Richard. From vinodh.vinodh at gmail.com Wed Feb 17 15:12:37 2021 From: vinodh.vinodh at gmail.com (Vinodh Rajan) Date: Wed, 17 Feb 2021 22:12:37 +0100 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: <20210217174335.719eaadf@JRWUBU2> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> Message-ID: > > If you Romanize text, why would you use marks of the original script? I > think Romanization schemes typically map marks to some combining > marks commonly used for Latin letters or some punctuation or special > characters. > I was composing a document literally yesterday, which required me to do this. [image: image.png] I had to choose a font that does not contain the dotted circle to circumvent the rendering engines. Thai has three viramas (sort of) and it makes sense to use the original marks in the romanization to retain the differentiation. I can of course invent three new diacritic marks that work with Latin letters. But it is a one-off thing. It doesn't make sense to include a note explaining my ad-hoc conventions just for that one word. It's just too laborious. Vinodh On Wed, Feb 17, 2021 at 6:45 PM Richard Wordingham via Unicode < unicode at unicode.org> wrote: > On Wed, 17 Feb 2021 05:40:54 +0000 > James Kass via Unicode wrote: > > > Unable to repro this here. The string "kး" does not display with the > > dotted circle. Tried this on Windows 7 with both BabelPad and > > LibreOffice. (And now in the compose panel of Mozilla Thunderbird.) > > That is curious.
Which font were you using? > > In Word on Windows 10, using the font Myanmar Text for the whole > string, in LibreOffice and Firefox on Ubuntu 16.04 (so at least one of > them falls back to HarfBuzz Version 1.2.7), and with the Padauk font > using HarfBuzz Version 2.7.2, I get a dotted circle even for an ASCII > letter plus U+1038 MYANMAR SIGN VISARGA. > > Of course, there's no problem with HarfBuzz if one uses the Zawgyi-One > font, which is one of the few to support the sequence U+1038>. > > > Maybe file a bug with the renderer developer? > > They could argue that it's not the sort of sequence that they will > support. (Am I right in thinking that a Unicode-compliant renderer may > deliberately misrender unsupported sequences?) Unfortunately, the > Unicode technical annexes support the principle of separating a base > character from its marks when the extended script property doesn't > support their combination. (I've already complained to Mark Davis about > this.) After all, if you want a candrabindu on the Latin letter 'l', > or 'v', or 'y', you use U+0310 COMBINING CANDRABINDU. > > Richard. > > -- http://www.virtualvinodh.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 103982 bytes Desc: not available URL: From richard.wordingham at ntlworld.com Wed Feb 17 16:31:01 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 17 Feb 2021 22:31:01 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> Message-ID: <20210217223101.4069ad6b@JRWUBU2> On Wed, 17 Feb 2021 21:48:38 +0200 "Jukka K. Korpela via Unicode" wrote: > I don’t quite understand the original problem. If you Romanize text, > why would you use marks of the original script?
I think Romanization > schemes typically map marks to some combining marks commonly used for > Latin letters or some punctuation or special characters. It tends to happen when there isn't an obvious transliteration, or the scheme just doesn't match. For example, it is not uncommon to find Sanskrit in the Roman script using danda and double danda as punctuation. The consonant nasalisation mark, candrabindu, has been borrowed for writing Sanskrit in the Roman script, which is why we have U+0310 COMBINING CANDRABINDU. I have seen this 'Latin' candrabindu in print outside Sanskrit text books, I think in the journal 'Word'. I couldn't find many examples on-line, but one can be found in the Pali Text Society 2019 publication "The Catalogue of Manuscript in the U Pho Thi Library, Thaton, Myanmar" (ISBN-13 9780 86013 081 9) - an extract is accessible at The quotation I was cleaning up is a quotation of one of the authors of that catalogue. There is some vacillation between using the Burmese marks and a full stop and colon, but the Roman punctuation marks are avoided when they might be misinterpreted as punctuation. Richard. From jameskass at code2001.com Wed Feb 17 19:49:59 2021 From: jameskass at code2001.com (James Kass) Date: Thu, 18 Feb 2021 01:49:59 +0000 Subject: Zawgyi Tonemarks in Latin Script In-Reply-To: <20210217195241.14941fa7@JRWUBU2> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> Message-ID: <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> On 2021-02-17 7:52 PM, Richard Wordingham via Unicode wrote: > I can confirm that behaviour for HarfBuzz Version 2.7.4. However, I > used VersionS 1.15 and 1.171 of Code2000, which have the invalid script > tag "myan" instead of "mymr" or "mym2", and for which Indic > rearrangement and subscript consonant formation do not occur either. 
I used Code2000 Version 1.172, but it also has the older script tag "myan". Font Validator shows the "myan" tag as valid, but thanks to your pointer I checked the OpenType specs and will be changing the tag to "mymr" for the next release. From Andrew.Glass at microsoft.com Wed Feb 17 21:12:02 2021 From: Andrew.Glass at microsoft.com (Andrew Glass) Date: Thu, 18 Feb 2021 03:12:02 +0000 Subject: [EXTERNAL] Re: Zawgyi Tonemarks in Latin Script In-Reply-To: <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> Message-ID: Hi James, If you want the glyph rearrangement to occur, please use the mym2 tag. The mymr tag is a legacy tag for pre-shaping Myanmar Unicode fonts such as Myanmar 3 which do their own reordering. Cheers, Andrew -----Original Message----- From: Unicode On Behalf Of James Kass via Unicode Sent: 17 February 2021 17:50 To: unicode at unicode.org Subject: [EXTERNAL] Re: Zawgyi Tonemarks in Latin Script On 2021-02-17 7:52 PM, Richard Wordingham via Unicode wrote: > I can confirm that behaviour for HarfBuzz Version 2.7.4. However, I > used Versions 1.15 and 1.171 of Code2000, which have the invalid > script tag "myan" instead of "mymr" or "mym2", and for which Indic > rearrangement and subscript consonant formation do not occur either. I used Code2000 Version 1.172, but it also has the older script tag "myan". Font Validator shows the "myan" tag as valid, but thanks to your pointer I checked the OpenType specs and will be changing the tag to "mymr" for the next release.
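[Editorial sketch, not part of any list message.] Because the archive tends to lose the non-ASCII characters, the test string exchanged in this thread is easier to discuss as code points. The sketch below assumes the string was "k" followed by U+1038 MYANMAR SIGN VISARGA, which is what the Google Fonts preview URL quoted in the thread (`k%E1%80%B8`) decodes to. It prints the name, general category and combining class of each mark mentioned. The helper names `describe` and `code_points` are hypothetical. Python's `unicodedata` module does not expose the Script or Script_Extensions properties, so this cannot show the property the shaping engines reportedly key on; it only makes the sequences unambiguous.

```python
import unicodedata


def describe(ch: str) -> str:
    """One-line summary of a character: code point, name, category, ccc."""
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"gc={unicodedata.category(ch)} ccc={unicodedata.combining(ch)}")


def code_points(text: str) -> str:
    """Spell a string out as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Assumed reconstruction of the test string: Latin k + MYANMAR SIGN VISARGA.
TEST_STRING = "k\u1038"

if __name__ == "__main__":
    print(code_points(TEST_STRING))
    # The two Myanmar signs asked about, and the two Latin-oriented
    # combining marks named in the thread.
    for mark in ("\u1037", "\u1038", "\u0325", "\u0310"):
        print(describe(mark))
```

Pasting the escaped form (`"k\u1038"`) into a bug report avoids the mojibake seen in this archive.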
From richard.wordingham at ntlworld.com Thu Feb 18 03:04:48 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Thu, 18 Feb 2021 09:04:48 +0000 Subject: Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script) In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> Message-ID: <20210218090448.59bc2324@JRWUBU2> On Thu, 18 Feb 2021 03:12:02 +0000 Andrew Glass via Unicode wrote: > If you want the glyph rearrangement to occur, please use the mym2 > tag. The mymr tag is a legacy tag for pre-shaping Myanmar Unicode > fonts such as Myanmar 3 which do their own reordering. Would mymr be the suitable tag for fonts supporting legacy languages such as Sanskrit? Syllable-initial subscript WA seems not to be supported by the modern system; one has to approximate it by U+103D MYANMAR CONSONANT SIGN MEDIAL WA. Richard. From Andrew.Glass at microsoft.com Thu Feb 18 13:36:37 2021 From: Andrew.Glass at microsoft.com (Andrew Glass) Date: Thu, 18 Feb 2021 19:36:37 +0000 Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script) In-Reply-To: <20210218090448.59bc2324@JRWUBU2> References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> <20210218090448.59bc2324@JRWUBU2> Message-ID: Great question Richard, can you provide some examples? Do we have an agreed encoding mechanism for this? In principle, I would not recommend using mymr because the reordering requirements for Myanmar are complex and it would be inefficient for a font to try and handle them. That said, all kinds of things are possible with OpenType, so it may be a practical workaround in the short term. 
However, if we can understand the requirement, updating the Myanmar cluster validation and reordering logic to support it would be the preferred option here - if it isn't already possible. Cheers, Andrew -----Original Message----- From: Unicode On Behalf Of Richard Wordingham via Unicode Sent: 18 February 2021 01:05 To: unicode at unicode.org Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script) On Thu, 18 Feb 2021 03:12:02 +0000 Andrew Glass via Unicode wrote: > If you want the glyph rearrangement to occur, please use the mym2 tag. > The mymr tag is a legacy tag for pre-shaping Myanmar Unicode fonts > such as Myanmar 3 which do their own reordering. Would mymr be the suitable tag for fonts supporting legacy languages such as Sanskrit? Syllable-initial subscript WA seems not to be supported by the modern system; one has to approximate it by U+103D MYANMAR CONSONANT SIGN MEDIAL WA. Richard. From richard.wordingham at ntlworld.com Thu Feb 18 18:16:20 2021 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 19 Feb 2021 00:16:20 +0000 Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script) In-Reply-To: References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> <20210218090448.59bc2324@JRWUBU2> Message-ID: <20210219001620.7f0dd007@JRWUBU2> On Thu, 18 Feb 2021 19:36:37 +0000 Andrew Glass via Unicode wrote: The lack isn't where I thought it was - it turns out that the shaper specification already supports the non-medial subscript WA! I tweaked the OpenType lookup in Padauk to generate the glyph for to check where the problem lay, but didn't realise that the HarfBuzz test program hb-view would by default use the Graphite shaping! When I selected the OpenType rendering, I got the correct rendering from the tweaked font.
The problem is that *fonts* seem not to be including the subscript WA, because it isn't required for *Modern Burmese*. It so happens that the major fonts' rendering of MEDIAL WA is suitable for - the pain of overlapping glyph ranges! > Great question Richard, can you provide some examples? Do we have an > agreed encoding mechanism for this? I'll give a detailed answer, though the renderers already have the solution. The need for a distinction was put forward by Michael Everson et al. in at least the following: L2/06-029; L2/06-077 p2 (a.k.a. WG2 N3043); L2/06-213. L2/06-077 p3 states, "Note that kwa with MEDIAL WA may take a teardrop or triangular WA shape, which is never the case with true subjoined WA (which is rare, though it occurs in Sanskrit)." Martin Hosken put forward other arguments, but I'm not sure that they were found convincing. As to examples, just look at the absolutives in https://www.alamy.com/stock-photo-burmese-writing-pali-canon-buddhist-canon-tripitaka-library-of-stone-21244784.html . I'd been going to say look for -itvā for both Pali and Sanskrit, but this form seems commoner in word lists than in actual text. TUS 13.0 Section 16.3 p647 says, "In Pali and Sanskrit texts written in the Myanmar script, as well as in older orthographies of Burmese, the consonants ya, ra, wa, and ha are sometimes rendered in subjoined form. In those cases, U+1039 ္ myanmar sign virama and the regular form of the consonant are used." Thus, examples abound, and the encoding is defined. The codechart currently shows a teardrop shape for U+103D MYANMAR CONSONANT SIGN MEDIAL WA - that would not be suitable for . Richard.
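[Editorial sketch, not part of any list message.] The distinction drawn above, between a true subjoined WA encoded with U+1039 MYANMAR SIGN VIRAMA plus the regular consonant and the medial sign U+103D, can be made concrete as two different code point sequences. The sketch uses KA as the base purely for illustration, following the "kwa" example in the quoted L2/06-077 text; the variable and function names are hypothetical. It shows only that the two spellings are distinct and not canonically equivalent, so a font or shaper has to handle each one explicitly; it says nothing about how any particular font renders them.

```python
import unicodedata

KA = "\u1000"         # MYANMAR LETTER KA
VIRAMA = "\u1039"     # MYANMAR SIGN VIRAMA (forms subjoined consonants)
WA = "\u101D"         # MYANMAR LETTER WA
MEDIAL_WA = "\u103D"  # MYANMAR CONSONANT SIGN MEDIAL WA

# Subjoined WA as described in the TUS passage quoted above.
subjoined = KA + VIRAMA + WA
# Medial WA as used in Modern Burmese.
medial = KA + MEDIAL_WA


def names(text: str) -> list:
    """Character names for each code point in the string."""
    return [unicodedata.name(ch) for ch in text]


def canonically_equivalent(a: str, b: str) -> bool:
    """True if the two strings have the same canonical decomposition."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(names(subjoined))
    print(names(medial))
    print("canonically equivalent:", canonically_equivalent(subjoined, medial))
```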
From markus.icu at gmail.com Thu Feb 18 19:22:25 2021
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 18 Feb 2021 17:22:25 -0800
Subject: new ISO 15924 script codes 2021q1
Message-ID:

Dear Unicoders,

FYI

There are seven new script codes registered last month and this month:
https://www.unicode.org/iso15924/codechanges.html

Code  N°   English Name     Nom français      Alias  Age  Date        Key
Tnsa  275  Tangsa           tangsa                        2021-02-17  Add
Vith  228  Vithkuqi         vithkuqi                      2021-02-17  Add
Ougr  143  Old Uyghur       ancien ouïgour                2021-01-25  Add
Pcun  015  Proto-Cuneiform  proto-cunéiforme              2021-01-25  Add
Pelm  016  Proto-Elamite    proto-élamite                 2021-01-25  Add
Psin  103  Proto-Sinaitic   proto-sinaïtique              2021-01-25  Add
Ranj  303  Ranjana          ranjana                       2021-01-25  Add

Best regards,
markus
ISO 15924 script code registrar

From wjgo_10009 at btinternet.com Fri Feb 19 03:39:46 2021
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 19 Feb 2021 09:39:46 +0000 (GMT)
Subject: ISO 15924 codes for unwritten documents and for inherited script (from Re: new ISO 15924 script codes 2021q1)
In-Reply-To:
References:
Message-ID: <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>

Hi

Thank you for posting.

I looked through the linked list.

Could you say where, how and why the following codes would be used please?
>> Zxxx 997 Code for unwritten documents codet pour les documents non
>> écrites
>> Zinh 994 Code for inherited script codet pour écriture héritée

Best regards,

William Overington

Friday 19 February 2021

------ Original Message ------
From: "Markus Scherer via Unicode"
To: "Unicode Mailing List"
Sent: Friday, 2021 Feb 19 At 01:22
Subject: new ISO 15924 script codes 2021q1

Dear Unicoders,

FYI

There are seven new script codes registered last month and this month:
https://www.unicode.org/iso15924/codechanges.html

Code  N°   English Name     Nom français      Alias  Age  Date        Key
Tnsa  275  Tangsa           tangsa                        2021-02-17  Add
Vith  228  Vithkuqi         vithkuqi                      2021-02-17  Add
Ougr  143  Old Uyghur       ancien ouïgour                2021-01-25  Add
Pcun  015  Proto-Cuneiform  proto-cunéiforme              2021-01-25  Add
Pelm  016  Proto-Elamite    proto-élamite                 2021-01-25  Add
Psin  103  Proto-Sinaitic   proto-sinaïtique              2021-01-25  Add
Ranj  303  Ranjana          ranjana                       2021-01-25  Add

Best regards,
markus
ISO 15924 script code registrar

From asmusf at ix.netcom.com Fri Feb 19 10:32:23 2021
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Fri, 19 Feb 2021 08:32:23 -0800
Subject: ISO 15924 codes for unwritten documents and for inherited script (from Re: new ISO 15924 script codes 2021q1)
In-Reply-To: <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>
References: <7acc5570.c4.177b9a8d845.Webtop.73@btinternet.com>
Message-ID: <866cf06c-af9f-68b0-c0af-ef2fda875b9b@ix.netcom.com>

An HTML attachment was scrubbed...
From Andrew.Glass at microsoft.com Fri Feb 19 16:49:12 2021
From: Andrew.Glass at microsoft.com (Andrew Glass)
Date: Fri, 19 Feb 2021 22:49:12 +0000
Subject: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script)
In-Reply-To: <20210219001620.7f0dd007@JRWUBU2>
References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> <20210218090448.59bc2324@JRWUBU2> <20210219001620.7f0dd007@JRWUBU2>
Message-ID:

Thank you for the nice examples, Richard.

Indeed this is up to fonts to enable. Fonts could add a locl feature for Sanskrit to enable this example. That would depend on software to pass in the OT language tag appropriately. Or fonts could simply optimize for Sanskrit by default.

Cheers,
Andrew

-----Original Message-----
From: Unicode On Behalf Of Richard Wordingham via Unicode
Sent: 18 February 2021 16:16
To: unicode at unicode.org
Subject: Re: [EXTERNAL] Subscript Manual WA (was: Zawgyi Tonemarks in Latin Script)

On Thu, 18 Feb 2021 19:36:37 +0000 Andrew Glass via Unicode wrote:

The lack isn't where I thought it was - it turns out that the shaper specification already supports the non-medial subscript WA! I tweaked the OpenType lookup in Padauk to generate the glyph for [...] to check where the problem lay, but didn't realise that the HarfBuzz test program hb-view would by default use the Graphite shaping! When I selected the OpenType rendering, I got the correct rendering from the tweaked font.

The problem is that *fonts* seem not to be including the subscript WA, because it isn't required for *Modern Burmese*. It so happens that the major fonts' rendering of MEDIAL WA is suitable for [...] - the pain of overlapping glyph ranges!

> Great question Richard, can you provide some examples? Do we have an
> agreed encoding mechanism for this?
I'll give a detailed answer, though the renderers already have the solution.

The need for a distinction was put forward by Michael Everson et al. in at least the following:

L2/06-029
L2/06-077 p2 (a.k.a. WG2 N3043)
L2/06-213

L2/06-077 p3 states, "Note that kwa with MEDIAL WA may take a teardrop or triangular WA shape, which is never the case with true subjoined WA (which is rare, though it occurs in Sanskrit)."

Martin Hosken put forward other arguments, but I'm not sure that they were found convincing.

As to examples, just look at the absolutives in https://www.alamy.com/stock-photo-burmese-writing-pali-canon-buddhist-canon-tripitaka-library-of-stone-21244784.html . I'd been going to say look for -itvā for both Pali and Sanskrit, but this form seems commoner in word lists than actual text.

TUS 13.0 Section 16.3 p647 says, "In Pali and Sanskrit texts written in the Myanmar script, as well as in older orthographies of Burmese, the consonants ya, ra, wa, and ha are sometimes rendered in subjoined form. In those cases, U+1039 myanmar sign virama and the regular form of the consonant are used."

Thus, examples abound, and the encoding is defined.

The codechart currently shows a teardrop shape for U+103D MYANMAR CONSONANT SIGN MEDIAL WA - that would not be suitable for [...].

Richard.
From richard.wordingham at ntlworld.com Sat Feb 20 05:13:20 2021
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 20 Feb 2021 11:13:20 +0000
Subject: Subscript Manual WA
In-Reply-To:
References: <20210217051741.47de04dd@JRWUBU2> <20210217174335.719eaadf@JRWUBU2> <16222a2c-872e-a09a-49dc-0bca99fea32d@code2001.com> <20210217195241.14941fa7@JRWUBU2> <18c23f55-ca28-c3c1-2809-520faa604861@code2001.com> <20210218090448.59bc2324@JRWUBU2> <20210219001620.7f0dd007@JRWUBU2>
Message-ID: <20210220111320.4fa1487b@JRWUBU2>

On Fri, 19 Feb 2021 22:49:12 +0000 Andrew Glass via Unicode wrote:

> Thank you for the nice examples, Richard.

> Indeed this is up to fonts to enable. Fonts could add a locl feature
> for Sanskrit to enable this example. That would depend on software to
> pass in the OT language tag appropriately. Or fonts could simply
> optimize for Sanskrit by default.

The winning argument for this script has been that fonts should be able to produce something that is not outrageously wrong even in the absence of language information. So, the logic should rather be to disable the character only for languages that don't have it. If I've interpreted the reports correctly, it may turn up in Old Burmese, but won't turn up in Modern Burmese. So the logic would be to disable the subscripting of WA if the text were tagged as being in Modern Burmese, as opposed to Old Burmese, Pali (TBC) or Sanskrit.

I don't know how Pali in the Shan variant of the Myanmar script should currently map to OpenType language tags - there might not even be a BCP 47 tag for it. I think there are similar questions for other local variants of Pali in the script.

There is research to be done into the spelling of the Pali and Sanskrit clusters with WA as a second element. I would not be surprised to find different spellings word/phrase initially and finally. (I've only seen Pali 'kv' as the result of sandhi, e.g. of 'ko attho' to 'kvattho'.
It still needs its own artwork for Pali in the Sinhala script!)

Richard.

From jameskass at code2001.com Sat Feb 27 00:11:43 2021
From: jameskass at code2001.com (James Kass)
Date: Sat, 27 Feb 2021 06:11:43 +0000
Subject: Unicode 14.0 Alpha Review
In-Reply-To:
References:
Message-ID: <79bb4f68-c841-5cd1-129b-0d2a2489d581@code2001.com>

https://www.unicode.org/charts/PDF/Unicode-14.0/U140-2A700.pdf

Is the Unicode 14.0 provisional CJK character slated for U+2B736 a duplicate of existing character U+3B3F?

Note that Chinese radical # 130 (肉) often takes the shape of Chinese radical # 74 (月).