From piotrunio-2004 at wp.pl Sun Jul 6 00:13:26 2025
From: piotrunio-2004 at wp.pl (piotrunio-2004 at wp.pl)
Date: Sun, 06 Jul 2025 07:13:26 +0200
Subject: Odp: Re: RE: Why was L2/25-061 provisionally assigned?
In-Reply-To: <<7fdb45802c9a44088c13bebfa3e5d9d9@grupawp.pl>>
References: <7fdb45802c9a44088c13bebfa3e5d9d9@grupawp.pl>
Message-ID: <1ffd84be591d4362bf1203e4bf3818e8@grupawp.pl>

In the UTC 183 Minutes (unicode.org), under "D.1 1.7 Compound IPA stress mark", it was mentioned that "not everything that graphically consists of pieces should be encoded as a combining sequence", which is a reasonable argument (even though the damage is already done with many combining sequences in Unicode that I would consider inappropriate), and I no longer question the provisional assignment.

What I still don't get is the claim that "combining marks on modifier letters generally become smaller and attach to the modifier letter". This implies not only that the 'superscript' or 'subscript' property is inherited by all diacritical marks above or below, but also that each use of a combining mark implicitly creates a new nesting level relative to the level of the base character instead of relative to the base document. This seems bizarre to me, and I've never seen a precedent of that occurring.

On 18 April 2025 at 22:08, piotrunio-2004 at wp.pl via Unicode <unicode at corp.unicode.org> wrote:

I don't use Unicode normalization myself and I'm not really in a position to make decisions on canonical/normalized representations, but since stability policies prevent one of the representations from normalizing to the other, if the usage of combining characters were standardized to compose to ˈ̩ or ˌ̍ for phonetic usage, most likely one of the representations would be recommended whereas the other would go to the 'Do Not Emit' list or something.
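The normalization point can be checked with a minimal sketch, assuming Python's standard-library unicodedata module (results reflect whatever Unicode version that module bundles). It compares the two sequences discussed in this thread, U+02C8 U+0329 and U+02CC U+030D, and looks at the U+0385 precedent cited further down. The helper name canonically_equivalent is made up for illustration.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# MODIFIER LETTER VERTICAL LINE + COMBINING VERTICAL LINE BELOW
high_then_below = "\u02C8\u0329"
# MODIFIER LETTER LOW VERTICAL LINE + COMBINING VERTICAL LINE ABOVE
low_then_above = "\u02CC\u030D"

# Neither sequence has a canonical mapping to the other, so they stay distinct.
print(canonically_equivalent(high_then_below, low_then_above))

# The cited precedent: U+0385 GREEK DIALYTIKA TONOS decomposes to U+00A8 U+0301.
print(unicodedata.decomposition("\u0385"))
print(canonically_equivalent("\u0385", "\u00A8\u0301"))
```

Because normalization stability forbids adding new canonical mappings between already-encoded characters, the first comparison is not expected to change in later versions, which is why only a usage recommendation could settle the choice between the two sequences.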
The use of anchor points is quite font-technology-specific and therefore off topic, though the point still stands with any method of maintaining systematic typographical alignment of all combining character combinations.

On 18 April 2025 at 21:38, Charlotte Eiffel Lilith Buff via Unicode <unicode at corp.unicode.org> wrote:

Which would be the canonical representation, spacing low line + combining line above or spacing high line + combining line below? Any font that bothered to define proper anchor points for diacritics on modifier symbols would display both sequences identically.

On Thu, 17 Apr 2025 at 21:41, piotrunio-2004 at wp.pl via Unicode <unicode at corp.unicode.org> wrote:

The way I see it, U+02C8 and U+02CC are spacing versions of the U+030D and U+0329 diacritics, and therefore, to compose a spacing character with both diacritics, the spacing character of one and the combining character of the other could be used. There is already precedent for spacing diacritics composed with combining characters, particularly U+0385, which is composed as U+00A8 U+0301 (although the precomposed version is encoded because it's essential for CP869, CP1253, and ISO 8859-7 compatibility).

On 17 April 2025 at 21:05, Doug Ewell via Unicode <unicode at corp.unicode.org> wrote:

piotrunio-2004 at wp.pl wrote:

I really don't get why [the character proposed in] L2/25-061 would be provisionally assigned to U+208F when it can be composed with combining characters (ˈ̩ U+02C8 U+0329) or (ˌ̍ U+02CC U+030D), which should be equivalent to the proposed character. The potential use of the existing combining characters is not mentioned in the proposal, but the proposal owner was informed of the compositions before the Recommendations to UTC #183 were made.

While the quoted passage on the Submitting Character Proposals page makes sense for "normal letter with diacritic" proposals, which were once commonplace, I don't think it's typical to attach combining marks to a modifier letter such as U+02C8 or U+02CC, or for UTC to recommend composition in such cases. The NormalizationTest file does not include any instances of combining characters used with modifier letters, except for a few wacky, cross-script, stress-test cases involving a combination of Latin letters, Hebrew accents, and Adlam modifiers.

Perhaps someone has authoritative info on whether the difference in handling is policy or just the way it's been.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From everson at evertype.com Mon Jul 14 11:56:59 2025
From: everson at evertype.com (Michael Everson)
Date: Mon, 14 Jul 2025 17:56:59 +0100
Subject: Andrew C. West 魏安 1960–2025
Message-ID: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>

With a heavy heart it is my duty to inform my friends and colleagues that our great comrade Andrew West, Sinologist, Tangutologist, Khitanologist, Jurchenologist, Anglicist, and best of friends, died suddenly though peacefully on 10 July 2025.

Andrew and I worked together on many encoding projects, including on some characters which have yet to be accepted for encoding. We have lost a giant amongst script encoders.

In 2016 Evertype published Andrew and Imre Galambos's facsimile edition of Gerard Clauson's Skeleton Tangut (Hsi Hsia) Dictionary. Andrew was preparing several more volumes about Tangut language and literature, and an edition of the classic Chinese novel "Water Margin" (???).

I have spoken with his widow Wei-Wei, and learned that, fortunately, his passwords are known and so it may be possible to work with some of his other colleagues to finish some of the work he had in hand and publish it in due course.

Vayadhammā saṅkhārā
????
All things are impermanent

Michael Everson

From mheijdra at princeton.edu Mon Jul 14 12:00:42 2025
From: mheijdra at princeton.edu (Martin Heijdra)
Date: Mon, 14 Jul 2025 17:00:42 +0000
Subject: RE: Andrew C. West 魏安 1960–2025
In-Reply-To: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
Message-ID:

Thank you, Michael, for this terrible news. He had been posting messages until a few years ago. I had known him since he was a graduate student here; it took a while before he found his niche, in which he became such an important figure.

Martin

From tuvalkin at gmail.com Mon Jul 14 15:39:41 2025
From: tuvalkin at gmail.com (António MARTINS-Tuválkin)
Date: Mon, 14 Jul 2025 21:39:41 +0100
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
Message-ID: <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>

Terrible news, Michael, and thank you for relaying it to us. We can take solace in the peacefulness and the available passwords, but is BabelStone safe online? I am a daily user of both BabelPad and BabelMap and have countless times thanked and praised Andrew in my inner voice. Here's to his lasting legacy.

--
António MARTINS-Tuválkin

From everson at evertype.com Mon Jul 14 16:28:20 2025
From: everson at evertype.com (Michael Everson)
Date: Mon, 14 Jul 2025 22:28:20 +0100
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To: <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com> <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>
Message-ID: <9FD98201-E5F8-4EBB-9239-7C2129F6E8F9@evertype.com>

On 14 Jul 2025, at 21:39, António MARTINS-Tuválkin via Unicode wrote:

> Terrible news, Michael, and thank you for relaying it to us. We can take solace in the peacefulness and the available passwords, but is BabelStone safe online? I am a daily user of both BabelPad and BabelMap and have countless times thanked and praised Andrew in my inner voice. Here's to his lasting legacy.

Babelstone is safe online, and it is intended to look after everything there, whether it remains there or migrates elsewhere. For my part I am in the US till the end of August and will not be able to visit Farnham until September.

Michael

From as at signographie.de Mon Jul 14 17:07:04 2025
From: as at signographie.de (A. Stötzner)
Date: Tue, 15 Jul 2025 00:07:04 +0200 (CEST)
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
Message-ID: <2112828091.213214.1752530824254@email.ionos.de>

This is most sad news. I remember Andrew West's sincere commitment to encoding efforts, his calm yet persuasive way of reasoning and his true diligence, with admiration and gratefulness. Only two weeks ago he was present at the WG2 Niigata meeting (via screen). I will miss his voice.

My thoughts are with his widow and his family. Rest in Peace, Andrew.

A. Stötzner

From doug at ewellic.org Mon Jul 14 20:07:15 2025
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 15 Jul 2025 01:07:15 +0000
Subject: RE: Andrew C. West 魏安 1960–2025
In-Reply-To: <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com> <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>
Message-ID:

António MARTINS-Tuválkin replied to Michael Everson:

> I am a daily user of both BabelPad and BabelMap and have countless
> times thanked and praised Andrew in my inner voice.

Haven't we all. After many years of constant delight in these two applications of Andrew's, and having been honored to contribute my SCSU encoding and decoding code to BabelPad, I finally sent Andrew some wholly inadequate sum a few years back as a contribution for his work. I wish I had been able to send much more.

I never met Andrew in person, but we communicated online frequently regarding his apps and fonts, and a few other topics such as the Middle Mongolian calendar. He was always supremely helpful and appreciative of any suggestions thrown his way. I can only imagine the loss felt by those who really knew him, let alone his family.

I worked on C++ and MFC "yonks ago" (borrowed from Stephen Farrell on the IETF list), and I have offered to at least take a look at the Babel* source code to see if I can contribute to its ongoing maintenance. (In the meantime, Unicode 17.0 beta versions exist at https://www.babelstone.co.uk/Software/Beta.html .)
Andrew never got a Bulldog Award, but I would expect to see him here shortly: https://www.unicode.org/consortium/memoriam.html

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From jameskass at code2001.com Mon Jul 14 22:23:22 2025
From: jameskass at code2001.com (James Kass)
Date: Tue, 15 Jul 2025 03:23:22 +0000
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To:
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com> <8531fb79-d416-4a92-8225-386634f192f2@gmail.com>
Message-ID:

Michael's posts have made me smile, chuckle, and even laugh from time to time over the years. But today's post brought profound sorrow.

Andrew was always there for me, with pointers, knowledge, and encouragement. Nobody can take his place. He is one of the giants in our field, and future developers will hopefully keep advancing his projects and work.

From pandey at umich.edu Mon Jul 14 22:35:12 2025
From: pandey at umich.edu (Anshuman Pandey)
Date: Mon, 14 Jul 2025 22:35:12 -0500
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
Message-ID:

This is sad news and a loss for the Unicode community. Andrew W taught me a lot about Indic-adjacent scripts, and generously shared with me resources on lesser-known writing systems, including documents he had personally gathered. Let us honor his imprint on Unicode and our community by continuing our mission, to which Andrew was so dedicated.

With reverence,
Anshu

From pgcon6 at msn.com Tue Jul 15 15:48:59 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Tue, 15 Jul 2025 20:48:59 +0000
Subject: Re: Andrew C. West 魏安 1960–2025
In-Reply-To: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
References: <96171C32-BF12-4A10-A371-C4BE26525708@evertype.com>
Message-ID:

Andrew was an important participant in the Unicode, WG2 and IRG communities for over 20 years. One of his earliest major contributions was to the successful encoding of Phags-pa script, with key details negotiated at the 2005 WG2 meeting in Xiamen. And one of his last contributions was to provide valuable input on feedback submitted regarding Phags-pa. Of course, in between, he made so many impactful contributions to the encoding of East Asian scripts.

We all will pass from this life; some of us will be fortunate to have the kind of enduring impact he had.

Peter Constable

From ivanpan3 at gmail.com Sat Jul 19 05:54:12 2025
From: ivanpan3 at gmail.com (Ivan Panchenko)
Date: Sat, 19 Jul 2025 12:54:12 +0200
Subject: Double right arrowhead?
Message-ID:

People sometimes use two greater-than signs or a right-hand guillemet to point to the right, e.g., in "Continue reading >>" or preceding a text as part of a link. I wonder which Unicode character(s) would be appropriate for this. There is U+02C3 ("˃", modifier letter right arrowhead), but it is a modifier letter (see also U+08FC (arabic double right arrowhead above with dot) for comparison). Simply U+27A4 ("➤", black rightwards arrowhead), whose contour consists of two >-like shapes? Or should a new character be proposed?

From harjitmoe at outlook.com Sat Jul 19 06:47:23 2025
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sat, 19 Jul 2025 12:47:23 +0100
Subject: Double right arrowhead?
In-Reply-To:
Message-ID:

An HTML attachment was scrubbed...

From cloos at jhcloos.com Sat Jul 19 08:52:22 2025
From: cloos at jhcloos.com (James Cloos)
Date: Sat, 19 Jul 2025 09:52:22 -0400
Subject: Double right arrowhead?
In-Reply-To:
References:
Message-ID:

Here are some others to consider using:

2192 RIGHTWARDS ARROW
219B RIGHTWARDS ARROW WITH STROKE
219D RIGHTWARDS WAVE ARROW
21A0 RIGHTWARDS TWO HEADED ARROW
21A3 RIGHTWARDS ARROW WITH TAIL
21A6 RIGHTWARDS ARROW FROM BAR
21AA RIGHTWARDS ARROW WITH HOOK
21AC RIGHTWARDS ARROW WITH LOOP
21C9 RIGHTWARDS PAIRED ARROWS
21CF RIGHTWARDS DOUBLE ARROW WITH STROKE
21D2 RIGHTWARDS DOUBLE ARROW
21DB RIGHTWARDS TRIPLE ARROW
21DD RIGHTWARDS SQUIGGLE ARROW
21E2 RIGHTWARDS DASHED ARROW
21E5 RIGHTWARDS ARROW TO BAR
21E8 RIGHTWARDS WHITE ARROW
21F0 RIGHTWARDS WHITE ARROW FROM WALL
21F4 RIGHT ARROW WITH SMALL CIRCLE
21F6 THREE RIGHTWARDS ARROWS
21F8 RIGHTWARDS ARROW WITH VERTICAL STROKE
21FB RIGHTWARDS ARROW WITH DOUBLE VERTICAL STROKE
21FE RIGHTWARDS OPEN-HEADED ARROW
2794 HEAVY WIDE-HEADED RIGHTWARDS ARROW
2799 HEAVY RIGHTWARDS ARROW
279B DRAFTING POINT RIGHTWARDS ARROW
279C HEAVY ROUND-TIPPED RIGHTWARDS ARROW
279D TRIANGLE-HEADED RIGHTWARDS ARROW
279E HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW
279F DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
27A0 HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW
27A1 BLACK RIGHTWARDS ARROW
27A2 THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD
27A3 THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD
27A4 BLACK RIGHTWARDS ARROWHEAD
27A5 HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
27A6 HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
27A7 SQUAT BLACK RIGHTWARDS ARROW
27A8 HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW
27A9 RIGHT-SHADED WHITE RIGHTWARDS ARROW
27AB BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW
27AC FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW
27AD HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
27AE HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
27AF NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
27B1 NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW
27B2 CIRCLED HEAVY WHITE RIGHTWARDS ARROW
27B3 WHITE-FEATHERED RIGHTWARDS ARROW
27B5 BLACK-FEATHERED RIGHTWARDS ARROW
27B8 HEAVY BLACK-FEATHERED RIGHTWARDS ARROW
27BA TEARDROP-BARBED RIGHTWARDS ARROW
27BB HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW
27BC WEDGE-TAILED RIGHTWARDS ARROW
27BD HEAVY WEDGE-TAILED RIGHTWARDS ARROW
27BE OPEN-OUTLINED RIGHTWARDS ARROW
27F4 RIGHT ARROW WITH CIRCLED PLUS
27F6 LONG RIGHTWARDS ARROW
27F9 LONG RIGHTWARDS DOUBLE ARROW
27FC LONG RIGHTWARDS ARROW FROM BAR
27FE LONG RIGHTWARDS DOUBLE ARROW FROM BAR
27FF LONG RIGHTWARDS SQUIGGLE ARROW
2900 RIGHTWARDS TWO-HEADED ARROW WITH VERTICAL STROKE
2901 RIGHTWARDS TWO-HEADED ARROW WITH DOUBLE VERTICAL STROKE
2903 RIGHTWARDS DOUBLE ARROW WITH VERTICAL STROKE
2905 RIGHTWARDS TWO-HEADED ARROW FROM BAR
2907 RIGHTWARDS DOUBLE ARROW FROM BAR
290D RIGHTWARDS DOUBLE DASH ARROW
290F RIGHTWARDS TRIPLE DASH ARROW
2910 RIGHTWARDS TWO-HEADED TRIPLE DASH ARROW
2911 RIGHTWARDS ARROW WITH DOTTED STEM
2914 RIGHTWARDS ARROW WITH TAIL WITH VERTICAL STROKE
2915 RIGHTWARDS ARROW WITH TAIL WITH DOUBLE VERTICAL STROKE
2916 RIGHTWARDS TWO-HEADED ARROW WITH TAIL
2917 RIGHTWARDS TWO-HEADED ARROW WITH TAIL WITH VERTICAL STROKE
2918 RIGHTWARDS TWO-HEADED ARROW WITH TAIL WITH DOUBLE VERTICAL STROKE
291A RIGHTWARDS ARROW-TAIL
291C RIGHTWARDS DOUBLE ARROW-TAIL
291E RIGHTWARDS ARROW TO BLACK DIAMOND
2920 RIGHTWARDS ARROW FROM BAR TO BLACK DIAMOND
2933 WAVE ARROW POINTING DIRECTLY RIGHT
2934 ARROW POINTING RIGHTWARDS THEN CURVING UPWARDS
2935 ARROW POINTING RIGHTWARDS THEN CURVING DOWNWARDS
2937 ARROW POINTING DOWNWARDS THEN CURVING RIGHTWARDS
2947 RIGHTWARDS ARROW THROUGH X
2970 RIGHT DOUBLE ARROW WITH ROUNDED HEAD
2971 EQUALS SIGN ABOVE RIGHTWARDS ARROW
2972 TILDE OPERATOR ABOVE RIGHTWARDS ARROW
2974 RIGHTWARDS ARROW ABOVE TILDE OPERATOR
2975 RIGHTWARDS ARROW ABOVE ALMOST EQUAL TO
2B43 RIGHTWARDS ARROW THROUGH GREATER-THAN
2B44 RIGHTWARDS ARROW THROUGH SUPERSET
2B46 RIGHTWARDS QUADRUPLE ARROW
2B47 REVERSE TILDE OPERATOR ABOVE RIGHTWARDS ARROW
2B48 RIGHTWARDS ARROW ABOVE REVERSE ALMOST EQUAL TO
2B4C RIGHTWARDS ARROW ABOVE REVERSE TILDE OPERATOR
2B62 RIGHTWARDS TRIANGLE-HEADED ARROW
2B6C RIGHTWARDS TRIANGLE-HEADED DASHED ARROW
2B72 RIGHTWARDS TRIANGLE-HEADED ARROW TO BAR
2B7C RIGHTWARDS TRIANGLE-HEADED ARROW WITH DOUBLE HORIZONTAL STROKE
2B86 RIGHTWARDS TRIANGLE-HEADED PAIRED ARROWS
2B8A RIGHTWARDS BLACK CIRCLED WHITE ARROW
2B95 RIGHTWARDS BLACK ARROW
2B9A THREE-D TOP-LIGHTED RIGHTWARDS EQUILATERAL ARROWHEAD
2B9E BLACK RIGHTWARDS EQUILATERAL ARROWHEAD
2BA9 BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW
2BAB BLACK CURVED UPWARDS AND RIGHTWARDS ARROW
2BEE RIGHTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
FFEB HALFWIDTH RIGHTWARDS ARROW
1F51C SOON WITH RIGHTWARDS ARROW ABOVE
1F802 RIGHTWARDS ARROW WITH SMALL TRIANGLE ARROWHEAD
1F806 RIGHTWARDS ARROW WITH MEDIUM TRIANGLE ARROWHEAD
1F80A RIGHTWARDS ARROW WITH LARGE TRIANGLE ARROWHEAD
1F812 RIGHTWARDS ARROW WITH SMALL EQUILATERAL ARROWHEAD
1F816 RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD
1F81A HEAVY RIGHTWARDS ARROW WITH EQUILATERAL ARROWHEAD
1F81E HEAVY RIGHTWARDS ARROW WITH LARGE EQUILATERAL ARROWHEAD
1F822 RIGHTWARDS TRIANGLE-HEADED ARROW WITH NARROW SHAFT
1F826 RIGHTWARDS TRIANGLE-HEADED ARROW WITH MEDIUM SHAFT
1F82A RIGHTWARDS TRIANGLE-HEADED ARROW WITH BOLD SHAFT
1F82E RIGHTWARDS TRIANGLE-HEADED ARROW WITH HEAVY SHAFT
1F832 RIGHTWARDS TRIANGLE-HEADED ARROW WITH VERY HEAVY SHAFT
1F836 RIGHTWARDS FINGER-POST ARROW
1F83A RIGHTWARDS SQUARED ARROW
1F83E RIGHTWARDS COMPRESSED ARROW
1F842 RIGHTWARDS HEAVY COMPRESSED ARROW
1F846 RIGHTWARDS HEAVY ARROW
1F852 RIGHTWARDS SANS-SERIF ARROW
1F862 WIDE-HEADED RIGHTWARDS LIGHT BARB ARROW
1F86A WIDE-HEADED RIGHTWARDS BARB ARROW
1F872 WIDE-HEADED RIGHTWARDS MEDIUM BARB ARROW
1F87A WIDE-HEADED RIGHTWARDS HEAVY BARB ARROW
1F882 WIDE-HEADED RIGHTWARDS VERY HEAVY BARB ARROW
1F892 RIGHTWARDS TRIANGLE ARROWHEAD
1F896 RIGHTWARDS WHITE ARROW WITHIN TRIANGLE ARROWHEAD
1F89A RIGHTWARDS ARROW WITH NOTCHED TAIL
1F8A1 RIGHTWARDS BOTTOM SHADED WHITE ARROW
1F8A3 RIGHTWARDS TOP SHADED WHITE ARROW
1F8A5 RIGHTWARDS RIGHT-SHADED WHITE ARROW
1F8A9 RIGHTWARDS BACK-TILTED SHADOWED WHITE ARROW
1F8AB RIGHTWARDS FRONT-TILTED SHADOWED WHITE ARROW

-JimC
--
James Cloos
OpenPGP: https://jhcloos.com/0x997A9F17ED7DAEA6.asc

From tameiraomatthew968 at gmail.com Sat Jul 19 12:50:16 2025
From: tameiraomatthew968 at gmail.com (Matthew Tameirao)
Date: Sat, 19 Jul 2025 10:50:16 -0700
Subject: New Unicode Versions Update for Pipeline
Message-ID:

As we know, new versions of Unicode come out each September, so I've planned to add Unicode 17.1 for 2026.
Here are the versions so far.

Unicode 17.0
Unicode 17.0 will be released on September 9, 2025.

- Sidetic (U+10940-U+1095F)
- Sharada Supplement (U+11B60-U+11B7F)
- Tolong Siki (U+11DB0-U+11DEF)
- Chisoi (U+16D80-U+16DAF)
- Beria Erfe (U+16EA0-U+16EDF)
- Tangut Components Supplement (U+18D80-U+18DFF)
- Miscellaneous Symbols Supplement (U+1CEC0-U+1CEFF)
- Tai Yo (U+1E6C0-U+1E6FF)
- CJK Unified Ideographs Extension J (U+323B0-U+3347F)

Unicode 17.1
Unicode 17.1 will be released in September 2026.

- Small Seal (U+38000-U+3AB9F)

Code Points Provisionally Assigned
These are code points provisionally assigned by Unicode, but for a future update of the Unicode Standard. Asterisks indicate that a block is not yet ready but will be (according to the SEI report).

- Zoulai (U+11750-U+117AF)*
- Archaic Cuneiform Numerals (U+12550-U+1268F)
- Proto-Cuneiform (U+12690-U+12ECF)*
- Kulitan (U+16DD0-U+16DFF)*
- Jurchen (U+18E00-U+1919F)
- Jurchen Radicals (U+191A0-U+191DF)
- Musical Symbols Supplement (U+1D250-U+1D28F)
- Lampung (U+1E700-U+1E73F)*

The new block Small Seal will be added to Unicode version 17.1 due to the large number of characters in this block.

From roozbeh at corp.unicode.org Sat Jul 19 15:07:55 2025
From: roozbeh at corp.unicode.org (Roozbeh Pournader)
Date: Sat, 19 Jul 2025 13:07:55 -0700
Subject: New Unicode Versions Update for Pipeline
In-Reply-To:
References:
Message-ID:

For anybody reading this thread, I just wanted to remind you that there are no such plans for releasing Unicode 17.1. Current plans are to release Unicode 17.0 in September 2025 and Unicode 18.0 in September 2026.

For the latest plans about which character will be in which release, check the Unicode Pipeline at: https://www.unicode.org/alloc/Pipeline.html

(Note that the above list is subject to change at each UTC meeting.)

Roozbeh

From ivanpan3 at gmail.com Sun Jul 20 01:39:38 2025
From: ivanpan3 at gmail.com (Ivan Panchenko)
Date: Sun, 20 Jul 2025 08:39:38 +0200
Subject: Double right arrowhead?
In-Reply-To:
References:
Message-ID:

I do not like U+23E9, because it intensifies the symbol for "play" (U+23F5) and is used for "fast forward". On the other hand, "?>" (not "|?|>") instead of "?" seems to simply clarify the shape. Maybe U+27A4 is what I am looking for, after all.
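For readers weighing the candidates named in this thread, here is a small sketch that prints the code point, General_Category, and character name of each one. It assumes Python's standard-library unicodedata module; the list CANDIDATES and the helper describe are illustrative names, and U+00BB is included on the assumption that it is the "right-hand guillemet" mentioned at the start of the thread.

```python
import unicodedata

# Characters brought up in the "Double right arrowhead?" thread.
CANDIDATES = [0x003E, 0x00BB, 0x02C3, 0x23E9, 0x23F5, 0x27A4]


def describe(cp: int) -> str:
    """Return 'U+XXXX <category> <name>' for one code point."""
    ch = chr(cp)
    name = unicodedata.name(ch, "<no name in this Unicode version>")
    return f"U+{cp:04X} {unicodedata.category(ch)} {name}"


for cp in CANDIDATES:
    print(describe(cp))
```

The category column is the useful part here: U+02C3 comes out as Sk (modifier symbol), while U+27A4 is So (other symbol) and the plain greater-than sign is Sm (math symbol), which matches the reservations raised about using a modifier character in running text.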
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ivanpan3 at gmail.com Sun Jul 20 01:51:07 2025 From: ivanpan3 at gmail.com (Ivan Panchenko) Date: Sun, 20 Jul 2025 08:51:07 +0200 Subject: Double right arrowhead? In-Reply-To: References: Message-ID: (Should have been ?>? instead of ???; the latter is a modifier symbol that in some fonts is not shown in superscript.) > I do not like U+23E9, because it intensifies the symbol for ?play? (U+23F5) and is used for ?fast forward?. On the other hand, ??>? (not ?|?|>?) instead of ??? seems to simply clarify the shape. Maybe U+27A4 is what I am looking for, after all. From don.hosek at gmail.com Sun Jul 20 11:28:59 2025 From: don.hosek at gmail.com (Don Hosek) Date: Sun, 20 Jul 2025 12:28:59 -0400 Subject: New Unicode Versions Update for Pipeline Message-ID: How does the whole version numbering thing work deciding between major and minor in the annual update? The last time there was a minor version update, it included a major change to the logic for grapheme segmentation rather than the usual updates to the data tables. -dh -------------- next part -------------- An HTML attachment was scrubbed... URL: From unicode.org at sl.neatnit.net Sun Jul 20 11:43:22 2025 From: unicode.org at sl.neatnit.net (Nitai Sasson) Date: Sun, 20 Jul 2025 16:43:22 +0000 Subject: Double right arrowhead? In-Reply-To: References: Message-ID: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net> Allow me to just point out the option of continuing to use ">>" for this purpose. There's nothing inherently wrong with it, although it might need extra care to prevent line-breaking between the two, in some software. (This isn't an issue in the couple of programs I just tested) On Saturday, 19 July 2025 at 13:56, Ivan Panchenko via Unicode wrote: > People sometimes use two greater-than signs or a right-hand guillemet > to point to the right, e.g., in ?Continue reading >>? or preceding a > text as part of a link. 
I wonder which Unicode character(s) would be > appropriate for this. There is U+02C3 (???, modifier letter right > arrowhead), but it is a modifier letter (see also U+08FC (arabic > double right arrowhead above with dot) for comparison). Simply U+27A4 > (???, black rightwards arrowhead), whose contour consists of two > > -like shapes? Or should a new character be proposed? From harjitmoe at outlook.com Sun Jul 20 13:30:14 2025 From: harjitmoe at outlook.com (Harriet Riddle) Date: Sun, 20 Jul 2025 19:30:14 +0100 Subject: Future of the BabelStone Han variation selectors Message-ID: An HTML attachment was scrubbed... URL: From jukkakk at gmail.com Sun Jul 20 13:51:22 2025 From: jukkakk at gmail.com (Jukka K. Korpela) Date: Sun, 20 Jul 2025 21:51:22 +0300 Subject: Double right arrowhead? In-Reply-To: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net> References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net> Message-ID: Nitai Sasson via Unicode (unicode at corp.unicode.org) wrote: > Allow me to just point out the option of continuing to use ">>" for this > purpose > I agree. The use of two GREATER THAN characters is just a way to emulate a rightwards arrow using ASCII graphics. That?s OK when you are limited to ASCII, but when you can use Unicode, there are several arrow or arrowhead characters to choose from. There is no point in adding a Unicode character for the ?ASCII graphics? >>, Yucca, https://jkorpela.fi -------------- next part -------------- An HTML attachment was scrubbed... URL: From everson at evertype.com Sun Jul 20 14:57:57 2025 From: everson at evertype.com (Michael Everson) Date: Sun, 20 Jul 2025 14:57:57 -0500 Subject: Future of the BabelStone Han variation selectors In-Reply-To: References: Message-ID: <222122A1-41A9-4D85-BDAE-314AAB3A9D41@evertype.com> Please be patient. Michael Everson http://evertype.com > On 20 Jul 2025, at 13:32, Harriet Riddle via Unicode wrote: > > ? 
> The following is the most recent and, as it transpired, potentially final update on the status of the BabelStone Han variation selector sequences:
>
> > “It is intended to register a BabelStone Collection in the IVD at a future date when the sequences listed here are stable. […] NB I plan to reassign most of the IVSes marked with an asterisk so they have a unique IVS sequence. When the CAAPH collection is accepted for registration in the IVD I will also need to reassign many of the BabelStone Han IVS sequences which conflict with the collection.”
> > -- https://www.babelstone.co.uk/Fonts/BSH_IVS_BETA.html
>
> Now, I'd always wondered when “when the sequences listed here are stable” would be, but I was not expecting that it would be as soon as this year that BabelStone's untimely passing, not long before the CAAPH collection was registered, would sadly settle that question.
>
> Consequently, I do wonder about the future status of these variation selector sequences, given that the current versions of the BabelStone fonts are presumably either final or, at minimum, would effectively serve as a memorial watershed. In particular, those of the sequences implemented by the current version of the font which do not conflict with existing registered IVSes.
>
> --Har.

From pgcon6 at msn.com Mon Jul 21 14:00:59 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Mon, 21 Jul 2025 19:00:59 +0000
Subject: New Unicode Versions Update for Pipeline
In-Reply-To:
References:
Message-ID:

Normally the annual update will be a major version update. Unicode 15.1 in 2023 was an unusual exception. Because of process changes that were being made, the initial plan for 2023 was a 15.1 release with a limited scope of changes, including a very small number of new characters; see L2/22-270 for details. Later, it became necessary to add CJK Extension I to the scope, but the scope was still constrained in general.
The changes to UAX #29 were carefully considered in relation to capacity before they were approved for 15.1.

Peter Constable

From: Unicode on behalf of Don Hosek via Unicode
Date: Sunday, July 20, 2025 at 9:34 AM
To: unicode at corp.unicode.org
Subject: Re: New Unicode Versions Update for Pipeline

How does the whole version numbering thing work deciding between major and minor in the annual update? The last time there was a minor version update, it included a major change to the logic for grapheme segmentation rather than the usual updates to the data tables.

-dh

From t.b.y.taoboyu at foxmail.com Wed Jul 23 16:47:12 2025
From: t.b.y.taoboyu at foxmail.com (t.b.y.taoboyu at foxmail.com)
Date: Wed, 23 Jul 2025 21:47:12 +0000
Subject: U+1F1AD: SQUARED FN or MASK WORK SYMBOL?
Message-ID:

Unicode mail list,

I am seeking the symbol on the Fn key and have found it encoded as U+1F1AD, as specified in L2/17-072.

ISO/IEC JTC 1/SC 2/WG 2 - Unicode
Since then, ISO/IEC 9995-7 was developed further. The last version of the complete standard dates from 2009. Thereafter, an amendment was released in 2012 with several new symbols reflecting the need of multilingual keyboards in support of the cultural diversity. Such keyboards are easily accessible especially when not being confined to physical keyboards with fixed engravings.
www.unicode.org

However, this code point has been representing MASK WORK SYMBOL since Version 13.0.

And U+1F1AE and U+1F1AF, representing SANS-SERIF CAPITAL U ENCLOSING ZERO-NINE and SANS-SERIF CAPITAL U ENCLOSING ZERO-F in the proposal, have not been assigned even in the latest release, Version 16.0.
The Unicode Standard, Version 16.0
Range: 1F100–1F1FF This file contains an excerpt from the character code tables and list of character names for
www.unicode.org

Can anyone answer:

* Which stage is that proposal in?
* Why is there an inconsistency between the proposal and the Unicode Standard?

ISO/IEC JTC 1/SC 35,

I have noted ISO/IEC CD 9995-7 under development.

ISO/IEC AWI 9995-7 - Information technology - Keyboard layouts for text and office systems - Part 7: Symbols used to represent functions - ISO - International Organization for Standardization
Within the general scope described in ISO/IEC 9995-1, this document specifies symbols for functions found on any type of numeric, alphanumeric or composite keyboards. Each of these symbols is intended to be considered as universal and non-language related equivalent of names for the function they represent. Names of functions and descriptions are given in English and French.
www.iso.org

Will the coming version resolve this problem?

From beckiergb at gmail.com Wed Jul 23 18:20:15 2025
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Wed, 23 Jul 2025 16:20:15 -0700
Subject: U+1F1AD: SQUARED FN or MASK WORK SYMBOL?
In-Reply-To:
References:
Message-ID:

A proposal has to be recommended by the Script Ad-Hoc (now Script Encoding Working Group) and reviewed by the UTC (Unicode Technical Committee) before it becomes part of the Standard.

The Script Ad-Hoc's response to that particular proposal is found in L2/17-153 (section 16, page 11). It was not recommended for encoding; instead, they recommended that their feedback be sent to the author of the proposal.

I don't know if that ever happened or if the authors ever addressed it or submitted a revised proposal.
-- Rebecca Bettencourt

On Wed, Jul 23, 2025 at 2:56 PM t.b.y.taoboyu--- via Unicode <unicode at corp.unicode.org> wrote:

> Unicode mail list,
> I am seeking the symbol on the Fn key and have found it encoded as U+1F1AD, as
> specified in L2/17-072.
> ISO/IEC JTC 1/SC 2/WG 2 - Unicode
>
> Since then, ISO/IEC 9995-7 was developed further. The last version of the
> complete standard dates from 2009. Thereafter, an amendment was released in
> 2012 with several new symbols reflecting the need of multilingual keyboards
> in support of the cultural diversity. Such keyboards are easily accessible
> especially when not being confined to physical keyboards with fixed
> engravings.
> www.unicode.org
> However, this code point has been representing MASK WORK SYMBOL since Version
> 13.0.
> And U+1F1AE and U+1F1AF, representing SANS-SERIF CAPITAL U ENCLOSING
> ZERO-NINE and SANS-SERIF CAPITAL U ENCLOSING ZERO-F in the proposal, have
> not been assigned even in the latest release, Version 16.0.
> The Unicode Standard, Version 16.0
>
> Range: 1F100–1F1FF This file contains an excerpt from the character code
> tables and list of character names for
> www.unicode.org
> Can anyone answer:
>
> - Which stage is that proposal in?
> - Why is there an inconsistency between the proposal and the Unicode
> Standard?
>
>
> ISO/IEC JTC 1/SC 35,
> I have noted ISO/IEC CD 9995-7 under
> development.
>
> ISO/IEC AWI 9995-7 - Information technology - Keyboard layouts for text
> and office systems - Part 7: Symbols used to represent functions - ISO -
> International Organization for Standardization
>
> Within the general scope described in ISO/IEC 9995-1, this document
> specifies symbols for functions found on any type of numeric, alphanumeric
> or composite keyboards.
Each of these symbols is intended to be considered
> as universal and non-language related equivalent of names for the function
> they represent. Names of functions and descriptions are given in English
> and French.
> www.iso.org
> Will the coming version resolve this problem?
>

From marius.spix at web.de Thu Jul 24 02:51:22 2025
From: marius.spix at web.de (Marius Spix)
Date: Thu, 24 Jul 2025 07:51:22 +0000
Subject: Aw: Re: U+1F1AD: SQUARED FN or MASK WORK SYMBOL?
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...

From 4mm4adbfrm4 at orange.fr Thu Jul 24 10:19:02 2025
From: 4mm4adbfrm4 at orange.fr (Michel Mariani)
Date: Thu, 24 Jul 2025 17:19:02 +0200
Subject: BabelStone Han variation sequences
Message-ID:

[Sorry for not posting to the more appropriate thread, but I had to resubscribe to the Unicode mailing list.]

> Now, I'd always wondered when “when the sequences listed here are stable” would be, but I was not expecting that it would be as soon as this year that BabelStone's untimely passing, not long before the CAAPH collection was registered, would sadly settle that question.

I had just added support for the new CAAPH collection (IVD 2025) in the CJK Variations utility of my Unicopedia Sinica application when I got the terribly sad and shocking news of Andrew's passing...

The latest beta release of the BabelStone Han font is very recent (2025-07-08), and the associated BSH_IVS_BETA.html page and BSH_IVS_BETA.TXT data file appear to be in sync with it, except for a few issues that I was able to spot and correct.

FWIW, I created an "Issue" documenting those issues in my app's repository: BabelStone Han Variation Sequences (v17.0.0 BETA) - Corrigendum, and I also updated support in my app for the *unregistered* BabelStone IVD collection accordingly, for Unicode 16.0 though.
I may be wrong, but I've got the feeling that Andrew was specifically waiting for the CAAPH collection to get registered and was almost ready to officially release the current font for Unicode 17.0 in September, and possibly apply for registration of his own BabelStone collection.

[Andrew C. West (??), 1960-2025 RIP (???).]

--Michel Mariani

From public at khwilliamson.com Thu Jul 24 13:05:41 2025
From: public at khwilliamson.com (Karl Williamson)
Date: Thu, 24 Jul 2025 12:05:41 -0600
Subject: How to report a defect in TUS?
In-Reply-To:
References: <470b6f53-f6d0-42c6-8260-99c3dba6c53e@it.aoyama.ac.jp>
Message-ID:

Perusing https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G54355, I noticed it refers to Unicode Technical Report #36, “Unicode Security Considerations.” This TR is stabilized. That reference should be replaced with something current.

I then went to the unicode.org home page to find how to report this. Not seeing anything obvious in the menus, I entered in the search box

    report defect

No relevant result came up.

From markus.icu at gmail.com Thu Jul 24 13:33:37 2025
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 24 Jul 2025 11:33:37 -0700
Subject: How to report a defect in TUS?
In-Reply-To:
References: <470b6f53-f6d0-42c6-8260-99c3dba6c53e@it.aoyama.ac.jp>
Message-ID:

https://www.unicode.org/reporting.html

From asmusf at ix.netcom.com Thu Jul 24 14:52:33 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 24 Jul 2025 12:52:33 -0700
Subject: How to report a defect in TUS?
In-Reply-To:
References: <470b6f53-f6d0-42c6-8260-99c3dba6c53e@it.aoyama.ac.jp>
Message-ID: <24456901-d525-4923-97ee-ba70984732fa@ix.netcom.com>

On 7/24/2025 11:05 AM, Karl Williamson via Unicode wrote:
> Perusing
> https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G54355,
> I noticed it refers to Unicode Technical Report #36, “Unicode Security
> Considerations.” This TR is stabilized. That reference should be
> replaced with something current.
>
> I then went to the unicode.org home page to find how to report this.
> Not seeing anything obvious in the menus, I entered in the search box
>
> report defect
>
> No relevant result came up.
>

When reporting the issue, note should be made of the actual text the link attempts to cite:

    3.5 Deletion of Code Points

    In some versions prior to Unicode 5.2, conformance clause C7 allowed the deletion of noncharacter code points:

    C7. When a process purports not to modify the interpretation of a valid coded character sequence, it shall make no change to that coded character sequence other than the possible replacement of character sequences by their canonical-equivalent sequences or the deletion of noncharacter code points.

    Whenever a character is invisibly deleted (instead of replaced), such as in this older version of C7, it may cause a security problem. The issue is the following: A gateway might be checking for a sensitive sequence of characters, say "delete". If what is passed in is "deXlete", where X is a noncharacter, the gateway lets it through: the sequence "deXlete" may be in and of itself harmless. However, suppose that later on, past the gateway, an internal process invisibly deletes the X. In that case, the sensitive sequence of characters is formed, and can lead to a security breach.

    The following is an example of how this can be used for malicious purposes.
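[Illustration: the "deXlete" scenario in the passage above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `gateway_allows`, `strip_noncharacters`, and `replace_noncharacters` are hypothetical names invented here, not functions from any Unicode specification or library, and U+FFFF stands in for the noncharacter X.]

```python
def is_noncharacter(ch: str) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF and U+nFFFE/U+nFFFF."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def gateway_allows(text: str, blocked: str = "delete") -> bool:
    """Hypothetical gateway: reject input containing the sensitive sequence."""
    return blocked not in text


def strip_noncharacters(text: str) -> str:
    """Hypothetical later process that invisibly deletes noncharacters (the risky behavior)."""
    return "".join(ch for ch in text if not is_noncharacter(ch))


def replace_noncharacters(text: str) -> str:
    """Alternative: replace each noncharacter with U+FFFD instead of deleting it."""
    return "".join("\uFFFD" if is_noncharacter(ch) else ch for ch in text)


payload = "de\uFFFFlete"                   # "deXlete", with X = U+FFFF
passed_gateway = gateway_allows(payload)   # the gateway sees no "delete"
after_deletion = strip_noncharacters(payload)
after_replacement = replace_noncharacters(payload)

print(passed_gateway)                      # gateway verdict on the raw payload
print(gateway_allows(after_deletion))      # verdict if the check were repeated after deletion
print(gateway_allows(after_replacement))   # verdict after replacement with U+FFFD
```

[Deleting the noncharacter after the check has run re-forms the sensitive sequence, whereas replacing it with U+FFFD keeps the two halves apart.]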
On the landing page for the stabilized TR, it says "Some material may still be useful, and may be extracted in the future for use in other specifications."

The task here cannot simply be to delete the link, but to move the affected text into the core spec (or some other document). A defect report would be more useful if it contained a suggestion to that effect.

A./

From doug at ewellic.org Fri Jul 25 00:40:03 2025
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 25 Jul 2025 05:40:03 +0000
Subject: U+1F1AD: SQUARED FN or MASK WORK SYMBOL?
In-Reply-To:
References:
Message-ID:

I don’t think the main, underlying point is being understood by all parties.

Unicode encodes plain-text characters. Not every symbol is in common use as part of running text, except for the pathological case of a phrase or sentence talking about the symbol itself. The fact that such a symbol may appear on a computer keyboard does not disprove this.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From: Unicode On Behalf Of Marius Spix via Unicode
Sent: Thursday, July 24, 2025 1:51
To: beckiergb at gmail.com; t.b.y.taoboyu at foxmail.com
Cc: unicode at corp.unicode.org; anatole.bouvierdyvoire at afnor.org
Subject: Aw: Re: U+1F1AD: SQUARED FN or MASK WORK SYMBOL?

I also noted that newer keyboards include an AI assistant key, which replaces the menu key but sends a different keycode sequence than the menu key, including the F23 key, which is absent from many keyboards. I am aware that Unicode won't encode characters protected by trademark laws, like the Apple or Windows logo. The closest character which could be used to represent an AI assistant is U+1F481 (INFORMATION DESK PERSON), which is unified with the "person tipping hand" emoji.
As I expect other OS vendors to include their own AI assistant keys, I wonder if a standardized AI assistant keyboard symbol, which does not depict the Microsoft Copilot symbol, would be useful.

From ivanpan3 at gmail.com Fri Jul 25 09:56:10 2025
From: ivanpan3 at gmail.com (Ivan Panchenko)
Date: Fri, 25 Jul 2025 16:56:10 +0200
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

Jukka K. Korpela wrote:
> The use of two GREATER THAN characters is just a way to emulate a rightwards arrow using ASCII graphics.

On the other hand, many UIs do have something like ?? (and I am not talking about the ASCII characters!) to point to the left/right. Not sure whether it deserves to be encoded as a Unicode character, but then again, why not when there are all sorts of different arrows? I have also seen something that looks similar to “>>”.

From gwidion at gmail.com Fri Jul 25 10:55:04 2025
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Fri, 25 Jul 2025 12:55:04 -0300
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

I just went to check, and I am really surprised that what are now recognized as de facto symbols for media playback control have not been encoded, not even as emoji.

I was expecting to find something similar to ">>" as "Fast Forward" or similar, as one can see in every media player, physical or in software, along with the symbols for "rewind", "play", "pause".

Maybe starting a pledge to encode these with semantic meaning for forward and backward (for the visual glyphs ">>" and "<<") could be a thing, indeed, since none of the tens of right-pointing arrows listed by James above seems to convey the forward meaning.

On Fri, Jul 25, 2025 at 11:59 AM Ivan Panchenko via Unicode wrote:
>
> Jukka K.
Korpela wrote:
> > The use of two GREATER THAN characters is just a way to emulate a rightwards arrow using ASCII graphics.
>
> On the other hand, many UIs do have something like ?? (and I
> am not talking about the ASCII characters!) to point to the
> left/right. Not sure whether it deserves to be encoded as a Unicode
> character, but then again, why not when there are all sorts of
> different arrows? I have also seen something that looks similar to
> “>>”.
>

From gwidion at gmail.com Fri Jul 25 11:16:25 2025
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Fri, 25 Jul 2025 13:16:25 -0300
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

Ok, so I just took some time to look more carefully - and here they are:

```python
# Install first, from a shell:  pip install terminedia
from terminedia.unicode import lookup

# ...

lookup(r"DOUBLE (SUCCEEDS|PRECEDES)")
# Out[27]:
# [Character(code=0x2ABB, value='⪻', name='DOUBLE PRECEDES', category='Sm', width='N'),
#  Character(code=0x2ABC, value='⪼', name='DOUBLE SUCCEEDS', category='Sm', width='N')]
```

On Fri, Jul 25, 2025 at 12:55 PM Joao S. O. Bueno wrote:
>
> I just went to check, I am really surprised that what are now
> recognized as de facto
> symbols for media reproducing control had not been encoded - not even
> as emoji's.
>
> I was expecting to find something similar to ">>" as "Fast Forward"
> or similar, as one can see in
> every media player, physical or in software, along with the symbols
> for "rewind", "play", "pause".
>
> Maybe starting a pledge to encode these with semantic meaning for
> forward and backward (for the
> visual glyphs ">>" and "<<") could be a thing, indeed - since none of
> the tens of right-pointing arrows listed by James above seems
> to convey the forward meaning.
>
> On Fri, Jul 25, 2025 at 11:59 AM Ivan Panchenko via Unicode
> wrote:
> >
> > Jukka K.
Korpela wrote:
> > > The use of two GREATER THAN characters is just a way to emulate a rightwards arrow using ASCII graphics.
> >
> > On the other hand, many UIs do have something like ?? (and I
> > am not talking about the ASCII characters!) to point to the
> > left/right. Not sure whether it deserves to be encoded as a Unicode
> > character, but then again, why not when there are all sorts of
> > different arrows? I have also seen something that looks similar to
> > “>>”.
> >

From beckiergb at gmail.com Fri Jul 25 11:23:56 2025
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Fri, 25 Jul 2025 09:23:56 -0700
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

On Fri, Jul 25, 2025 at 8:58 AM Joao S. O. Bueno via Unicode <unicode at corp.unicode.org> wrote:

> I just went to check, I am really surprised that what are now
> recognized as de facto
> symbols for media reproducing control had not been encoded - not even
> as emoji's.
>

U+23E9 BLACK RIGHT-POINTING DOUBLE TRIANGLE (fast forward)
U+23EA BLACK LEFT-POINTING DOUBLE TRIANGLE (rewind)
U+23EF BLACK RIGHT-POINTING TRIANGLE WITH DOUBLE VERTICAL BAR (play/pause)
U+23F8 DOUBLE VERTICAL BAR (pause)
U+23F9 BLACK SQUARE FOR STOP
U+23FA BLACK CIRCLE FOR RECORD
U+25B6 BLACK RIGHT-POINTING TRIANGLE (play)

-- Rebecca Bettencourt

From gwidion at gmail.com Fri Jul 25 11:33:15 2025
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Fri, 25 Jul 2025 13:33:15 -0300
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

Oh, thank you.
The reason my search didn't work is that I searched the name database for the words "forward", "backward", "rewind", "play" and "pause", and by coincidence didn't search for the two related terms that are present there: "stop" and "record".

On Fri, Jul 25, 2025 at 1:24 PM Rebecca Bettencourt wrote:
>
> On Fri, Jul 25, 2025 at 8:58 AM Joao S. O. Bueno via Unicode wrote:
>>
>> I just went to check, I am really surprised that what are now
>> recognized as de facto
>> symbols for media reproducing control had not been encoded - not even
>> as emoji's.
>
>
> U+23E9 BLACK RIGHT-POINTING DOUBLE TRIANGLE (fast forward)
> U+23EA BLACK LEFT-POINTING DOUBLE TRIANGLE (rewind)
> U+23EF BLACK RIGHT-POINTING TRIANGLE WITH DOUBLE VERTICAL BAR (play/pause)
> U+23F8 DOUBLE VERTICAL BAR (pause)
> U+23F9 BLACK SQUARE FOR STOP
> U+23FA BLACK CIRCLE FOR RECORD
> U+25B6 BLACK RIGHT-POINTING TRIANGLE (play)
>
>
> -- Rebecca Bettencourt

From markus.icu at gmail.com Fri Jul 25 12:05:06 2025
From: markus.icu at gmail.com (Markus Scherer)
Date: Fri, 25 Jul 2025 10:05:06 -0700
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

On Fri, Jul 25, 2025 at 9:35 AM Joao S. O. Bueno via Unicode <unicode at corp.unicode.org> wrote:

> The reason I didn't work is that I searched the name database for the
> words "forward', "backward", "rewind", "play" and "pause" - and by
> coincidence didn't search the two related terms that are present
> there: "stop" and "record"
>

You could search the NamesList file which is used to generate the charts, including the per-character annotations:
https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt

Or you could do a web search like

    unicode character "fast forward"

and find things like
https://en.wikipedia.org/wiki/Fast_forward
https://en.wikipedia.org/wiki/Media_control_symbols

Best regards,
markus
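[Illustration: why a search of formal character names misses most of the media-control symbols. This is a minimal sketch using Python's standard `unicodedata` module, which exposes character names only, not the NamesList annotations; the seven code points are the ones listed earlier in this thread, and the variable names are made up for the example.]

```python
import unicodedata

# Code points of the media-control symbols listed earlier in the thread.
MEDIA_CONTROLS = [0x23E9, 0x23EA, 0x23EF, 0x23F8, 0x23F9, 0x23FA, 0x25B6]

# The search terms mentioned in the thread, uppercased like formal names.
SEARCH_TERMS = ["FORWARD", "BACKWARD", "REWIND", "PLAY", "PAUSE", "STOP", "RECORD"]

# Formal names as known to the Unicode data bundled with this Python build.
names = {cp: unicodedata.name(chr(cp)) for cp in MEDIA_CONTROLS}

# For each term, collect the code points whose formal name contains it.
hits = {
    term: [f"U+{cp:04X}" for cp, name in names.items() if term in name]
    for term in SEARCH_TERMS
}

for cp, name in names.items():
    print(f"U+{cp:04X} {name}")
for term, found in hits.items():
    print(f"{term:8} -> {found}")
```

[Only the last two terms match, because the formal names describe shapes ("DOUBLE TRIANGLE") rather than functions; the functional labels live in the NamesList annotations.]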
From ivanpan3 at gmail.com Fri Jul 25 13:11:54 2025
From: ivanpan3 at gmail.com (Ivan Panchenko)
Date: Fri, 25 Jul 2025 20:11:54 +0200
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

But the fast forward symbol is not what I am talking about. It should be just two simple strokes for an arrowhead (I have noticed this symbolism quite often, recently), twice, not two black triangles. The succeeds symbol is a relation symbol that is used if an object a succeeds an object b (a ≻ b), so this does not fit, either. (That said, is the use of the greater-than sign actually inappropriate here? Semantically ambiguous characters such as the ASCII apostrophe are a thing, after all.)

From haberg_1 at icloud.com Fri Jul 25 13:19:19 2025
From: haberg_1 at icloud.com (Hans Åberg)
Date: Fri, 25 Jul 2025 20:19:19 +0200
Subject: Double right arrowhead?
In-Reply-To:
References:
Message-ID: <21C672F1-4F0A-4626-AC15-014EFCC14639@icloud.com>

> On 19 Jul 2025, at 12:54, Ivan Panchenko via Unicode wrote:
>
> People sometimes use two greater-than signs or a right-hand guillemet
> to point to the right, e.g., in “Continue reading >>” or preceding a
> text as part of a link. I wonder which Unicode character(s) would be
> appropriate for this.

There are various hand symbols that might be used for this:

☞ WHITE RIGHT POINTING INDEX U+261E
☜ WHITE LEFT POINTING INDEX U+261C
☝ WHITE UP POINTING INDEX U+261D
☟ WHITE DOWN POINTING INDEX U+261F
👉 backhand index finger pointing right U+1F449
👈 backhand index finger pointing left U+1F448
👆 backhand index finger pointing up U+1F446
👇 backhand index finger pointing down U+1F447
☛ BLACK RIGHT POINTING INDEX U+261B
☚ BLACK LEFT POINTING INDEX U+261A

From doug at ewellic.org Sat Jul 26 16:55:59 2025
From: doug at ewellic.org (Doug Ewell)
Date: Sat, 26 Jul 2025 21:55:59 +0000
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID:

If one is looking for an existing character that looks like >>, but isn't a sequence of the two characters > and >, here are some suggested steps to take before one concludes that a New Character Must Be Proposed:

1. Look in the official code charts at https://www.unicode.org/charts/ . While there are a lot of blocks to choose from, including block elements and several each for arrows and math symbols, it is probably worth scanning them all to find what is needed and save some embarrassment later.

2. Look in a character-map tool. Windows provides Character Map; Mac provides Character Viewer; some versions of Linux provide GNOME Character Map (a.k.a. Gucharmap); etc. There are also third-party tools, such as BabelPad and BabelMap (https://www.babelstone.co.uk/Software/BabelMap.html), available for a variety of platforms. One can browse through the glyphs, using whatever font(s) one prefers.

(Note that for both 1 and 2, do NOT, and I mean NOT, rely on character names alone. Look at the glyphs.)

3. There is a delightful online tool called Shapecatcher at https://shapecatcher.com/ . This site provides a box where one can use one’s awesome mouse-drawing skills to draw the character one is looking for, click Recognize, and be presented with a list of existing characters that “match,” more or less. Some of the suggestions farther down the list are clearly bogus, and can be simply ignored. For people like me who have terrible mouse-drawing skills, one strategy is to try several times to draw the thing, and then look at the collected results.

Shapecatcher gave me the following suggestions for >>, along with the hallucinations:

Much greater-than: ≫ Unicode hexadecimal: 0x226b
Z notation schema piping: ⨠ Unicode hexadecimal: 0x2a20
Right-pointing double angle quotation mark: » Unicode hexadecimal: 0xbb
Double nested greater-than: ⪢ Unicode hexadecimal: 0x2aa2
Double succeeds: ⪼
Unicode hexadecimal: 0x2abc

which are no worse really than many of the suggestions given in this thread. If one wants “it looks kinda like >>, but not exactly in some way,” well, then draw it.

4. One can always suck it up and continue to use <003E, 003E>. It is easy to type and pretty much universally understood.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From alex.plantema at xs4all.nl Sun Jul 27 12:03:36 2025
From: alex.plantema at xs4all.nl (Alex Plantema)
Date: Sun, 27 Jul 2025 19:03:36 +0200
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net>
Message-ID: <645469fe-594f-4250-a4b5-1f9301a9b91f@xs4all.nl>

On Sat 26-07-2025 at 23:55, Doug Ewell via Unicode wrote:
> If one is looking for an existing character that looks like >>, but isn't a sequence of the two characters > and >, here are some suggested steps to take before one concludes that a New Character Must Be Proposed:

Characters may look useful in one font, but not in other fonts. Are you going to check them in all fonts? If a new character with a name describing its usage is introduced, we don't have that problem.

--
Alex.

From doug at ewellic.org Sun Jul 27 12:56:51 2025
From: doug at ewellic.org (Doug Ewell)
Date: Sun, 27 Jul 2025 17:56:51 +0000
Subject: Double right arrowhead?
In-Reply-To: <645469fe-594f-4250-a4b5-1f9301a9b91f@xs4all.nl>
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net> <645469fe-594f-4250-a4b5-1f9301a9b91f@xs4all.nl>
Message-ID:

Alex Plantema wrote:

> Characters may look useful in one font, but not in other fonts. Are
> you going to check them in all fonts?

A character may have somewhat different appearances in different fonts, as long as the basic identity of the character is preserved.
That’s what characters are:
https://www.unicode.org/reports/tr17/#CharactersVsGlyphs

Even emoji don’t look exactly the same in every font. Even characters in the Dingbats block don't.

If you require such a specific appearance for this symbol that "check them in all fonts" is considered necessary, then you are not looking for a character; you are looking for a glyph, and for that an inline image is probably the best choice.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From asmusf at ix.netcom.com Sun Jul 27 13:32:48 2025
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 27 Jul 2025 11:32:48 -0700
Subject: Double right arrowhead?
In-Reply-To:
References: <175302981091.7.6604359669503701964.816946605@sl.neatnit.net> <645469fe-594f-4250-a4b5-1f9301a9b91f@xs4all.nl>
Message-ID: <67622d58-d5cf-4d1a-8398-63f9f418e741@ix.netcom.com>

Symbols are distinct from letters in that the latter have a strong customary identity that maps onto a sometimes surprising range of font designs. The functional requirement for a glyph for a letter is that it is recognizable in context so that the underlying letter can be identified. Users of Fraktur fonts have no problem with a glyph for A that in isolation might be mistaken for a U by readers not familiar with that style.

Even then, not all fonts are usable for all purposes. There are Latin fonts that drop the dot on the 'i'. Those can't be used for Turkish, where the dotted and dotless 'i' are distinct. However, they work fine for English, and they are not that rare. Readers who are not used to looking for typographical design quirks might not even notice.

For symbols, there is usually a lot less context; they don't form part of words, for example. And the shapes are often very simple or geometric. Take a simple triangle, pointing right. Is a black (filled) one a different symbol from a white (open) one? Is a short, tall triangle shape a different symbol from a long, wide one?
Generally, we say yes, so symbols have a much narrower range of acceptable (expected) glyph shapes.

Punctuation marks are somewhere in between. Whether a period is square or round doesn't matter in the context of running text. Both are equally acceptable, so we typically leave that to the font. At the same time, the character is reused for any dot on the baseline, whether period or decimal point.

Common to symbols and punctuation is that they can be mapped onto more than one concept; this is easier if the range of acceptable glyphs is narrower.

This supports the suggestion made here to look for the intended shape, and if an existing symbol is a good or perhaps even precise match, then the suggestion would be to perhaps recognize that alternate use in an annotation (if that use is considered common and you want to guide users in making a consistent selection).

Yes, it's useful to look at a couple of the most common fonts to make sure that the actually deployed range of glyphs matches the new usage. In cases where some symbol unexpectedly shows an interesting variation of appearance, adding another use to it might not work. In particular, if these are not outliers, but common alternations.

But unless that research has been done and there's conclusive evidence that adopting an existing symbol for that use case is unworkable, there's not even enough basis for discussing a new character proposal.

That said, I'm not in favor of adopting an existing character if the expected glyph for it is only a rough approximation of a preferred shape. I totally get that not all arrowheads look the same, and that there is room, therefore, for a variety of them in the standard. However, any proposal claiming that every single existing one is insufficient has the burden of demonstrating that.

A./

On 7/27/2025 10:56 AM, Doug Ewell via Unicode wrote:
> Alex Plantema wrote:
>
>> Characters may look useful in one font, but not in other fonts. Are
>> you going to check them in all fonts?
>
> A character may have somewhat different appearances in different fonts, as long as the basic identity of the character is preserved. That's what characters are:
> https://www.unicode.org/reports/tr17/#CharactersVsGlyphs
>
> Even emoji don't look exactly the same in every font. Even characters in the Dingbats block don't.
>
> If you require such a specific appearance for this symbol that "check them in all fonts" is considered necessary, then you are not looking for a character; you are looking for a glyph, and for that an inline image is probably the best choice.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From liste at secarica.ro Sun Jul 27 14:24:13 2025
From: liste at secarica.ro (Cristian Secară)
Date: Sun, 27 Jul 2025 22:24:13 +0300
Subject: Double right arrowhead?
In-Reply-To: <21C672F1-4F0A-4626-AC15-014EFCC14639@icloud.com>
References: <21C672F1-4F0A-4626-AC15-014EFCC14639@icloud.com>
Message-ID: <20250727222413.00002713@secarica.ro>

On Fri, 25 Jul 2025 20:19:19 +0200, Hans Åberg via Unicode wrote:

> > People sometimes use two greater-than signs or a right-hand
> > guillemet to point to the right, e.g., in "Continue reading >>" or
> > preceding a text as part of a link. [...]
>
> There are various hand symbols that might be used for this:
>
> [...]
> 👉 backhand index finger pointing right U+1F449
> 👈 backhand index finger pointing left U+1F448
> 👆 backhand index finger pointing up U+1F446
> 👇 backhand index finger pointing down U+1F447
> [...]

Specifically for the web, there is, for example, Font Awesome, which has glyphs for exactly that (using PUA for its custom encoding):

https://fontawesome.com/icons/hand-point-right
https://fontawesome.com/icons/hand-point-left
https://fontawesome.com/icons/hand-point-up
https://fontawesome.com/icons/hand-point-down

Cristi

--
Cristian Secară
https://www.secarica.ro

From mail at dzfrias.dev Tue Jul 29 12:47:54 2025
From: mail at dzfrias.dev (mail at dzfrias.dev)
Date: Tue, 29 Jul 2025 10:47:54 -0700
Subject: U+0F81 Canonical Combining Class?
Message-ID:

The Tibetan Unicode block contains a number of characters (U+0F73, U+0F75, U+0F81) that have a canonical combining class value of zero, and have non-empty decomposition mappings. This is not out of the ordinary, but upon inspecting the code points that they map to, I found that the canonical combining class of each decomposition code point is greater than zero.

In the case of U+0F81, the decomposition mapping is: U+0F71 U+0F80. Both U+0F71 and U+0F80 have canonical combining class values greater than zero, so U+0F81 decomposes solely into combining marks, yet has a canonical combining class value of zero.

What is the reasoning behind this discrepancy? It is my understanding that U+0F81 (TIBETAN VOWEL SIGN REVERSED II) is supposed to be a combining mark. Also, the Tibetan block is the only block that contains code points with this behavior.

It is likely that I'm misunderstanding the semantics of the canonical combining class system.

Diego Frias

From pgcon6 at msn.com Tue Jul 29 15:02:33 2025
From: pgcon6 at msn.com (Peter Constable)
Date: Tue, 29 Jul 2025 20:02:33 +0000
Subject: U+0F81 Canonical Combining Class?
In-Reply-To:
References:
Message-ID:

On the surface, this does seem confusing: it seems like it could imply that there might be an existing problem with normalization, namely a sequence that should be equivalent to its NFD form but that would have marks in a different order in the NFD form.

However, with respect to that apparent contradiction, it's important to keep in mind that canonical combining classes are used in conjunction with Unicode normalization, and that all defined normalization forms begin with decomposition followed by canonical ordering of marks.

So, for instance, consider a character sequence < 0F81, 0F84 >.
The canonical combining class of 0F81 is 0, implying that nothing would reorder around that character in canonical ordering. Compare that with the equivalent decomposed sequence (using the decomposition mapping for 0F81), < 0F71, 0F80, 0F84 >. The canonical combining classes of those characters, in sequence, are < 129, 130, 9 >, and so you might expect those would canonically reorder in the order 9 < 129 < 130, hence a sequence < 0F84, 0F71, 0F80 >. Yet the sequence < 0F84, 0F71, 0F80 > clearly is not equivalent to the original sequence < 0F81, 0F84 >.

The fallacy in that reasoning is the step of considering canonical ordering of the non-decomposed sequence < 0F81, 0F84 >. Canonical reordering is only ever intended to be done on fully decomposed sequences.

That explains why there isn't any contradiction regarding normalization. But now to get to your question: isn't it a discrepancy to have a mark with ccc = 0 decompose to a sequence of marks with ccc > 0?

The only potential discrepancy that would matter would be if there were a problem with normalization. That's because canonical combining classes only have relevance in relation to normalization. And I've explained above why there isn't any such issue.

So, with that in mind... Every combining mark must be assigned some canonical combining class. In this case, we're considering a mark that's a precomposed form for a sequence of marks with different combining classes, 129 and 130. If 0F81 were assigned ccc = 129, that would seem strange (and you or someone else would eventually ask for an explanation). Likewise if 0F81 were assigned ccc = 130. The likely reason why 0F81 was assigned to class 0 is that it needed to be assigned to _some_ class and class 0 was the least strange choice.

Note that 0F81 could have been assigned to _any_ canonical combining class and it would not have had any effect on normalization: The canonical combining class of a combining mark with a canonical decomposition mapping is never used!
Only the ccc for characters in the fully decomposed sequence matters.

Even so, I think it's fair to say that ccc = 0 is the least strange assignment for 0F81. Likewise for 0F73 and 0F75.

Peter

-----Original Message-----
From: Unicode On Behalf Of Diego Frias via Unicode
Sent: July 29, 2025 10:48 AM
To: unicode at corp.unicode.org
Subject: U+0F81 Canonical Combining Class?

The Tibetan Unicode block contains a number of characters (U+0F73, U+0F75, U+0F81) that have a canonical combining class value of zero, and have non-empty decomposition mappings. This is not out of the ordinary, but upon inspecting the code points that they map to, I found that the canonical combining class of each decomposition code point is greater than zero.

In the case of U+0F81, the decomposition mapping is: U+0F71 U+0F80. Both U+0F71 and U+0F80 have canonical combining class values greater than zero, so U+0F81 decomposes solely into combining marks, yet has a canonical combining class value of zero.

What is the reasoning behind this discrepancy? It is my understanding that U+0F81 (TIBETAN VOWEL SIGN REVERSED II) is supposed to be a combining mark. Also, the Tibetan block is the only block that contains code points with this behavior.

It is likely that I'm misunderstanding the semantics of the canonical combining class system.

Diego Frias
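
[Editorial sketch, not part of either message.] The property values discussed in this thread can be checked with Python's standard unicodedata module, assuming the interpreter's bundled Unicode data carries the same values as the UnicodeData.txt entries cited above. The helper names ccc, decomposition and nfd are made up for this illustration. The last check only confirms that the precomposed and the manually decomposed sequences normalize to the same NFD string, which is what one would expect if the ccc of U+0F81 itself is never consulted.

```python
import unicodedata


def ccc(cp):
    """Canonical combining class of a code point (hypothetical helper)."""
    return unicodedata.combining(chr(cp))


def decomposition(cp):
    """Decomposition mapping as a list of code points (empty if none).

    Compatibility tags such as <compat> are skipped; the Tibetan
    characters discussed here have plain canonical mappings.
    """
    mapping = unicodedata.decomposition(chr(cp))
    return [int(part, 16) for part in mapping.split() if not part.startswith("<")]


def nfd(cps):
    """NFD form of a code point sequence, returned as code points."""
    text = "".join(chr(cp) for cp in cps)
    return [ord(ch) for ch in unicodedata.normalize("NFD", text)]


if __name__ == "__main__":
    # The three precomposed Tibetan vowel signs named in the thread.
    for cp in (0x0F73, 0x0F75, 0x0F81):
        parts = ", ".join(f"U+{p:04X} (ccc={ccc(p)})" for p in decomposition(cp))
        print(f"U+{cp:04X} ccc={ccc(cp)} -> {parts}")

    # Both spellings should normalize identically, because canonical
    # ordering runs only after full decomposition.
    composed = nfd([0x0F81, 0x0F84])
    decomposed = nfd([0x0F71, 0x0F80, 0x0F84])
    print(" ".join(f"U+{cp:04X}" for cp in composed))
    print(composed == decomposed)
```

Printing the NFD code points rather than hard-coding them keeps the sketch independent of any particular reading of the reordering example above.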