From kent.b.karlsson at bahnhof.se Tue May 16 18:06:35 2023
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Wed, 17 May 2023 01:06:35 +0200
Subject: Missing Latin superscript lowercase letters
In-Reply-To:
References: <1317264669.2762677.1678192549210@email.ionos.de> <40E26F97-2412-4BB6-8056-A628D7E5200E@bahnhof.se> <2c43b2b5-8281-de1e-0948-73e419244f52@cateee.net>
Message-ID:

> On 24 March 2023 at 14:23, Gabriel Tellez via Unicode wrote:
>
> Mathematical typesetting and proper musical notation are not in the scope of Unicode.

Math expressions are sufficiently in scope for Unicode to be dealt with in several documents published on the Unicode site (not only proposal documents, but other more "permanent" documents as well).

> On Thu, Mar 23, 2023 at 11:43 AM Giacomo Catenazzi via Unicode wrote:
>
> In any case, to display true maths, we need a specialised engine (and
> fonts). We are far from having current shaping engines (and fonts) to
> display maths in a nice way. (and personally I prefer that developers of
> shaping engines will works on improving the actual engine and fonts for
> human languages, before to go on such specialised field (which we have
> already good tools).

Just about EVERY school child (ok, only a few of them will become mathematicians, more of them will become engineers, but that is beside my point) world-wide will study some math. Requiring them to learn LaTeX or use some clumsy (!!, sorry) tool to write math expressions on computers, however, is not a future I'd like to see. It should be light-weight, usable just about anywhere, for just about any level of math (from children's school classes up to university undergraduate level math and maybe even beyond), and any math editing tool should be easy to use.
In my proposal for new representation formats (inter-equivalent) for math expressions I have seen to:

  * Handle combining characters correctly (they apply to the combining sequence preceding them (modulo canonical equivalence), NEVER to a math expression).
  * Allow for easy handling of multi-letter variables (identifiers) (among the other, existing, formats, it seems to only be (La)TeX that can handle that fairly well).
  * Handle bidi correctly and reliably, leaving the math expression AUTHOR in charge of which subexpression goes where, and which direction arrows and other math symbols go (the display system must not mess with that).
  * Handle math styles correctly, esp. when it comes to being able to handle multi-letter variables but also in general (the "MATHEMATICAL" characters are esp. problematic and must be forbidden).
  * Have an XML representation that is 100% compatible with the non-XML representations, and still does not suffer from "tag bloat".
  * Have several representations suitable for different representation contexts: XML, control codes (e.g. ECMA-48 formatting), "mark-down".
  * Have the representation schemes be deeply integrable with their representation context.

So, ok, that was a big plug for my math expression representation proposal. I don't have a huge organisation behind me, so I'm taking this opportunity to promote it. Read all about it in https://github.com/kent-karlsson/control/blob/main/math-layout-controls-2023-A.pdf . Sorry for it being all of 61 pages long (and the markdown is given on page 61).

/Kent K

From rick at corp.unicode.org Tue May 23 11:56:50 2023
From: rick at corp.unicode.org (Rick McGowan)
Date: Tue, 23 May 2023 09:56:50 -0700
Subject: Unicode 15.1 Beta Review begins
Message-ID:

Hello everyone,

This is to let you know... The #beta review period has begun for #Unicode Version 15.1, slated for release later this year.
https://blog.unicode.org/2023/05/unicode-151-beta-review-open.html

Cheers,
R.

From admin at genome.arizona.edu Thu May 25 10:52:41 2023
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Thu, 25 May 2023 08:52:41 -0700
Subject: Why missing characters and empty code points?
Message-ID: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>

For example, why is MATHEMATICAL SCRIPT SMALL O missing and the assumed code point it would have, 1D4C4, is empty? Yet 1D4C3 (MATHEMATICAL SCRIPT SMALL N) and 1D4C5 (MATHEMATICAL SCRIPT SMALL P) are defined. Makes no sense. Thanks

From doug at ewellic.org Thu May 25 11:20:15 2023
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 25 May 2023 16:20:15 +0000
Subject: Why missing characters and empty code points?
In-Reply-To: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>
Message-ID:

"admin" wrote:

> For example, why is MATHEMATICAL SCRIPT SMALL O missing and the
> assumed code point it would have, 1D4C4, is empty? Yet 1D4C3
> (MATHEMATICAL SCRIPT SMALL N) and 1D4C5 (MATHEMATICAL SCRIPT SMALL P)
> are defined. Makes no sense. Thanks

It makes sense if you take a look at the nameslist file, or the text immediately adjacent to the code charts, or any number of other sources. There you will see that U+2134 SCRIPT SMALL O exists, which is why a duplicate of this character was not encoded at 0x1D4C4.

| 1D4C4 | x (script small o - 2134)

Duplicate characters are generally not encoded simply to fill holes in the coding space.

The question of "what is a duplicate" becomes complex, especially for those new to the Unicode/10646 character identification process, partly because some duplicates or near-duplicates do exist for legacy compatibility purposes, and partly because lookalike characters in different scripts (such as Latin A and Greek Α and Cyrillic А) are correctly not unified.

You began your post with "For example."
Please check the sources mentioned (or ask if you don't know how to find them) for the other characters you feel are missing, and then check back in if you have additional questions.

Thanks,

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From jameskass at code2001.com Thu May 25 11:21:23 2023
From: jameskass at code2001.com (James Kass)
Date: Thu, 25 May 2023 16:21:23 +0000
Subject: Why missing characters and empty code points?
In-Reply-To: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>
Message-ID:

On 2023-05-25 3:52 PM, admin--- via Unicode wrote:
> For example, why is MATHEMATICAL SCRIPT SMALL O missing and the assumed
> code point it would have, 1D4C4, is empty? Yet 1D4C3 (MATHEMATICAL
> SCRIPT SMALL N) and 1D4C5 (MATHEMATICAL SCRIPT SMALL P) are defined.
> Makes no sense. Thanks

U+2134 SCRIPT SMALL O

This character was already encoded before the math letters were added. The empty 1D4C4 preserves the expected and relative ordering with the other math character additions.

From admin at genome.arizona.edu Thu May 25 11:39:45 2023
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Thu, 25 May 2023 09:39:45 -0700
Subject: [EXT]Re: Why missing characters and empty code points?
In-Reply-To:
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu>
Message-ID: <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>

Thanks guys! Yes the listing of characters I was looking at did not include a reference to the code point of the originally-created character, 2134 in this case. I'll be sure to first check the official charts on unicode.org in the future.

Amazing this and other previously-created characters have not been duplicated, how on earth can you keep track of it all?

Wonder if it is or could be possible in the future to use links in the charts, in the same way file systems do?
That way the same character would show up in 2134 and 1D4C4 but only exist physically in one space...

Best,
Chandler

From jameskass at code2001.com Thu May 25 12:14:48 2023
From: jameskass at code2001.com (James Kass)
Date: Thu, 25 May 2023 17:14:48 +0000
Subject: Why missing characters and empty code points?
In-Reply-To: <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu> <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
Message-ID: <0b7d7550-f8f0-b4fe-1773-dd476ee240fc@code2001.com>

On 2023-05-25 4:39 PM, admin--- via Unicode wrote:
> Thanks guys! Yes the listing of characters I was looking at did not
> include a reference to the code point of the originally-created
> character, 2134 in this case. I'll be sure to first check the official
> charts on unicode.org in the future.

Please have a look at BabelPad, a freeware Unicode plain text editor.
https://www.babelstone.co.uk/Software/BabelPad.html

With its built-in character map (BabelMap), it is a powerful and helpful tool. For example, entering a code point into the character map takes the user to that specific code point and displays its range. Then, if the user clicks on the question mark button, the character's properties, history, notes, aliases, and cross-references are displayed. (BabelMap is also available as a stand-alone application from the same web site.)

> Amazing this and other previously-created characters have not been
> duplicated, how on earth can you keep track of it all?

The architects of the Unicode Standard and many of its participants share a commitment to excellence. (I also suspect that several of the people involved are "blessed" with OCD. Sometimes being a finicky nit-picker can be advantageous!) Approving new characters isn't a rubber stamping operation and there's a vigorous vetting system.
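
The situation discussed in this thread can also be checked without a character map. What follows is a minimal sketch, assuming Python's standard `unicodedata` module (not something any poster mentioned); the helper name `describe` is hypothetical. It reports a code point's name, or marks it as unassigned when the database has no character there.

```python
import unicodedata


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME', or a placeholder when no name exists."""
    ch = chr(cp)
    name = unicodedata.name(ch, None)  # None when the character has no name
    if name is None:
        # General category 'Cn' means the code point is not assigned.
        if unicodedata.category(ch) == "Cn":
            name = "<unassigned>"
        else:
            name = "<unnamed>"
    return f"U+{cp:04X} {name}"


# The three neighbouring code points from the question, plus U+2134.
for cp in (0x1D4C3, 0x1D4C4, 0x1D4C5, 0x2134):
    print(describe(cp))
```

Note that `unicodedata` does not expose the cross-reference from the hole at 1D4C4 to U+2134; that link lives in the names list and code charts, which is why tools such as BabelMap are still needed to see it.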
From doug at ewellic.org Thu May 25 12:24:38 2023
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 25 May 2023 17:24:38 +0000
Subject: [EXT]Re: Why missing characters and empty code points?
In-Reply-To: <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu> <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
Message-ID:

Chandler wrote:

> Amazing this and other previously-created characters have not been
> duplicated, how on earth can you keep track of it all?

That's what the files I mentioned are for.

> Wonder if it is or could be possible in the future to use links in the
> charts, in the same way file systems do? That way the same character
> would show up in 2134 and 1D4C4 but only exist physically in one
> space...

That would not be a good idea.

File systems allow you to create a link "bar" to an existing file "foo" so that you can refer to the file as either "foo" or "bar" and everything will just work. There is no harm in this and it is a good thing.

By contrast, creating the appearance that a character actually encoded at 2134 also exists at 1D4C4 would mislead people into thinking they could represent it in text as either 2134 or 1D4C4, and everything would just work. It would not, and promoting this kind of misinformation would harm stability of both the text and the standard.

What you probably are looking for is some sort of interactive code chart, such that when you hover over a reserved cell, you can see applicable cross-reference information. But the Unicode organization has tried very hard to get people to understand that Unicode is more than just the code charts. You would be better served, if you have questions about how things work in Unicode or why there are "missing characters and empty code points," to make yourself familiar with the Unicode Character Database (https://www.unicode.org/reports/tr44/).
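
The hover idea can be sketched in a few lines. This is only an illustration under stated assumptions: `SAMPLE` is a simplified one-line form of the cross-reference quoted earlier in the thread (the real names list file uses a different, multi-line layout), and `parse_xref` and `hover` are hypothetical helper names. Character names come from Python's standard `unicodedata` module. The point is that a reserved cell reports where the character really lives, without pretending it is encoded twice.

```python
import re
import unicodedata

# Assumed one-line rendering of the cross-reference, for illustration only.
SAMPLE = "1D4C4 | x (script small o - 2134)"

XREF = re.compile(r"^([0-9A-F]{4,6})\s*\|\s*x \((.+) - ([0-9A-F]{4,6})\)$")


def parse_xref(line):
    """Parse 'SRC | x (label - DST)' into (src, label, dst), else None."""
    m = XREF.match(line.strip())
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2), int(m.group(3), 16)


def hover(cp, xrefs):
    """Text a chart cell might show: the name, or a pointer elsewhere."""
    name = unicodedata.name(chr(cp), None)
    if name is not None:
        return f"U+{cp:04X} {name}"
    if cp in xrefs:
        target = xrefs[cp]
        target_name = unicodedata.name(chr(target))
        return f"U+{cp:04X} <reserved>, see U+{target:04X} {target_name}"
    return f"U+{cp:04X} <no name>"


source, label, target = parse_xref(SAMPLE)
xrefs = {source: target}  # hand-built table with a single entry
print(hover(0x1D4C4, xrefs))
print(hover(0x1D4C3, xrefs))
```

The reserved cell stays reserved: text must still use U+2134, and the table only tells a reader where to look.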
As James said, if you are using Windows you can also take advantage of the wonderful BabelPad editor. There is hardly a day that I sit down at a PC and don't open it at least once.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From jameskass at code2001.com Thu May 25 14:33:37 2023
From: jameskass at code2001.com (James Kass)
Date: Thu, 25 May 2023 19:33:37 +0000
Subject: Why missing characters and empty code points?
In-Reply-To:
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu> <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
Message-ID:

On 2023-05-25 5:24 PM, Doug Ewell via Unicode wrote:
> What you probably are looking for is some sort of interactive code chart, such that when you hover over a reserved cell, you can see applicable cross-reference information. But the Unicode organization has tried very hard to get people to understand that Unicode is more than just the code charts. You would be better served, if you have questions about how things work in Unicode or why there are "missing characters and empty code points," to make yourself familiar with the Unicode Character Database (https://www.unicode.org/reports/tr44/).

Practically every discipline has its own jargon and backstory, including Unicode and genomics. This can be rather daunting to any newcomer.

If I were tasked with sequencing Brassocattleyas Calypso, I'd be both lost at sea and up a creek, in spite of the incongruity. Who should be consulted for that? A metallurgist, a cowboy, a musician, or a florist? Fortunately we have on-line discussion groups to point us in the right direction.

I suspect that the field of genomics employs needed Unicode characters in order to store and exchange data in a standard text encoding format. As such, experts in that field would need to have access to the proper tools, but the nitty-gritty nuts-and-bolts of Unicode might not be apparent.

As Doug Ewell pointed out, Unicode is far more than its code charts.
This really needs to be emphasized whenever an opportunity arises.

From admin at genome.arizona.edu Thu May 25 15:03:06 2023
From: admin at genome.arizona.edu (admin at genome.arizona.edu)
Date: Thu, 25 May 2023 13:03:06 -0700
Subject: [EXT]Re: Why missing characters and empty code points?
In-Reply-To: <0b7d7550-f8f0-b4fe-1773-dd476ee240fc@code2001.com>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu> <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu> <0b7d7550-f8f0-b4fe-1773-dd476ee240fc@code2001.com>
Message-ID:

James Kass via Unicode wrote on 5/25/23 10:14 AM:
> Please have a look at BabelPad, a freeware Unicode plain text editor.

Just a Linux user here, so guess I'll bookmark the online version for now. I've been using the built-in Gnome Character Map and I just messed around with the settings. The view I had it in was "by script" and in that view it doesn't show the missing code points such as 1D4C4. However, if I change the view to "by Unicode Block" then it does show 1D4C4 as an empty space. If I click on that and open the "Character details" tab, then it shows, among other info, this:

See also: U+2134 SCRIPT SMALL O

complete with a hyperlink to U+2134, too. Success!

From gwidion at gmail.com Thu May 25 15:08:46 2023
From: gwidion at gmail.com (Joao S. O. Bueno)
Date: Thu, 25 May 2023 17:08:46 -0300
Subject: [EXT]Re: Why missing characters and empty code points?
In-Reply-To: <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
References: <3247b5ed-6c96-272d-91a0-8b7dc96ed366@genome.arizona.edu> <3d1a6fde-6019-e763-ff91-d21b0a3f0fd5@genome.arizona.edu>
Message-ID:

I think linking charts is not the idea of Unicode - but higher level layers that would allow one to leverage all the letters are ok.
I happen to be the author of a Python library which has, as one of its aims, making it easy to use alternate character sets on the terminal - "terminedia" - the full set of superscript latin letters can be used as such:

    import terminedia as tm
    sc = tm.Screen()
    sc.print_at((0,0), "Hello World!", effects=tm.Effects.super_script)

Feel free to add issues to the project for character-transforms and pseudo transports you find lacking (https://github.com/jsbueno/terminedia/issues ).

On Thu, May 25, 2023 at 2:00 PM admin--- via Unicode <unicode at corp.unicode.org> wrote:

> Thanks guys! Yes the listing of characters I was looking at did not
> include a reference to the code point of the originally-created
> character, 2134 in this case. I'll be sure to first check the official
> charts on unicode.org in the future.
>
> Amazing this and other previously-created characters have not been
> duplicated, how on earth can you keep track of it all?
>
> Wonder if it is or could be possible in the future to use links in the
> charts, in the same way file systems do? That way the same character
> would show up in 2134 and 1D4C4 but only exist physically in one space...
>
> Best,
> Chandler
>

From jameskass at code2001.com Fri May 26 17:42:33 2023
From: jameskass at code2001.com (James Kass)
Date: Fri, 26 May 2023 22:42:33 +0000
Subject: CJK Extension I question
Message-ID:

A character from the beta review chart at:
https://www.unicode.org/charts/PDF/Unicode-15.1/U151-2EBF0.pdf

U+2EE31 ? ???

This character's glyph shows the traditional horse radical rather than the simplified. (馬/马) Since this is a submission from China, it was surprising to see the traditional form used. Is this glyph correct?
From jameskass at code2001.com Fri May 26 19:34:07 2023
From: jameskass at code2001.com (James Kass)
Date: Sat, 27 May 2023 00:34:07 +0000
Subject: CJK Extension I question
In-Reply-To:
References:
Message-ID:

On 2023-05-26 10:42 PM, James Kass via Unicode wrote:
>
> A character from the beta review chart at:
> https://www.unicode.org/charts/PDF/Unicode-15.1/U151-2EBF0.pdf
>
> U+2EE31 ? ???
>
> This character's glyph shows the traditional horse radical rather than
> the simplified. (馬/马) Since this is a submission from China, it was
> surprising to see the traditional form used. Is this glyph correct?
>
>

Looks as if there are several other places where traditional forms are used in Extension I, so the glyph is probably correct. My impression that China only uses simplified forms in new characters appears to be wrong.

From jk at koremail.com Fri May 26 22:44:04 2023
From: jk at koremail.com (jk at koremail.com)
Date: Sat, 27 May 2023 11:44:04 +0800
Subject: CJK Extension I question
In-Reply-To:
References:
Message-ID: <26eb3ef4d024547e8f30ff6cb9d9c0e3@koremail.com>

Dear James,

Whilst in everyday life the simplified forms are usually used, there are some cases where the traditional forms are used.

As to Ext I, this glyphwiki page
http://en.glyphwiki.org/wiki/Group:GB18030-2022%e3%83%89%e3%83%a9%e3%83%95%e3%83%88
has quite a lot of useful analysis, including noting ??? that is the traditional form of ? U+322E2 (IRG source UK-10500).

Regards
John

On 2023-05-27 08:34, James Kass via Unicode wrote:
> On 2023-05-26 10:42 PM, James Kass via Unicode wrote:
>>
>> A character from the beta review chart at:
>> https://www.unicode.org/charts/PDF/Unicode-15.1/U151-2EBF0.pdf
>>
>> U+2EE31 ? ???
>>
>> This character's glyph shows the traditional horse radical rather than
>> the simplified. (馬/马) Since this is a submission from China, it was
>> surprising to see the traditional form used. Is this glyph correct?
>>
>>
> Looks as if there are several other places where traditional forms are
> used in Extension I, so the glyph is probably correct. My impression
> that China only uses simplified forms in new characters appears to be
> wrong.

From jameskass at code2001.com Sat May 27 12:50:58 2023
From: jameskass at code2001.com (James Kass)
Date: Sat, 27 May 2023 17:50:58 +0000
Subject: CJK Extension I question
In-Reply-To: <26eb3ef4d024547e8f30ff6cb9d9c0e3@koremail.com>
References: <26eb3ef4d024547e8f30ff6cb9d9c0e3@koremail.com>
Message-ID: <9902506a-6553-e2af-49f6-58077fcf165f@code2001.com>

On 2023-05-27 3:44 AM, John Knightley via Unicode wrote:
> As to Ext I, this glyphwiki page
> http://en.glyphwiki.org/wiki/Group:GB18030-2022%e3%83%89%e3%83%a9%e3%83%95%e3%83%88
> has quite a lot of useful analysis, including noting ??? that is the
> traditional form of ? U+322E2 (IRG source UK-10500).

Thank you for the link, it is indeed helpful. It's interesting to see that several of the proposed Extension I characters were postponed from earlier proposals pending new evidence.

(It's pleasing that Extension I characters are composed of recognizable components, unlike some of the other recent CJK extensions.)