From christoph.paeper at crissov.de Mon Aug 7 09:58:37 2023
From: christoph.paeper at crissov.de (Christoph Päper)
Date: Mon, 7 Aug 2023 16:58:37 +0200
Subject: Squared T-shirt sizes
Message-ID:

Dear Unicoders

I was almost sure that I had seen squared XL and XS as ideographic legacy characters in the code charts before, but I can’t find them (e.g. in U+1F1xy or U+33xy), so I’m probably slightly delusional. XL is not included as a Number Form for the roman numeral of 40 (~ U+216x).

1. Did I miss anything?
2. Are such Latin size labeling characters (also LL or SS from JIS L 4004/4005) written within a single ideographic square in East Asia?
3. Could they be added to the standard without any such prior use?

Cheers,

Christoph Päper

From wjgo_10009 at btinternet.com Tue Aug 1 09:16:41 2023
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Tue, 1 Aug 2023 15:16:41 +0100 (BST)
Subject: Expressing any Unicode character using Morse code
Message-ID:

https://punster.me/serif/viewtopic.php?id=455

William Overington

Tuesday 1 August 2023

From beckiergb at gmail.com Mon Aug 7 15:49:30 2023
From: beckiergb at gmail.com (Rebecca Bettencourt)
Date: Mon, 7 Aug 2023 13:49:30 -0700
Subject: Squared T-shirt sizes
In-Reply-To:
References:
Message-ID:

There is a U+1F14D 🅍 SQUARED SS but none of the other characters you mentioned currently exist.

-- Rebecca Bettencourt

On Mon, Aug 7, 2023 at 8:04 AM Christoph Päper via Unicode <unicode at corp.unicode.org> wrote:

> Dear Unicoders
>
> I was almost sure that I had seen squared XL and XS as ideographic legacy
> characters in the code charts before, but I can’t find them (e.g. in
> U+1F1xy or U+33xy), so I’m probably slightly delusional. XL is not included
> as a Number Form for the roman numeral of 40 (~ U+216x).
>
> 1. Did I miss anything?
> 2.
> Are such Latin size labeling characters (also LL or SS from JIS L
> 4004/4005) written within a single ideographic square in East Asia?
> 3. Could they be added to the standard without any such prior use?
>
> Cheers,
>
> Christoph Päper

From sosipiuk at gmail.com Mon Aug 7 16:51:32 2023
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Mon, 07 Aug 2023 21:51:32 +0000
Subject: Expressing any Unicode character using Morse code
In-Reply-To:
References:
Message-ID: <1691442167308.2261732728.4185916792@gmail.com>

Compactness is of great benefit in Morse Code. I would therefore recommend against any padding or necessitating any additional character to specify length, or indeed worrying about "metadata precision" generally. For the same reason I would also use some flavour of base32 (I prefer Crockford's over the RFC, though that detail doesn't matter so much). This allows all planes except 16 to be encoded using only 4 Morse letters in the sequence.

The fundamental idea of a "unicode character introducer" sequence is solid. In the spirit of Morse shorthand, I recommend a simple concatenation of "U" and "+", that is the sequence "..-.-.-." treated as a single letter, without spaces. This would be followed by the base32 sequence, made as short as possible, and terminated with a word-space. Thus we have:

羽 (U+7FBD): ..-.-.-. --.. -..- -..- (U?ZXX)
🫥 (U+1FAE5): ..-.-.-. ...-- -.-- --.- ..... (U?3YQ5)

Hopefully I did not mess those examples up, but I think the point gets across regardless.

In most cases, the ambiguity of whether the terminating word-space should be read as a word-space or letter-space (i.e. the current word continues following the unicode character) can be determined contextually. However, if absolutely necessary, another plus sign can be added to the sequence indicating word-continuation (i.e. the terminating space should be read as a letter-space).
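For anyone who wants to check the two examples mechanically, here is a minimal Python sketch of the scheme as described above. It assumes Crockford's base32 alphabet and the usual International Morse patterns for letters and digits; the names CROCKFORD, MORSE, INTRODUCER, to_base32 and encode_char are purely illustrative and not part of any standard or existing library. The optional trailing plus sign for word-continuation is left out.

```python
# Sketch only: a proposed "U+" introducer followed by the code point
# written in Crockford base32, each base32 digit sent as a Morse letter.

# Crockford base32 alphabet (omits I, L, O and U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# International Morse patterns for exactly the symbols in CROCKFORD.
MORSE = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "J": ".---", "K": "-.-", "M": "--", "N": "-.",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-", "V": "...-",
    "W": ".--", "X": "-..-", "Y": "-.--", "Z": "--..",
}

# "U" (..-) and "+" (.-.-.) run together and sent as a single letter.
INTRODUCER = "..-" + ".-.-."


def to_base32(code_point: int) -> str:
    """Shortest Crockford base32 spelling of a code point, no padding."""
    if code_point == 0:
        return "0"
    digits = []
    while code_point:
        code_point, remainder = divmod(code_point, 32)
        digits.append(CROCKFORD[remainder])
    return "".join(reversed(digits))


def encode_char(ch: str) -> str:
    """Introducer plus base32 digits, separated by letter-spaces."""
    return " ".join([INTRODUCER] + [MORSE[d] for d in to_base32(ord(ch))])


if __name__ == "__main__":
    for ch in ("\u7FBD", "\U0001FAE5"):
        print(f"U+{ord(ch):04X}: {encode_char(ch)}")
```

Four base32 digits carry 20 bits, so every code point up to U+FFFFF (planes 0 to 15) fits in four Morse letters after the introducer, and only plane 16 needs a fifth.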
Cheers,
Sławomir Osipiuk

On Tuesday, 01 August 2023, 10:16:41 (-04:00), William_J_G Overington via Unicode wrote:

https://punster.me/serif/viewtopic.php?id=455

William Overington

Tuesday 1 August 2023

From mark at kli.org Mon Aug 7 17:12:14 2023
From: mark at kli.org (Mark E. Shoulson)
Date: Mon, 7 Aug 2023 18:12:14 -0400
Subject: Expressing any Unicode character using Morse code
In-Reply-To: <1691442167308.2261732728.4185916792@gmail.com>
References: <1691442167308.2261732728.4185916792@gmail.com>
Message-ID:

Yes, compactness is of great benefit in Morse Code. To such an extent that I find myself thinking that "expressing any Unicode character" and "Morse Code" are somewhat at odds with one another.

https://en.wikipedia.org/wiki/Morse_code_for_non-Latin_alphabets speaks of non-Latin alphabets using their own encodings of many of the same dot-dash sequences that ASCII uses for non-ASCII characters. I guess in a way it's rather like the old ISO-8859 code pages: you use the same bit sequences but they mean different things depending on what alphabet you're speaking. A big part of Unicode's purpose was precisely to supplant ISO-8859 (right?) so that each character could stand on its own and not have to have code-page metadata attached to it.

I can sort of see some logic to allowing the same for Morse Code, but again, Morse Code needs its compactness and needs to be short enough for humans to send and receive. The "code-page" approach sounds eminently practical and usable for most purposes for Morse Code. Still, what you're talking about is some kind of "unicode escape sequence" that you can use for one-off insertions of a character here and there (one hopes), and I can see some utility to that. But who gets to decide how that's done? Unicode doesn't control International Morse Code.
Probably you need to take this up with the International Telecommunication Union to make it official, or else find a bunch of Morse Code enthusiasts who'll use it unofficially until it becomes a de facto standard.

Note that there are already Chinese and Japanese telegraph codes (about which I know nothing, but Wikipedia does), so there are already Morse Codes that have to represent largish character sets.

~mark

On 8/7/23 17:51, Sławomir Osipiuk via Unicode wrote:
> Compactness is of great benefit in Morse Code. ...
> On Tuesday, 01 August 2023, 10:16:41 (-04:00), William_J_G Overington
> via Unicode wrote:
>
> https://punster.me/serif/viewtopic.php?id=455
>
> William Overington
>
> Tuesday 1 August 2023

From doug at ewellic.org Mon Aug 7 18:40:07 2023
From: doug at ewellic.org (Doug Ewell)
Date: Mon, 7 Aug 2023 23:40:07 +0000
Subject: Squared T-shirt sizes
In-Reply-To:
References:
Message-ID:

JIS L 4004:2001 seems to refer to other two-letter size codes (PB, SA, SB, MY, MA, etc.) and does not enclose them in a square.

Encoding these as squared symbols would probably require substantial evidence that the symbols are already in use. The same would be true for the English S, M, L, XL... and any codes additionally specified in EN 13402.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

-----Original Message-----
From: Unicode On Behalf Of Christoph Päper via Unicode
Sent: Monday, August 7, 2023 8:59
To: via Unicode
Subject: Squared T-shirt sizes

Dear Unicoders

I was almost sure that I had seen squared XL and XS as ideographic legacy characters in the code charts before, but I can’t find them (e.g. in U+1F1xy or U+33xy), so I’m probably slightly delusional. XL is not included as a Number Form for the roman numeral of 40 (~ U+216x).

1. Did I miss anything?
2.
Are such Latin size labeling characters (also LL or SS from JIS L 4004/4005) written within a single ideographic square in East Asia?
3. Could they be added to the standard without any such prior use?

Cheers,

Christoph Päper

From textexin at xencraft.com Tue Aug 8 20:07:55 2023
From: textexin at xencraft.com (Tex)
Date: Tue, 8 Aug 2023 18:07:55 -0700
Subject: World’s Indigenous Peoples Day is tomorrow Aug 9
Message-ID: <002701d9ca5d$ee4e52e0$caeaf8a0$@xencraft.com>

Members of this group may be interested in this online event produced by TranslationCommons.org, tomorrow Aug 9, highlighting Indigenous languages of Asia, in celebration of World’s Indigenous Peoples Day.

Presentations will be from India, Indonesia, Malaysia, Nepal, Bangladesh and other parts of Asia. They will be providing cultural performances, case studies and current information on several communities including Tamang, Munda Tribe, Ho Tribe, Chakma, Koya, Sunawar, and others.

The speakers will be addressing translation, preservation and revitalization of Asian Indigenous languages.

Join us via YouTube starting at 7am PST, 2pm UTC, 19:30 IST.
https://www.youtube.com/watch?v=uAGh5fRfOuM

See the program and other details at
https://translationcommons.org/international-day-of-the-worlds-indigenous-peoples/

Tex

From jameskass at code2001.com Tue Aug 22 20:01:19 2023
From: jameskass at code2001.com (James Kass)
Date: Wed, 23 Aug 2023 01:01:19 +0000
Subject: Unicode 15.1 repertoire
Message-ID:

Unicode 15.1 is scheduled to be released September twelfth.

According to this page,
https://www.unicode.org/charts/PDF/Unicode-15.1/
... CJK Extension I adds 622 new characters.

But according to the chart linked from that page,
https://www.unicode.org/charts/PDF/Unicode-15.1/U151-2EBF0.pdf
... there are only 603 additional characters.

Which one is correct?
If the 622 figure is right, is there a chart showing the additional 19 characters?

From markus.icu at gmail.com Tue Aug 22 20:16:00 2023
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 22 Aug 2023 18:16:00 -0700
Subject: Unicode 15.1 repertoire
In-Reply-To:
References:
Message-ID:

Hi James,

Extension I was expanded from 603 characters to 622 as a result of decisions in UTC #176 based on
https://www.unicode.org/L2/L2023/23114r-unc-extension-i.pdf
https://www.unicode.org/L2/L2023/23163-cjk-unihan-group-utc176.pdf

Best regards,
markus

From jameskass at code2001.com Wed Aug 23 07:31:04 2023
From: jameskass at code2001.com (James Kass)
Date: Wed, 23 Aug 2023 12:31:04 +0000
Subject: Unicode 15.1 repertoire
In-Reply-To:
References:
Message-ID: <6ac500d7-ef5e-ef1d-d279-c94d48dffb23@code2001.com>

Hi Markus,

Thank you for the updated information.

It's too bad that the "GIDC23" numbers were revised because renumbering from the repertoire presented during the beta review period would have been simpler. Although the metadata.txt file (found in the links you sent) has a cross-reference to an apparently earlier Unicode proposal, it is not the same repertoire as the beta review charts and data.
Best regards,

James

On 2023-08-23 1:16 AM, Markus Scherer via Unicode wrote:
> Hi James,
>
> Extension I was expanded from 603 characters to 622 as a result of
> decisions in UTC #176
> based on
> https://www.unicode.org/L2/L2023/23114r-unc-extension-i.pdf
> https://www.unicode.org/L2/L2023/23163-cjk-unihan-group-utc176.pdf
>
> Best regards,
> markus

From jameskass at code2001.com Wed Aug 23 21:46:15 2023
From: jameskass at code2001.com (James Kass)
Date: Thu, 24 Aug 2023 02:46:15 +0000
Subject: Unicode 15.1 repertoire
In-Reply-To: <6ac500d7-ef5e-ef1d-d279-c94d48dffb23@code2001.com>
References: <6ac500d7-ef5e-ef1d-d279-c94d48dffb23@code2001.com>
Message-ID:

Only one minor anomaly spotted in the new data. A discrepancy between an IDS and its corresponding chart glyph.

U+2EE3B ???? ?? IDS from metadata.txt ???? ?? chart glyph

Hope this is helpful, not sure where to report it.

From markus.icu at gmail.com Wed Aug 23 23:46:56 2023
From: markus.icu at gmail.com (Markus Scherer)
Date: Wed, 23 Aug 2023 21:46:56 -0700
Subject: Unicode 15.1 repertoire
In-Reply-To:
References: <6ac500d7-ef5e-ef1d-d279-c94d48dffb23@code2001.com>
Message-ID:

On Wed, Aug 23, 2023, 19:49 James Kass via Unicode wrote:

> Only one minor anomaly spotted in the new data. A discrepancy between
> an IDS and its corresponding chart glyph.

I passed this along and was told that this is a known issue.

Thanks,
markus

From markus.icu at gmail.com Tue Aug 29 18:29:49 2023
From: markus.icu at gmail.com (Markus Scherer)
Date: Tue, 29 Aug 2023 16:29:49 -0700
Subject: Happy 35th birthday, Unicode!
Message-ID:

Happy 35 years after “Unicode 88”, and best wishes for many more years of a successful standard and what has become a standards (plural) development organization!
From the 2008 history books:
https://www.unicode.org/history/20thceleb/20thceleb.html

markus
ICU-TC
UTC-PAG