From richard.wordingham at ntlworld.com  Sun Jun 12 06:28:31 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 12 Jun 2022 12:28:31 +0100
Subject: Tamil Brahmi Compliance for Unicode Versions 13.0 and 14.0
Message-ID: <20220612122831.578705be@JRWUBU2>

I'm asking so as to get my facts straight; this is not intended as a complaint about the standard. The issues are probably inherent in disunification.

Can a rendering system support Tamil Brahmi without being told whether the text it is rendering is in Unicode 13.0 or 14.0?

I think the answer is 'no', for the following reasons:

(a) In Unicode 14.0, rendering U+11034 BRAHMI LETTER LLA and U+11075 BRAHMI LETTER OLD TAMIL LLA the same would be a violation of character identity. For example, Noto Sans Brahmi renders U+11034 with a Tamil Brahmi-style glyph, which is compliant with Unicode 13.0, but is a violation of character identity for Unicode 14.0.

(b) Likewise with U+11046 BRAHMI VIRAMA and U+11070 BRAHMI SIGN OLD TAMIL VIRAMA, though less clearly. Rendering the former as a pulli is compliant with Unicode 13.0, but not with Unicode 14.0.

(c) Interpreting the four Unicode 13.0 vowel sequences as calling for a pulli-like element in the rendering does not appear to respect the Unicode 14.0 character identity of U+11046.

HarfBuzz at least no longer has a problem with vowel + virama/stacker sequences, which are irremovable elements of the Sinhala script (two canonical decompositions) and early documented features of the Khmer and Tai Tham scripts, though only in the first is it part of the vowel symbol. These combinations could have been grandfathered (compare Malayalam chillus), but haven't been.

I think this is a case where having a non-compliant rendering system would be the right thing to do.

Fortunately for me, the data in Tamil Brahmi that I am concerned with is mostly tagged as being in Old Tamil.

Have I got all this right?

Richard.

From nico.schloemer at gmail.com  Tue Jun 14 14:56:36 2022
From: nico.schloemer at gmail.com (Nico Schlömer)
Date: Tue, 14 Jun 2022 21:56:36 +0200
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
Message-ID:

Hi everyone,

I was wondering about Unicode normalization with the dotless i/j characters.

In Python (and all other implementations I've checked), i + COMBINING ACUTE ACCENT combines to LATIN SMALL LETTER I WITH ACUTE:
```
from unicodedata import normalize
normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
```
```
b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
```
When doing the same with a dotless i, it does _not_ combine:
```
from unicodedata import normalize
normalize("NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
```
```
b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
```
Is this consistent with the standard, an oversight in the standard, or intended? Perhaps someone here can shed some light on it.

See also this Stack Overflow question [1] and this Python bug report [2].

Cheers,
Nico

[1] https://stackoverflow.com/q/72608183/353337
[2] https://github.com/python/cpython/issues/93767

From richard.wordingham at ntlworld.com  Tue Jun 14 16:39:22 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 14 Jun 2022 22:39:22 +0100
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To:
References:
Message-ID: <20220614223922.6d95599f@JRWUBU2>

On Tue, 14 Jun 2022 21:56:36 +0200
Nico Schlömer via Unicode wrote:

> Hi everyone,
>
> I was wondering about Unicode normalization with the dotless i/j
> characters.
>
> In Python (and all other implementations I've checked), i + COMBINING
> ACUTE ACCENT combines to LATIN SMALL LETTER I WITH ACUTE:
> ```
> from unicodedata import normalize
> normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
> ```
> ```
> b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
> ```
> When doing the same with a dotless i, it does _not_ combine:
> ```
> from unicodedata import normalize
> normalize("NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
> ```
> ```
> b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
> ```
> Is this consistent with the standard, an oversight in the standard,
> or intended?

As intended. The two sequences should render differently in a Lithuanian locale.

Richard.

From doug at ewellic.org  Tue Jun 14 16:59:20 2022
From: doug at ewellic.org (Doug Ewell)
Date: Tue, 14 Jun 2022 21:59:20 +0000
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To: <20220614223922.6d95599f@JRWUBU2>
References: <20220614223922.6d95599f@JRWUBU2>
Message-ID:

Richard Wordingham replied to Nico Schlömer:

>> Is this consistent with the standard, an oversight in the standard,
>> or intended?
>
> As intended. The two sequences should render differently in a
> Lithuanian locale.

Not only that, but in general there is no guarantee that a given sequence involving a combining character will have a normalization to a precomposed character, even if there is one which "looks right."

The folks on Stack Overflow have it right. This is not a "bug," an "oversight," or any sort of failure on the part of the Standard.

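The point above can be checked against the character database. This is a minimal sketch, assuming only Python's standard `unicodedata` module; the helper name `composes_with_acute` is made up for illustration. U+00ED has a canonical decomposition to U+0069 U+0301, which is what lets NFC recompose that pair, while U+0131 has no decomposition mapping and no precomposed character maps to U+0131 U+0301.

```python
import unicodedata

ACUTE = "\N{COMBINING ACUTE ACCENT}"  # U+0301


def composes_with_acute(base):
    """Return True if NFC merges base + U+0301 into a single code point.

    Hypothetical helper, named here only for illustration.
    """
    return len(unicodedata.normalize("NFC", base + ACUTE)) == 1


# U+00ED decomposes canonically to U+0069 U+0301, so NFC can recompose it.
print(unicodedata.decomposition("\N{LATIN SMALL LETTER I WITH ACUTE}"))

# U+0131 has no decomposition mapping at all (empty string).
print(repr(unicodedata.decomposition("\N{LATIN SMALL LETTER DOTLESS I}")))

# Dotted i composes with the acute accent; dotless i does not.
print(composes_with_acute("i"))
print(composes_with_acute("\N{LATIN SMALL LETTER DOTLESS I}"))
```

Whether the uncomposed sequence is drawn as one accented glyph is then up to the font and shaping engine, not to normalization.
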
--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From cloos at jhcloos.com  Tue Jun 14 17:57:57 2022
From: cloos at jhcloos.com (James Cloos)
Date: Tue, 14 Jun 2022 18:57:57 -0400
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To: ("Nico Schlömer via Unicode"'s message of "Tue, 14 Jun 2022 21:56:36 +0200")
References:
Message-ID:

For comparison, GNU Emacs shows ı́ as two separate characters (i.e. side by side) when using monospaced fonts like DejaVu Sans Mono or Go Mono, but combines them when using a variable-width font like Noto Serif.

My main browser (also using hb+cairo) also combines them (using DejaVu Serif).

Your software might do something similar.

-JimC
--
James Cloos         OpenPGP: 0x997A9F17ED7DAEA6

From nico.schloemer at gmail.com  Wed Jun 15 04:12:08 2022
From: nico.schloemer at gmail.com (Nico Schlömer)
Date: Wed, 15 Jun 2022 11:12:08 +0200
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
Message-ID:

> As intended. The two sequences should render differently in a
> Lithuanian locale.

Interesting! Is it true then that the result of i + COMBINING ACUTE ACCENT, which is LATIN SMALL LETTER I WITH ACUTE, actually represents a Latin small letter i with dot _and_ acute accent? I had always assumed that it should be dotless.

Cheers,
Nico

From aprilop at fn.de  Wed Jun 15 05:28:13 2022
From: aprilop at fn.de (Andreas Prilop)
Date: Wed, 15 Jun 2022 10:28:13 +0000
Subject: Re: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To:
References:
Message-ID: <694512FE-C405-432E-873C-3FC7A0874E08@fn.de>

On 15 June 2022, Nico Schlömer wrote:

> Is it true then that the result of i + COMBINING ACUTE ACCENT
> which is LATIN SMALL LETTER I WITH ACUTE actually represents a latin
> small letter i with dot _and_ acute accent?

No.

> I had always assumed that it should be dotless.

U+0626 likewise has no dots, although U+064A does have two dots below.

From moyogo at gmail.com  Wed Jun 15 05:44:14 2022
From: moyogo at gmail.com (Denis Jacquerye)
Date: Wed, 15 Jun 2022 12:44:14 +0200
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To: <694512FE-C405-432E-873C-3FC7A0874E08@fn.de>
References: <694512FE-C405-432E-873C-3FC7A0874E08@fn.de>
Message-ID:

https://www.unicode.org/versions/Unicode14.0.0/ch07.pdf#page=6 explicitly recommends using i + overdot + accent for the forms used in the Baltic.

Relying on the Lithuanian locale may or may not work depending on the font, the software or the language tagging, while that character sequence may or may not work depending only on the font or the software, which gives better odds.

--
Denis Moyogo Jacquerye

From nico.schloemer at gmail.com  Wed Jun 15 05:53:09 2022
From: nico.schloemer at gmail.com (Nico Schlömer)
Date: Wed, 15 Jun 2022 12:53:09 +0200
Subject: normalization: dotless i + COMBINING ACUTE ACCENT doesn't combine to I ACUTE
In-Reply-To:
References: <694512FE-C405-432E-873C-3FC7A0874E08@fn.de>
Message-ID:

Thanks Denis for the reply. Indeed, your citation says

> nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ ≠ ı + ¨)

This is what I was looking for.

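The sequence recommended in the cited chapter can be checked in the same way. This is a minimal sketch, assuming only Python's standard `unicodedata` module; `nfc_names` and the variable names are illustrative. It shows that i + U+0307 COMBINING DOT ABOVE + U+0301 COMBINING ACUTE ACCENT is left unchanged by NFC, so the explicit dot survives normalization, whereas plain i + U+0301 collapses to U+00ED.

```python
import unicodedata

DOT_ABOVE = "\N{COMBINING DOT ABOVE}"  # U+0307
ACUTE = "\N{COMBINING ACUTE ACCENT}"   # U+0301


def nfc_names(text):
    """Names of the code points that remain after NFC normalization.

    Hypothetical helper, named here only for illustration.
    """
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFC", text)]


baltic = "i" + DOT_ABOVE + ACUTE  # dot explicitly retained under the accent
plain = "i" + ACUTE               # ordinary accented i

# NFC keeps all three code points of the Baltic sequence.
print(nfc_names(baltic))

# NFC composes the plain sequence into a single precomposed letter.
print(nfc_names(plain))
```

Normalization only settles which code points are stored; whether the dot is actually drawn under the accent still depends on the font and the software, as noted above.
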
Cheers,
Nico

From A.Schappo at lboro.ac.uk  Tue Jun 21 07:34:03 2022
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Tue, 21 Jun 2022 12:34:03 +0000
Subject: Diverse Internationalisation of Teacher Education
Message-ID:

This morning I discovered this new project: "Diverse Internationalisation of Teacher Education" (DITE).

I consider that teachers should be taught the basics of Unicode and Unicode programming so that they can teach students how to build software for the world and for cultures other than their own.

As we all know, Unicode is an essential core technology for software i18n & L10n, so I see this project as an opportunity to get Unicode onto the school Computer Science & IT curricula.

If you would like to pitch for Unicode being taught to teachers, the DITE Twitter account is https://twitter.com/dite_project

I have started a Twitter thread in response to DITE at https://twitter.com/andreschappo/status/1539207010511736832

André Schappo

From wjgo_10009 at btinternet.com  Tue Jun 21 15:44:37 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Tue, 21 Jun 2022 21:44:37 +0100 (BST)
Subject: Diverse Internationalisation of Teacher Education
In-Reply-To:
References:
Message-ID: <70385f3e.1bbe5.18188028e85.Webtop.84@btinternet.com>

Many years ago I devised a scenario to encourage people to learn how to enter words with accented characters in them even if they did not know the language.

I called it The Café Äpfel, and the idea was that text from the ingredients lists on multilingual food packaging could be keyed.

The Café Äpfel would have menus in English, French, German and the language of the musicians and singers who were performing in the café that evening.

I had this idea of a television show series with each episode combining cookery, computing and music, with actors playing the continuing characters and guest musicians and singers arriving as guest stars.

Well, a Portuguese band and singer would be fairly straightforward. Once the musicians came from further afield, the computing got rather more complicated!

William Overington

Tuesday 21 June 2022

From marius.spix at web.de  Wed Jun 22 01:07:18 2022
From: marius.spix at web.de (Marius Spix)
Date: Wed, 22 Jun 2022 08:07:18 +0200
Subject: Aw: Re: Diverse Internationalisation of Teacher Education
In-Reply-To: <70385f3e.1bbe5.18188028e85.Webtop.84@btinternet.com>
References: <70385f3e.1bbe5.18188028e85.Webtop.84@btinternet.com>
Message-ID:

An HTML attachment was scrubbed...

From A.Schappo at lboro.ac.uk  Thu Jun 23 03:15:15 2022
From: A.Schappo at lboro.ac.uk (Andre Schappo)
Date: Thu, 23 Jun 2022 08:15:15 +0000
Subject: Diverse Internationalisation of Teacher Education
In-Reply-To:
References:
Message-ID:

Hi Peter

their website: https://dite.usz.edu.pl/
their Facebook: https://www.facebook.com/DITEproject
their email: dite at usz.edu.pl
their LinkedIn: https://www.linkedin.com/in/dite-project-b90ab1241/

It is not my project, but I will certainly be encouraging them to encompass Unicode and other aspects of software i18n and L10n in their teacher education.

André Schappo

From pcava4 at eq.edu.au  Wed Jun 22 19:35:44 2022
From: pcava4 at eq.edu.au (CAVALLARO, Peter (pcava4))
Date: Thu, 23 Jun 2022 00:35:44 +0000
Subject: Diverse Internationalisation of Teacher Education
In-Reply-To:
References:
Message-ID:

Hi André,

I am a technology teacher who reads this list, and I am fascinated by and in awe of what Unicode does, the problem it solves and how it solves it, so it was with great delight that I read about your project.

I do not have a Twitter account and choose not to open one. Is there another avenue to access information about the project?

Regards

Peter Cavallaro
HOD Science/Agriculture/Engineering
Nanango State High School
54 Elk Street, Nanango QLD 4615
Phone: (07) 4171 6444
Fax: (07) 4171 6400
Email: pcava4 at eq.edu.au

***************************************************************************************************
IMPORTANT: This email and any attachments may contain legally privileged, confidential or private information, and may be protected by copyright. You may only use or disclose this information if you are the intended recipient(s) and if you use it in an authorised way. No other person is allowed to use, review, alter, transmit, disclose, distribute, print or copy this email and any attachments without appropriate authorisation.

If you are not the intended recipient(s) and the email was sent to you by mistake, please notify the sender immediately by return email or phone, destroy any hardcopies of this email and any attachments and delete it from your system. Any legal privilege and confidentiality attached to this email is not waived or destroyed by that mistake.

The Department of Education carries out monitoring, scanning and blocking of emails and attachments sent from or to addresses within the Department of Education for the purposes of operating, protecting, maintaining and ensuring appropriate use of its computer network.

It is your responsibility to ensure that this email does not contain and is not affected by computer viruses, defects or interference by third parties or replication problems (including incompatibility with your computer system).

The Department of Education does not accept any responsibility for any loss or damage that may result from reliance on, or the use of, any information contained in the email and any attachments.
***************************************************************************************************