From richard.wordingham at ntlworld.com Tue Mar 1 03:06:46 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 1 Mar 2022 09:06:46 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: <83pmn6l4ur.fsf@gnu.org> References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> Message-ID: <20220301090646.023fd229@JRWUBU2> On Tue, 01 Mar 2022 05:38:20 +0200 Eli Zaretskii via Unicode wrote: > > Date: Mon, 28 Feb 2022 17:41:03 -0800 > > Cc: unicode at corp.unicode.org > > From: Ken Whistler via Unicode > > > > Richard, > > > > "x()y" > > Maybe there's a misunderstanding. Richard said "in a right-to-left > embedding", so I tried > > RLE x ( ) y PDF > > and got "y()x" on display. Thank you for the confirmation, Ken. LibreOffice (on Ubuntu) and Word and Notepad (Windows 10) agree with you on Eli's example. Eli, Emacs (e.g. 28.0.91) gets it wrong. Bug report tonight if I can get Emacs to work with Claws. Richard. From aprilop at freenet.de Tue Mar 1 08:11:46 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Tue, 01 Mar 2022 14:11:46 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: <83pmn6l4ur.fsf@gnu.org> References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> Message-ID: On 1 March 2022, Eli Zaretskii wrote: > I tried > RLE x ( ) y PDF BTW: You should not use the embedding characters any longer. After 15 years of Bidirectional Algorithm, they finally discovered that the embedding characters (as well as their HTML and CSS equivalents) do not work as desired. Example: Egy kettő három

1 2 3 0x

1 2 3 0x

⁧1 2 3⁩ 0x

‫1 2 3‬ 0x

-------------- next part -------------- An HTML attachment was scrubbed... URL: From eliz at gnu.org Tue Mar 1 08:47:12 2022 From: eliz at gnu.org (Eli Zaretskii) Date: Tue, 01 Mar 2022 16:47:12 +0200 Subject: Bidi and Empty Parentheses In-Reply-To: (message from Andreas Prilop via Unicode on Tue, 01 Mar 2022 14:11:46 +0000) References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> Message-ID: <83a6e9logf.fsf@gnu.org> > Date: Tue, 01 Mar 2022 14:11:46 +0000 > From: Andreas Prilop via Unicode > > You should not use the embedding characters any longer. > After 15 years of Bidirectional Algorithm, they finally discovered that the embedding > characters (as well as their HTML and CSS equivalents) do not work as desired. > > Example: > > > Egy kettő három >

1 2 3 0x

>

1 2 3 0x

>

⁧1 2 3⁩ 0x

>

‫1 2 3‬ 0x

What doesn't work here as desired, may I ask? From aprilop at freenet.de Tue Mar 1 09:00:34 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Tue, 01 Mar 2022 15:00:34 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> Message-ID: <7D287EF1-9986-484D-98FE-130DFC096EBC@freenet.de> On 1 March 2022, I wrote something else than what is shown at https://corp.unicode.org/pipermail/unicode/2022-March/009989.html I think I have to use hexadecimal representations to overcome BS Pipermail. Egy kettő három

1 2 3 0x

1 2 3 0x

⁧1 2 3⁩ 0x

‫1 2 3‬ 0x

From abrahamgross at disroot.org Tue Mar 1 09:11:55 2022 From: abrahamgross at disroot.org (ag disroot) Date: Tue, 1 Mar 2022 15:11:55 +0000 (UTC) Subject: Bidi and Empty Parentheses In-Reply-To: <7D287EF1-9986-484D-98FE-130DFC096EBC@freenet.de> References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> <7D287EF1-9986-484D-98FE-130DFC096EBC@freenet.de> Message-ID: <00007e26-e4e9-4c78-a3fb-48d51e441039@disroot.org> Hebrew and arabic writers have to deal with these shenanigans on a daily basis? From aprilop at freenet.de Tue Mar 1 09:13:00 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Tue, 01 Mar 2022 15:13:00 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: <83a6e9logf.fsf@gnu.org> References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> <83a6e9logf.fsf@gnu.org> Message-ID: <8AB07ACC-1DE8-41B8-BE83-AFD3A63FFF1A@freenet.de> On 1 March 2022, Eli Zaretskii wrote: > What doesn't work here as desired, may I ask? I don't like that

1 2 3 0x

is displayed as 0 3 2 1x From eliz at gnu.org Tue Mar 1 09:26:41 2022 From: eliz at gnu.org (Eli Zaretskii) Date: Tue, 01 Mar 2022 17:26:41 +0200 Subject: Bidi and Empty Parentheses In-Reply-To: <8AB07ACC-1DE8-41B8-BE83-AFD3A63FFF1A@freenet.de> (message from Andreas Prilop via Unicode on Tue, 01 Mar 2022 15:13:00 +0000) References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> <83a6e9logf.fsf@gnu.org> <8AB07ACC-1DE8-41B8-BE83-AFD3A63FFF1A@freenet.de> Message-ID: <837d9dlmmm.fsf@gnu.org> > Date: Tue, 01 Mar 2022 15:13:00 +0000 > From: Andreas Prilop via Unicode > > On 1 March 2022, Eli Zaretskii wrote: > > > What doesn't work here as desired, may I ask? > > I don't like that > > >

1 2 3 0x

> > is displayed as > 0 3 2 1x Didn't you ask for it? From doug at ewellic.org Tue Mar 1 14:06:28 2022 From: doug at ewellic.org (Doug Ewell) Date: Tue, 1 Mar 2022 13:06:28 -0700 Subject: Bidi and Empty Parentheses Message-ID: <019e01d82da7$d69f44a0$83ddcde0$@ewellic.org> Andreas Prilop wrote: > You should not use the embedding characters any longer. > After 15 years of Bidirectional Algorithm, they finally discovered > that the embedding characters (as well as their HTML and CSS > equivalents) do not work as desired. This puzzles me. Character encoding is engineering, not natural science. If some mechanism is defined incompletely or erroneously, the definition can be corrected. If some font or rendering engine doesn't handle a mechanism correctly, it can be updated. Very little of this falls into the category of "it worked for 15 years, or so we thought, but we've discovered a case where it doesn't work, so now we have to abandon the whole thing." This reminds me of the discussion two weeks ago about Arabic presentation forms, in which it was explained (again) that font and/or rendering engine inadequacies were considered justification for using these non-preferred forms instead of real Arabic letters. -- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org From richard.wordingham at ntlworld.com Tue Mar 1 17:14:15 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 1 Mar 2022 23:14:15 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: <019e01d82da7$d69f44a0$83ddcde0$@ewellic.org> References: <019e01d82da7$d69f44a0$83ddcde0$@ewellic.org> Message-ID: <20220301231415.3ace88a4@JRWUBU2> On Tue, 1 Mar 2022 13:06:28 -0700 Doug Ewell via Unicode wrote: > Andreas Prilop wrote: > > > You should not use the embedding characters any longer. > > After 15 years of Bidirectional Algorithm, they finally discovered > > that the embedding characters (as well as their HTML and CSS > > equivalents) do not work as desired. > > This puzzles me. 
Character encoding is engineering, not natural > science. > > If some mechanism is defined incompletely or erroneously, the > definition can be corrected. If some font or rendering engine doesn't > handle a mechanism correctly, it can be updated. This is not certain when there is accumulated text that relies on the current behaviour. > Very little of this falls into the category of "it worked for 15 > years, or so we thought, but we've discovered a case where it doesn't > work, so now we have to abandon the whole thing." The actual assertion appears to be that the LRE and RLE mechanisms should be replaced by LRI and RLI, and in this case, it has been proposed that the HTML for the former should now be interpreted as HTML for the latter. I struggle to see the equivalence, partly because UAX #9 is still practically unintelligible. In the example given by Eli, namely "RLE x ( ) y PDF", the point of the RLE is to reliably force the paragraph direction to be right-to-left. This is for a test case; the query was prompted by an example where the only non-default higher level protocol applied to the definition of a paragraph, and a previous line caused the line of interest to have right-to-left direction. > This reminds me of the discussion two weeks ago about Arabic > presentation forms, in which it was explained (again) that font > and/or rendering engine inadequacies were considered justification > for using these non-preferred forms instead of real Arabic letters. The 'discussion' was a defence of using them instead of attaching an image. Richard. 
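For readers following the LRE/RLE versus LRI/RLI distinction in the message above, here is a minimal Python sketch that looks up the six formatting characters involved and builds the test string both ways. The helper names (`describe`, `rtl_embedding`, `rtl_isolate`) are hypothetical, and the unspaced "x()y" payload is an assumption about the test case; Python only constructs the strings, it does not reorder them for display.

```python
import unicodedata

# Explicit directional formatting characters named in the thread (UAX #9).
EMBEDDING = {"LRE": "\u202A", "RLE": "\u202B", "PDF": "\u202C"}
ISOLATE = {"LRI": "\u2066", "RLI": "\u2067", "PDI": "\u2069"}


def describe(ch):
    """Return (code point, character name, Bidi_Class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))


def rtl_embedding(text):
    """Wrap text as RLE ... PDF (hypothetical helper for the embedding form)."""
    return EMBEDDING["RLE"] + text + EMBEDDING["PDF"]


def rtl_isolate(text):
    """Wrap text as RLI ... PDI (hypothetical helper for the isolate form)."""
    return ISOLATE["RLI"] + text + ISOLATE["PDI"]


if __name__ == "__main__":
    for label, ch in {**EMBEDDING, **ISOLATE}.items():
        print(label, *describe(ch))
    # Only the logical-order strings are built here; how they look on screen
    # depends on the renderer (terminal, editor, browser) that displays them.
    print(repr(rtl_embedding("x()y")))
    print(repr(rtl_isolate("x()y")))
```

Pasting the two resulting strings into different editors is one way to compare implementations, as was done by hand in this thread.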
From richard.wordingham at ntlworld.com Tue Mar 1 23:44:55 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 2 Mar 2022 05:44:55 +0000 Subject: Bidi and Empty Parentheses In-Reply-To: <20220301090646.023fd229@JRWUBU2> References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> <20220301090646.023fd229@JRWUBU2> Message-ID: <20220302054455.1ea3e8d9@JRWUBU2> On Tue, 1 Mar 2022 09:06:46 +0000 Richard Wordingham via Unicode wrote: > Eli, Emacs (e.g. 28.0.91) gets it wrong. Bug report tonight if I can > get Emacs to work with Claws. Submitted as Bug #54219, with report at https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-03/msg00050.html. Cut and paste got the bug report from Emacs to Claws without any problem, though I didn't get the best out of the bug reporting mechanism until the third attempt. Richard. From eliz at gnu.org Thu Mar 3 02:49:08 2022 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 03 Mar 2022 10:49:08 +0200 Subject: Avoiding Source Code Spoofing In-Reply-To: (message from announcements via announcements on Wed, 2 Mar 2022 14:51:44 -0800) References: Message-ID: <83ilsvju9n.fsf@gnu.org> > Date: Wed, 2 Mar 2022 14:51:44 -0800 > From: announcements via announcements > Cc: announcements > > Unicode has convened a group of experts in programming languages, > tooling, and security to provide guidance and recommendations on how > to better handle international text in source code, as well as > providing code to help implementations. There was no address or place, neither in this announcement nor in the report to which it pointed, regarding where to send any comments on the issues raised by them, so I'm posting them here; apologies if that is inappropriate. 
First, I think the report fails to distinguish between legitimate use of RTL characters and controls, just because the program code has strings and/or comments with RTL characters; and the malicious use, where the intent is to spoof and mislead the recipients of the code. Such a distinction is important, because use of bidi controls that is legitimate in the former case is highly suspicious in the latter. For example, any source code where the inherent directionality of a strong directional character was overridden, or where a weak/neutral character has an embedding level that's too high, should be suspected as potentially malicious. Second, I don't see in the Proposed Plan any activity to collect input from users and implementors of compilers, linters, and editors. Without collecting such input, I see no way that the work group will appreciate the real-life problems and issues that the developers and users of these tools are facing, and that could easily lead to recommendations that are hard or impossible to implement at least in some of these tools, and/or which could be disconnected from the real problems and practices. For example, the idea of rendering bidi formatting controls as "chits" will not solve the reordering issue in Emacs, where bidi reordering is performed _before_ the actual glyphs to present characters on the glass are fully known. More generally, editors differ significantly in how they implement various features that support editing of program source, such as syntax highlighting and on-the-fly analysis of the source tokens; the recommendations must take these into consideration to be useful. Finally, I'm sorry to say, but the report is strongly biased in that it focuses almost entirely on the issues caused by visual reordering of bidirectional text and the bidi formatting controls in particular. While it does mention other issues that yield confusing program code, those few references read more as lip service than anything else. 
OTOH, there's no real attempt to describe the legitimate needs of program source code intended for RTL languages and scripts, and without such description, with only the problematic (let alone malicious) use of bidi characters discussed in this and many referenced documents, which is exacerbated by the fact that many people don't really understand the UBA and the needs of RTL scripts, this and the future documents could lead to lopsided conclusions, like "let's disallow those problematic characters from program source code". This isn't just theory: some compilers, evidently alarmed by the brouhaha around these issues, actually went ahead and started flagging the use of some of these characters in program source code as errors! While such ridiculous (IMO) "solutions" in this or that tool could be dismissed as folly on the part of their developers, a document written and sanctioned by the Unicode Consortium which leads to similar conclusions would be a disastrous development, which will significantly hamper development of bidi-aware program development tools and disadvantage their users who work in RTL language environment. I hope this is not how this (very important, IMO) initiative will end. From eliz at gnu.org Thu Mar 3 08:00:07 2022 From: eliz at gnu.org (Eli Zaretskii) Date: Thu, 03 Mar 2022 16:00:07 +0200 Subject: Bidi and Empty Parentheses In-Reply-To: <20220302054455.1ea3e8d9@JRWUBU2> (message from Richard Wordingham via Unicode on Wed, 2 Mar 2022 05:44:55 +0000) References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> <83pmn6l4ur.fsf@gnu.org> <20220301090646.023fd229@JRWUBU2> <20220302054455.1ea3e8d9@JRWUBU2> Message-ID: <83czj3jfvc.fsf@gnu.org> > Date: Wed, 2 Mar 2022 05:44:55 +0000 > From: Richard Wordingham via Unicode > > On Tue, 1 Mar 2022 09:06:46 +0000 > Richard Wordingham via Unicode wrote: > > > Eli, Emacs (e.g. 28.0.91) gets it wrong. Bug report tonight if I can > > get Emacs to work with Claws. 
> > Submitted as Bug #54219, with report at > https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-03/msg00050.html. Now fixed for the next Emacs release. From manishsmail at gmail.com Thu Mar 3 03:02:16 2022 From: manishsmail at gmail.com (Manish Goregaokar) Date: Thu, 3 Mar 2022 01:02:16 -0800 Subject: Avoiding Source Code Spoofing In-Reply-To: <83ilsvju9n.fsf@gnu.org> References: <83ilsvju9n.fsf@gnu.org> Message-ID: Hi, I think noting a couple things will resolve your concerns: - The group is not primarily structured around the thrust of any particular report. The report you refer to is what started a wider discussion leading to the formation of this group, but this group is not directly attempting to address problems posed in that report. If you look closely, the framing of the problem in this announcement is radically different from that in that report. - The second to last paragraph of the post should make it clear that disadvantaging users of these scripts is not on the table. - There is significant work being done already to consult implementers of compilers and tooling. As for consulting users; I'm sure that will be the case, it's just still pretty early in the process. Thanks, -Manish Goregaokar On Thu, Mar 3, 2022 at 12:53 AM Eli Zaretskii via Unicode < unicode at corp.unicode.org> wrote: > > Date: Wed, 2 Mar 2022 14:51:44 -0800 > > From: announcements via announcements > > Cc: announcements > > > > Unicode has convened a group of experts in programming languages, > > tooling, and security to provide guidance and recommendations on how > > to better handle international text in source code, as well as > > providing code to help implementations. > > There was no address or place, neither in this announcement nor in the > report to which it pointed, regarding where to send any comments on > the issues raised by them, so I'm posting them here; apologies if that > is inappropriate. 
> > First, I think the report fails to distinguish between legitimate use > of RTL characters and controls, just because the program code has > strings and/or comments with RTL characters; and the malicious use, > where the intent is to spoof and mislead the recipients of the code. > Such a distinction is important, because use of bidi controls that is > legitimate in the former case is highly suspicious in the latter. For > example, any source code where the inherent directionality of a strong > directional character was overridden, or where a weak/neutral > character has an embedding level that's too high, should be suspected > as potentially malicious. > > Second, I don't see in the Proposed Plan any activity to collect input > from users and implementors of compilers, linters, and editors. > Without collecting such input, I see no way that the work group will > appreciate the real-life problems and issues that the developers and > users of these tools are facing, and that could easily lead to > recommendations that are hard or impossible to implement at least in > some of these tools, and/or which could be disconnected from the real > problems and practices. For example, the idea of rendering bidi > formatting control as "chits" will not solve the reordering issue in > Emacs, where bidi reordering is performed _before_ the actual glyphs > to present characters on the glass are fully known. More generally, > editors differ significantly in how they implement various features > that support editing of program source, such as syntax highlighting > and on-the-fly analysis of the source tokens; the recommendations must > take these into considerations to be useful. > > Finally, I'm sorry to say, but the report is strongly biased in that > it focuses almost entirely on the issues caused by visual reordering > of bidirectional text and the bidi formatting controls in particular. 
> While it does mention other issues that yield confusing program code, > those few references read more as a lip service than anything else. > OTOH, there's no real attempt to describe the legitimate needs of > program source code intended for RTL languages and scripts, and > without such description, with only the problematic (let alone > malicious) use of bidi characters discussed in this and many > referenced documents, which is exacerbated by the fact that many > people don't really understand the UBA and the needs of RTL scripts, > this and the future documents could lead to lopsided conclusions, like > "let's disallow those problematic characters from program source > code". This isn't just theory: some compilers, evidently alarmed by > the brouhaha around these issues, actually went ahead and started > flagging the use of some of these characters in program source code as > errors! While such ridiculous (IMO) "solutions" in this or that tool > could be dismissed as folly on the part of their developers, a > document written and sanctioned by the Unicode Consortium which leads > to similar conclusions would be a disastrous development, which will > significantly hamper development of bidi-aware program development > tools and disadvantage their users who work in RTL language > environment. I hope this is not how this (very important, IMO) > initiative will end. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tom at honermann.net Thu Mar 3 15:24:54 2022 From: tom at honermann.net (Tom Honermann) Date: Thu, 3 Mar 2022 16:24:54 -0500 Subject: Unicode expert group for international text in source code Message-ID: Hi, Mark and Markus. I received the announcement about the new Unicode group that will be investigating problems and solutions relating to international text in programming languages. I currently chair the ISO WG21 SG16 study group focused on C++ Unicode and text programming. 
A subset of WG21 recently agreed to pursue addressing "Trojan Source"-like problems within WG21; probably in the form of a technical report. This was in response to a paper that was submitted to WG21; P2528R0 (C++ Identifier Security using Unicode Standard Annex 39). (The paper needs some significant work before it will be ready for further review within WG21; it is a difficult read at present). Thus, some subset of WG21 will be working on guidance and recommendations for C++ implementors and I would like to ensure that such efforts are aligned with the new Unicode group. Please let me know how I might apply to join the group and what requirements or expectations might exist. Tom. From tom at honermann.net Thu Mar 3 15:53:00 2022 From: tom at honermann.net (Tom Honermann) Date: Thu, 3 Mar 2022 16:53:00 -0500 Subject: Unicode expert group for international text in source code In-Reply-To: References: Message-ID: <2e504472-de8a-3f87-fa2e-f85df7113d76@honermann.net> Sorry for the noise, please disregard. This was not intended to be sent to the mailing list; I was foiled (again) by mailing list sender rewriting rules. Tom. On 3/3/22 4:24 PM, Tom Honermann via Unicode wrote: > > Hi, Mark and Markus. > > I received the announcement about the new Unicode group that will be > investigating problems and solutions relating to international text in > programming languages. > > I currently chair the ISO WG21 SG16 study group focused on C++ Unicode > and text programming. A subset of WG21 recently agreed to pursue > addressing "Trojan Source"-like problems within WG21; probably in the > form of a technical report. This was in response to a paper that was > submitted to WG21; P2528R0 (C++ Identifier Security using Unicode > Standard Annex 39). (The paper needs some > significant work before it will be ready for further review within > WG21; it is a difficult read at present).
Thus, some subset of WG21 > will be working on guidance and recommendations for C++ implementors > and I would like to ensure that such efforts are aligned with the new > Unicode group. > > Please let me know how I might apply to join the group and what > requirements or expectations might exist. > > Tom. From wjgo_10009 at btinternet.com Fri Mar 4 11:26:37 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Fri, 4 Mar 2022 17:26:37 +0000 (GMT) Subject: SNOMED CT clinical terminology and localization into many languages in Europe Message-ID: <1eff4efb.7422.17f55f81a4a.Webtop.102@btinternet.com> SNOMED CT clinical terminology and localization into many languages in Europe https://www.snomed.org/news-and-events/articles/EU-drives-standardized-terminology-funding-program As some of the languages of Europe include use of some characters from beyond U+00FF the existence of The Unicode Standard will help facilitate the implementation of this interoperability in this very important application of information technology in healthcare. William Overington Friday 4 March 2022 From stas624-uni at yahoo.com Sun Mar 6 20:38:39 2022 From: stas624-uni at yahoo.com (stas) Date: Mon, 7 Mar 2022 02:38:39 +0000 (UTC) Subject: Tengwar on a general purpose translation site References: <1860455796.867798.1646620719014.ref@mail.yahoo.com> Message-ID: <1860455796.867798.1646620719014@mail.yahoo.com> At yandex: https://translate.yandex.com/?lang=en-sjn&text=night I bet Tengwar is used much more than some other scripts already encoded in SMP. From mark at kli.org Sun Mar 6 22:01:51 2022 From: mark at kli.org (Mark E. 
Shoulson) Date: Sun, 6 Mar 2022 23:01:51 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: <1860455796.867798.1646620719014@mail.yahoo.com> References: <1860455796.867798.1646620719014.ref@mail.yahoo.com> <1860455796.867798.1646620719014@mail.yahoo.com> Message-ID: <5634e565-97ff-9044-95a7-650aaa52e40d@shoulson.com> pIqaD (Klingon) used to be on Bing's translation site, though I don't see it there now, and the same has been said of it. But both scripts have other issues to deal with. ~mark On 3/6/22 21:38, stas via Unicode wrote: > At yandex: https://translate.yandex.com/?lang=en-sjn&text=night > > I bet Tengwar is used much more than some other scripts already encoded in SMP. From doug at ewellic.org Mon Mar 7 11:11:37 2022 From: doug at ewellic.org (Doug Ewell) Date: Mon, 7 Mar 2022 10:11:37 -0700 Subject: Re Tengwar on a general purpose translation site Message-ID: <000201d83246$680e5960$382b0c20$@ewellic.org> stas wrote: > At yandex: https://translate.yandex.com/?lang=en-sjn&text=night > > I bet Tengwar is used much more than some other scripts already > encoded in SMP. This site uses the exact same Sindarin word to translate the English word "set" in these phrases:

a set of dishes
set the table
set in his ways

leading me to believe it's merely substituting words from a dictionary, and not performing actual translation. An updated proposal for Tengwar that claims it is being used for more than just decoration (a traditional argument against encoding Klingon pIqaD) may need to be supported with better language tools than this. At least Yandex had the decency to use the real ISO 639-3 code for Sindarin (sjn) instead of making that up. 
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org From stas624-uni at yahoo.com Mon Mar 7 11:38:31 2022 From: stas624-uni at yahoo.com (stas) Date: Mon, 7 Mar 2022 17:38:31 +0000 (UTC) Subject: Re Tengwar on a general purpose translation site In-Reply-To: <000201d83246$680e5960$382b0c20$@ewellic.org> References: <000201d83246$680e5960$382b0c20$@ewellic.org> Message-ID: <1084485675.1117773.1646674711669@mail.yahoo.com> On Monday, March 7, 2022, 11:21:43 PM GMT+6, Doug Ewell via Unicode wrote: > An updated proposal for Tengwar that claims it is being used for more than just decoration (a traditional argument against encoding Klingon pIqaD) may need to be supported with better language tools than this. I disagree. The mere need to display Tengwar should be enough, even if it is "just" for a dictionary. I don't understand such a bias against conscripts. Historic scripts need much less justification. I think one of Albanian scripts is encoded even though there exists only one document in it (I may be wrong, that's how I remember it). From junicode at jcbradfield.org Mon Mar 7 13:04:11 2022 From: junicode at jcbradfield.org (Julian Bradfield) Date: Mon, 7 Mar 2022 19:04:11 +0000 (GMT) Subject: Re Tengwar on a general purpose translation site References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> Message-ID: On 2022-03-07, stas via Unicode wrote: > On Monday, March 7, 2022, 11:21:43 PM GMT+6, Doug Ewell via Unicode wrote: > >> An updated proposal for Tengwar that claims it is being used for more than just decoration (a traditional argument against encoding Klingon pIqaD) may need to be supported with better language tools than this. > > I disagree. The mere need to display Tengwar should be enough, even if it is "just" for a dictionary. It doesn't matter what you think about this. 
What matters is what the Tolkien Estate thinks about copyright, and they don't want tengwar in Unicode. From jameskass at code2001.com Mon Mar 7 14:43:28 2022 From: jameskass at code2001.com (James Kass) Date: Mon, 7 Mar 2022 20:43:28 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> Message-ID: <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> On 2022-03-07 7:04 PM, Julian Bradfield via Unicode wrote: > It doesn't matter what you think about this. What matters is what the > Tolkien Estate thinks about copyright, and they don't want tengwar in > Unicode. What's the estate's stance on Tengwar in the CSUR? I've had Tengwar in my fonts for over twenty years, and nobody's ever said "boo". (Same with Klingon.) It's hard to imagine anyone inventing a writing system with the idea that nobody should use it. From kenwhistler at sonic.net Mon Mar 7 14:59:47 2022 From: kenwhistler at sonic.net (Ken Whistler) Date: Mon, 7 Mar 2022 12:59:47 -0800 Subject: Tengwar on a general purpose translation site In-Reply-To: <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> Message-ID: James, On 3/7/2022 12:43 PM, James Kass via Unicode wrote: > It's hard to imagine anyone inventing a writing system with the idea > that nobody should use it. But it's not hard to imagine someone doing so with the expectation that its use would involve licensing and licensing fees. 
--Ken From junicode at jcbradfield.org Mon Mar 7 15:15:24 2022 From: junicode at jcbradfield.org (Julian Bradfield) Date: Mon, 7 Mar 2022 21:15:24 +0000 (GMT) Subject: Tengwar on a general purpose translation site References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> Message-ID: On 2022-03-07, James Kass via Unicode wrote: > What's the estate's stance on Tengwar in the CSUR? You would have to ask them, and explain what the CSUR is. My guess would be that the answer would be neutral, as CSUR is entirely unofficial and has no legal baggage associated with it. In practice, the Tolkien Estate does not interfere with fan activities provided (a) they're not commercialized, and (b) they do not bring Tolkien's name into disrepute (in the opinion of the Estate). Whether they even go so far as to admit this, I don't remember, but by observation it's been their practice for the last fifty years. > It's hard to imagine anyone inventing a writing system with the idea > that nobody should use it. I think you need more imagination. The only person who needed to write tengwar was JRRT! But that's getting off-topic. Julian. From junicode at jcbradfield.org Mon Mar 7 15:23:15 2022 From: junicode at jcbradfield.org (Julian Bradfield) Date: Mon, 7 Mar 2022 21:23:15 +0000 Subject: Re Tengwar on a general purpose translation site In-Reply-To: <1605103681.1174076.1646683014775@mail.yahoo.com> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <1605103681.1174076.1646683014775@mail.yahoo.com> Message-ID: <25126.30659.768811.677024@high.jcbradfield.org> >On Tuesday, March 8, 2022, 01:14:25 AM GMT+6, Julian Bradfield via Unicode wrote: > >> It doesn't matter what you think about this. What matters is what the >Tolkien Estate thinks about copyright, and they don't want tengwar in >Unicode. 
>Is there any proof online that Tolkien >Estate doesn't want Tengwar in Unicode? Legal matters are not decided by articles online. I have a letter from the solicitor for the Tolkien Estate refusing permission for us to submit an update of Michael's proposal to Unicode. Or to be precise, refusing to give the intellectual property waivers that are a precondition for the Unicode Consortium to consider such a proposal. From jameskass at code2001.com Mon Mar 7 15:47:17 2022 From: jameskass at code2001.com (James Kass) Date: Mon, 7 Mar 2022 21:47:17 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> Message-ID: On 2022-03-07 8:59 PM, Ken Whistler via Unicode wrote: > But it's not hard to imagine someone doing so with the expectation > that its use would involve licensing and licensing fees. https://www.etsy.com/market/tengwar So all of those vendors out there selling Tengwar-related stuff are paying licensing fees to the estate?? Good to know... From jameskass at code2001.com Mon Mar 7 15:49:05 2022 From: jameskass at code2001.com (James Kass) Date: Mon, 7 Mar 2022 21:49:05 +0000 Subject: Re Tengwar on a general purpose translation site In-Reply-To: <25126.30659.768811.677024@high.jcbradfield.org> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <1605103681.1174076.1646683014775@mail.yahoo.com> <25126.30659.768811.677024@high.jcbradfield.org> Message-ID: On 2022-03-07 9:23 PM, Julian Bradfield via Unicode wrote: > I have a letter from the solicitor for the Tolkien Estate refusing > permission for us to submit an update of Michael's proposal to > Unicode. Or to be precise, refusing to give the intellectual property > waivers that are a precondition for the Unicode Consortium to consider > such a proposal. 
Seems likely that the estate doesn't really care whether Tengwar gets into Unicode. They just refuse to waive their rights, if any.

From mark at kli.org Tue Mar 8 07:48:12 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 8 Mar 2022 08:48:12 -0500
Subject: Re Tengwar on a general purpose translation site
In-Reply-To: <000201d83246$680e5960$382b0c20$@ewellic.org>
References: <000201d83246$680e5960$382b0c20$@ewellic.org>
Message-ID: <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com>

On 3/7/22 12:11, Doug Ewell via Unicode wrote:
> An updated proposal for Tengwar that claims it is being used for more than just decoration (a traditional argument against encoding Klingon pIqaD) may need to be supported with better language tools than this.

At least at the start, tengwar had _more_ evidence that it wasn't just decoration than pIqaD had. Having been around a lot longer, it was already being used by hardcore Tolkien fans here and there, and I think not just to transcribe English (as Tolkien himself did, e.g. on the LotR title pages). People were making their fumbling attempts to figure out Elvish languages with whatever official information was available at the time and writing stuff.

pIqaD has never really been used as merely another way to write English (it lacks the sounds), though. In that sense and in those usages, tengwar is arguably more "mere decoration." And even that is disputable.

~mark

From jameskass at code2001.com Wed Mar 9 12:38:43 2022
From: jameskass at code2001.com (James Kass)
Date: Wed, 9 Mar 2022 18:38:43 +0000
Subject: Tengwar on a general purpose translation site
In-Reply-To: <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com>
Message-ID: <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>

Suppose a proposal works around any IP concerns, real or imaginary, by using generic character names along the lines of CJK ideographs.
Such as:

U+xxx01 FICTIONAL CONSCRIPT CHARACTER-XXX01
U+xxx02 FICTIONAL CONSCRIPT CHARACTER-XXX02
and so forth.

The charts covering the ranges could be blank with a footnote explaining that the lack of glyphs is due to IP concerns. The proposal could refer to earlier proposals for usage examples and the proposed range need not mention any author's name or copyrighted brands.

Would such a proposal have any chance of moving forward towards acceptance?

If no, then it may be pointless to pursue standardization of fictional ConScripts. If yes, then a proposer seeking an IP release could submit examples of what the proposed range would look like with or without the release to the IP holder, telling the IP holder that the writing system in question will likely be encoded one way or another. The IP holder could then decide whether it would be better to grant or withhold the release.

From kenwhistler at sonic.net Wed Mar 9 12:57:26 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Wed, 9 Mar 2022 10:57:26 -0800
Subject: Tengwar on a general purpose translation site
In-Reply-To: <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>
Message-ID:

On 3/9/2022 10:38 AM, James Kass via Unicode wrote:
>
> Suppose a proposal works around any IP concerns, real or imaginary, by
> using generic character names along the lines of CJK ideographs. Such as:
>
> U+xxx01 FICTIONAL CONSCRIPT CHARACTER-XXX01
> U+xxx02 FICTIONAL CONSCRIPT CHARACTER-XXX02
> and so forth.
>
> The charts covering the ranges could be blank with a footnote
> explaining that the lack of glyphs is due to IP concerns. The
> proposal could refer to earlier proposals for usage examples and the
> proposed range need not mention any author's name or copyrighted brands.
>
> Would such a proposal have any chance of moving forward towards
> acceptance?
Well, insofar as this is an attempt to "encode" characters without providing reference glyphs or names or any meaningful semantics, it isn't much different from just using:

U+F0001
U+F0002
...

I don't see the UTC going for this kind of pseudo-private-use concept. The whole point of *standardizing* characters is to spell out precisely what they are so that interchange is reliable.

--Ken

From wjgo_10009 at btinternet.com Mon Mar 7 14:13:38 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 7 Mar 2022 20:13:38 +0000 (GMT)
Subject: Re Tengwar on a general purpose translation site
In-Reply-To:
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com>
Message-ID: <63af3bec.b49a.17f66041836.Webtop.102@btinternet.com>

Clearly, views on the granting of permissions for the encoding of original glyphs as characters into The Unicode Standard vary.

I have devised some original glyphs that I have used to write some poems. The prints are quite decorative. I frame the prints using frames that are marketed as photograph frames that are delivered with my grocery order. I like the oak effect frames.

Here is a link to artwork that displays one such poem.

https://forum.affinity.serif.com/index.php?/topic/143812-informal-design-workshop-idea/page/3

I would be pleased and delighted if these glyphs were to become encoded into The Unicode Standard.

William Overington

Monday 7 March 2022

From wjgo_10009 at btinternet.com Wed Mar 9 13:40:10 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Wed, 9 Mar 2022 19:40:10 +0000 (GMT)
Subject: Tengwar on a general purpose translation site
In-Reply-To: <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>
Message-ID: <3fbfacf6.3705.17f70322d08.Webtop.102@btinternet.com>

James Kass wrote:

> If no, then it may be pointless to pursue standardization of fictional
> ConScripts.

I opine that it needs to be remembered that some authors of a novel that uses a conscript that the author has devised may be delighted for the conscript to become encoded in regular Unicode and regard it as a great honour and great recognition of the author's work, which indeed I opine that it would be.

> If yes, ..., telling the IP holder that the writing system in question
> will likely be encoded one way or another.

I opine that that would not be a diplomatic, kind, or fair way to "tell" someone something.

If permission were given for Tengwar to become encoded into regular Unicode, I opine that that would be a great way to celebrate the work and art of J. R. R. Tolkien for as long as Unicode and ISO/IEC 10646 exist.

What if the Tolkien Estate were offered the inclusion of a piece of text of a significant length together with illustrations about Tengwar and the work of J. R. R. Tolkien in The Unicode Standard, in a chapter of its own?

William Overington

Wednesday 9 March 2022

From mark at kli.org Wed Mar 9 19:25:48 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Wed, 9 Mar 2022 20:25:48 -0500
Subject: Tengwar on a general purpose translation site
In-Reply-To:
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com>
Message-ID: <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>

I sort of have to agree with Ken here (hey, it happens.) What makes Unicode an encoding of *characters*, what makes U+0041 mean LATIN CAPITAL LETTER A, is that Unicode says that's what it is. In the name, yes, but the name isn't dispositive (there are misnamed characters); the rest of the standard also counts. It's LATIN CAPITAL LETTER A, and not just UNICODE CHARACTER 0041. Unless the standard stipulates, in the actual character names or out of them, that codepoint XXX01 really corresponds to TENGWAR LETTER TINCO, then, well, it might as well just be UNICODE CHARACTER XXX01 or as Ken says, .

Also, although everyone seems focussed on the glyphs, I think that really isn't the issue. If I somehow invented an entirely new and original set of glyphs for Klingon, and somehow got them retroactively canonized and used and all that stuff (or to put it another way, if I invented or held the copyright on the glyph forms), I don't think Unicode would be any more comfortable about encoding KLINGON LETTER A or even PIQAD LETTER A than they are now. It's not about the pictures, and Unicode doesn't encode glyphs. Encoding Tengwar is or maybe is felt to be... doing *something* sortakinda related to IP owned by the Tolkien estate, and that's the sticking point.

~mark

From sdowney at gmail.com Wed Mar 9 22:22:34 2022
From: sdowney at gmail.com (Steve Downey)
Date: Wed, 9 Mar 2022 23:22:34 -0500
Subject: Tengwar on a general purpose translation site
In-Reply-To: <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
Message-ID:

My impression over the years, or decades, is that no one wants to get into potentially expensive IP litigation. There are sound arguments that ConLang scripts don't have trademark or copyright protections that ought to prevent them being standardized. However, without an excellent defense team, there is the risk of a ruling going the other way, causing trouble for everyone everywhere for a long time.

On top of that, it's not clear that the Unicode Consortium should be encoding scripts produced by people who say they don't want them encoded.

I suspect at this point we are a half generation away from these IP owners getting serious internal agitation for standardisation ("Why haven't we seen a picture of the Whole Earth").
And then the dam breaks and the opposite problems are back.

From jtauber at jtauber.com Thu Mar 10 04:55:38 2022
From: jtauber at jtauber.com (James Tauber)
Date: Thu, 10 Mar 2022 18:55:38 +0800
Subject: Tengwar on a general purpose translation site
In-Reply-To: <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
Message-ID:

The intellectual property rights (to the extent they may be enforceable or at least claimed) would be in the glyph and in the name, right?

But the Tengwar has a canonical enumeration. So TENGWAR LETTER 1 is meaningful in the context of LOTR Appendix E without reference to the glyph or name 'tinco' so it's not quite the same as just saying UNICODE CHARACTER XXX01.

James

From doug at ewellic.org Thu Mar 10 12:49:17 2022
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 10 Mar 2022 11:49:17 -0700
Subject: Tengwar on a general purpose translation site
In-Reply-To:
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
Message-ID: <000001d834af$8c13f930$a43beb90$@ewellic.org>

James Tauber wrote:

> The intellectual property rights (to the extent they may be
> enforceable or at least claimed) would be in the glyph and in the
> name, right?

Maybe, or maybe not. Only an attorney, or perhaps only a court ruling, can answer that question.

> But the Tengwar has a canonical enumeration. So TENGWAR LETTER 1 is
> meaningful in the context of LOTR Appendix E without reference to the
> glyph or name 'tinco' so it's not quite the same as just saying
> UNICODE CHARACTER XXX01.

The first Unicode proposals for emoji (cf. normal symbols) included a mechanism to do something like this:

"Special, rarely used, carrier-specific symbols are proposed for encoding in the Emoji compatibility symbols block. They are needed to complete the set for interoperability but are only identified by their source mappings (N3585), not specific glyphs and names."
(L2/09-025R2)

These included symbols of national interest such as MOUNT FUJI, TOKYO TOWER, and STATUE OF LIBERTY, as well as ten selected national flags. They were given cryptic names like EMOJI COMPATIBILITY SYMBOL-1 and intentionally opaque "dotted-box" reference glyphs. There were originally 66 of these; the number varied in later proposals.

This approach was rightly rejected as pseudo-encoding. Various solutions were devised instead: some of the symbols were encoded under their original names, and the Regional Indicator Symbols were created to allow all national flags (not just the ten) to be represented.

Encoding Tengwar as "letter 1," "letter 2," and so forth, implicitly directing users to LOTR Appendix E to find the true identity of the characters, won't fool anyone, least of all an attorney for an estate that "doesn't want tengwar in Unicode" and is prepared to fight over it.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From wjgo_10009 at btinternet.com Thu Mar 10 13:33:20 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 10 Mar 2022 19:33:20 +0000 (GMT)
Subject: Tengwar on a general purpose translation site
Message-ID: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com>

It seems to me that the goal to achieve for getting Tengwar encoded is for there to be a joint press release by the Tolkien Estate and Unicode Inc. announcing that encoding is to take place, with quotes from representatives of both entities, together with a Notes for Editors section of the press release with background information about Tengwar, the Tolkien Estate and about the Unicode Inc. organization.

That way, an excellent result of an encoding could be achieved.

I do not know enough about Tengwar to produce a proposal document for this for consideration by the Unicode Technical Committee, I can, however, make suggestions about the modalities.

Suppose that the proposal to the Unicode Technical Committee includes the suggestion of a motion that the President of Unicode Inc. write a letter, signed with pen and ink personally by the President of Unicode Inc., not with a ballpoint pen, on good quality stationery, and sent, unfolded, by the postal system to the Tolkien Estate with a printed copy of the proposal saying that this proposal has been received and Unicode Inc. would like to proceed with encoding please and requests discussion in the hope of receiving approval from the Tolkien Estate. The letter could offer full consultation, a chapter of The Unicode Standard solely about Tengwar and the work of J. R. R. Tolkien, and, as consideration, a gold sovereign coin.

If anyone can suggest a way to improve this suggestion, then fine.

William Overington

Thursday 10 March 2022

From junicode at jcbradfield.org Thu Mar 10 16:46:42 2022
From: junicode at jcbradfield.org (Julian Bradfield)
Date: Thu, 10 Mar 2022 22:46:42 +0000 (GMT)
Subject: Tengwar on a general purpose translation site
References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com>
Message-ID:

On 2022-03-10, William_J_G Overington via Unicode wrote:
> It seems to me that the goal to achieve for getting Tengwar encoded is
> for there to be a joint press release by the Tolkien Estate and Unicode
> Inc. announcing that encoding is to take place, with quotes from
> representatives of both entities, together with a Notes for Editors
> section of the press release with background information about Tengwar,
> the Tolkien Estate and about the Unicode Inc. organization.

But the Tolkien Estate is not willing for tengwar to be encoded?

> I do not know enough about Tengwar to produce a proposal document for
> this for consideration by the Unicode Technical Committee, I can,
> however, make suggestions about the modalities.

Other people do, and have.

> Suppose that the proposal to the Unicode Technical Committee includes
> the suggestion of a motion that the President of Unicode Inc. write a
> letter, signed with pen and ink personally by the President of Unicode
> Inc., not with a ballpoint pen, on good quality stationery, and sent,
> unfolded, by the postal system to the Tolkien Estate with a printed copy
> of the proposal saying that this proposal has been received and Unicode
> Inc. would like to proceed with encoding please and requests discussion
> in the hope of receiving approval from the Tolkien Estate. The letter

The President of the Unicode Inc. is not a person known to the Tolkien Estate, so I see no reason why they should care.

Michael Everson, who wrote the original proposal, and I, who wrote to the Estate with a request for permission to encode, are both known to the Tolkien Estate, and we both have 30+ years' experience in asking for things from them. My letter explained carefully and at a level appropriate for both the lawyers and the family what Unicode is, and why encoding tengwar would be a good thing.

The Estate does not wish to release tengwar into the public domain to the extent that would be required for an encoding.

Somebody asked about public evidence of their attitude. Here is what their FAQ says

https://www.tolkienestate.com/frequently-asked-questions-and-links/

Tolkien's invented languages and scripts are protected by copyright. You may use them for your own private interest and amusement, but you may not reproduce them in any form of publication or in connection with any group activity, commercial or otherwise.

As people have remarked, there are interesting legal arguments about whether this statement is true, but neither Unicode nor I can afford an argument with an organization which is not short of money for lawyers.

From mark at kli.org Thu Mar 10 17:32:33 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Thu, 10 Mar 2022 18:32:33 -0500
Subject: Tengwar on a general purpose translation site
In-Reply-To:
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com>
Message-ID: <535fbb09-22b9-5cb1-64f7-82987b67dbbe@shoulson.com>

On 3/9/22 23:22, Steve Downey via Unicode wrote:
> My impression over the years, or decades, is that no one wants to get
> into potentially expensive IP litigation. There are sound arguments
> that ConLang scripts don't have trademark or copyright protections
> that ought to prevent them being standardized. However, without an
> excellent defense team, there is the risk of a ruling going the other
> way, causing trouble for everyone everywhere for a long time.

Basically. No matter how good your defense is, in a case like this just having the suit brought against you is enough to count as defeat (unless you can prove the suit frivolous, which is not that easy to do.)

~mark

From mark at kli.org Thu Mar 10 17:37:19 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Thu, 10 Mar 2022 18:37:19 -0500
Subject: Tengwar on a general purpose translation site
In-Reply-To: <000001d834af$8c13f930$a43beb90$@ewellic.org>
References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> <16ac788c-712f-8598-4dcf-b4fcec03cc9d@shoulson.com> <000001d834af$8c13f930$a43beb90$@ewellic.org>
Message-ID: <19624b33-c4bf-5614-99ac-70048a80fc19@shoulson.com>

Yeah, you can't really have it both ways. If Unicode _officially_ encodes characters as tengwar, either by name or _officially_ pointing to appendix E of LotR, that's enough to make the copyright holders feel threatened.
On the other hand, if they _don't_ officially point to Appendix E, but you're just supposed to go along with an unofficial common agreement... well, then, that's exactly the same as using the? Private Use Areas, counting on unofficial agreements like the ConScript registry. If it "doesn't fool anyone," you risk legal action.? If it does, you haven't encoded it. ~mark On 3/10/22 13:49, Doug Ewell via Unicode wrote: > James Tauber wrote: > >> The intellectual property rights (to the extent they may be >> enforceable or at least claimed) would be in the glyph and in the >> name, right? > Maybe, or maybe not. Only an attorney, or perhaps only a court ruling, can answer that question. > >> But the Tengwar has a canonical enumeration. So TENGWAR LETTER 1 is >> meaningful in the context of LOTR Appendix E without reference to the >> glyph or name 'tinco' so it's not quite the same as just saying >> UNICODE CHARACTER XXX01. > The first Unicode proposals for emoji (cf. normal symbols) included a mechanism to do something like this: > > ?Special, rarely used, carrier-specific symbols are proposed for encoding in the Emoji compatibility symbols block. They are needed to complete the set for interoperability but are only identified by their source mappings (N3585), not specific glyphs and names.? (L2/09-025R2) > > These included symbols of national interest such as MOUNT FUJI, TOKYO TOWER, and STATUE OF LIBERTY, as well as ten selected national flags. They were given cryptic names like EMOJI COMPATIBILITY SYMBOL-1 and intentionally opaque ?dotted-box? reference glyphs. There were originally 66 of these; the number varied in later proposals. > > This approach was rightly rejected as pseudo-encoding. Various solutions were devised instead: some of the symbols were encoded under their original names, and the Regional Indicator Symbols were created to allow all national flags (not just the ten) to be represented. > > Encoding Tengwar as ?letter 1,? ?letter 2,? 
and so forth, implicitly directing users to LOTR Appendix E to find the true identity of the characters, won't fool anyone, least of all an attorney for an estate that ?doesn't want tengwar in Unicode? and is prepared to fight over it. > > -- > Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org > > From mark at kli.org Thu Mar 10 17:45:45 2022 From: mark at kli.org (Mark E. Shoulson) Date: Thu, 10 Mar 2022 18:45:45 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> Message-ID: <059c7ca1-52a3-40ab-88bd-b36ca48f2f4e@shoulson.com> Interesting.? In the case of tengwar, the copyright-holder has actually explicitly given an answer to the question:? they do not want it encoded.? But CBS/Paramount has never said anything one way or another about pIqaD (beyond boilerplate copyright notices), as far as I know.? So tengwar would seem to be further from encodability than pIqaD, in this sense.? (And yet, the one disapproved of is not the one that is non-approved.) As you say, the actual relevance, legality, and enforceability of the Estate's statement don't really matter.? Nobody wants to be in the position of showing it's wrong, even successfully. ~mark On 3/10/22 17:46, Julian Bradfield via Unicode wrote: > The Estate does not wish to release tengwar into the public domain to > the extent that would be required for an encoding. > > Somebody asked about public evidence of their attitude. Here is what > their FAQ says > > https://www.tolkienestate.com/frequently-asked-questions-and-links/ > > Tolkien?s invented languages and scripts are protected by > copyright. You may use them for your own private interest and > amusement, but you may not reproduce them in any form of publication > or in connection with any group activity, commercial or otherwise. 
> > As people have remarked, there are interesting legal arguments about > whether this statement is true, but neither Unicode nor I can afford an > argument with an organization which is not short of money for lawyers. From sdowney at gmail.com Thu Mar 10 18:02:08 2022 From: sdowney at gmail.com (Steve Downey) Date: Thu, 10 Mar 2022 19:02:08 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: <059c7ca1-52a3-40ab-88bd-b36ca48f2f4e@shoulson.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <059c7ca1-52a3-40ab-88bd-b36ca48f2f4e@shoulson.com> Message-ID: In a decade or so, someone like Amazon is going to ask why they can't advertise in Social Media in Tengwar and ask hard questions to the Estate, at which point the new people in charge will be able to say their predecessors were wrong. Until then, without someone else setting legal precedent, it's just not worth spending a few $100K to find out. On Thu, Mar 10, 2022, 18:48 Mark E. Shoulson via Unicode < unicode at corp.unicode.org> wrote: > Interesting. In the case of tengwar, the copyright-holder has actually > explicitly given an answer to the question: they do not want it > encoded. But CBS/Paramount has never said anything one way or another > about pIqaD (beyond boilerplate copyright notices), as far as I know. > So tengwar would seem to be further from encodability than pIqaD, in > this sense. (And yet, the one disapproved of is not the one that is > non-approved.) > > As you say, the actual relevance, legality, and enforceability of the > Estate's statement don't really matter. Nobody wants to be in the > position of showing it's wrong, even successfully. > > ~mark > > On 3/10/22 17:46, Julian Bradfield via Unicode wrote: > > The Estate does not wish to release tengwar into the public domain to > > the extent that would be required for an encoding. > > > > Somebody asked about public evidence of their attitude. 
Here is what > > their FAQ says > > > > https://www.tolkienestate.com/frequently-asked-questions-and-links/ > > > > Tolkien's invented languages and scripts are protected by > > copyright. You may use them for your own private interest and > > amusement, but you may not reproduce them in any form of publication > > or in connection with any group activity, commercial or otherwise. > > > > As people have remarked, there are interesting legal arguments about > > whether this statement is true, but neither Unicode nor I can afford an > > argument with an organization which is not short of money for lawyers. > From wjgo_10009 at btinternet.com Thu Mar 10 17:30:03 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Thu, 10 Mar 2022 23:30:03 +0000 (GMT) Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> Message-ID: <78836f1a.5597.17f762afd08.Webtop.102@btinternet.com> Julian Bradfield responded to my post. Thank you for explaining and for the link. Best regards, William Overington Thursday 10 March 2022 From stas624-uni at yahoo.com Fri Mar 11 00:44:22 2022 From: stas624-uni at yahoo.com (stas) Date: Fri, 11 Mar 2022 06:44:22 +0000 (UTC) Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> Message-ID: <719233525.1388915.1646981062656@mail.yahoo.com> > https://www.tolkienestate.com/frequently-asked-questions-and-links/ Wow, these people need to relax. I understand they don't want to lose some revenue sources, but this is too much. You can't name a park after Tolkien, really? " Fan Fiction The Tolkien Estate has a duty to protect the integrity of Tolkien's original writings and artworks and takes copyright very seriously. 
This means that you cannot copy any part of Tolkien's writings or images, nor can you create materials which refer to the characters, stories, places, events or other elements contained in any of Tolkien's works. " "has a duty" - such a bullshit, it's all about money. I wonder what Tolkien himself would think about this. This is a good example of intellectual property rights stifling innovation. (and why the fuck they disabled text copying on their site? too much) From richard.wordingham at ntlworld.com Fri Mar 11 22:49:56 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 12 Mar 2022 04:49:56 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> Message-ID: <20220312044956.05878d9d@JRWUBU2> On Mon, 7 Mar 2022 21:15:24 +0000 (GMT) Julian Bradfield via Unicode wrote: > On 2022-03-07, James Kass via Unicode > wrote: > > What's the estate's stance on Tengwar in the CSUR? > > You would have to ask them, and explain what the CSUR is. > > My guess would be that the answer would be neutral, as CSUR is > entirely unofficial and has no legal baggage associated with it. > > In practice, the Tolkien Estate does not interfere with fan activities > provided (a) they're not commercialized, and (b) they do not bring > Tolkien's name into disrepute (in the opinion of the Estate). > Whether they even go so far as to admit this, I don't remember, but by > observation it's been their practice for the last fifty years. Might there be a copyright protection on 20th century additions to the Cyrillic script that prohibits their use to bring the Russian government into disrepute? Richard. 
From richard.wordingham at ntlworld.com Sat Mar 12 04:39:28 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 12 Mar 2022 10:39:28 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <3e04f431-033e-6645-b5b6-3de0eb8ba9fd@shoulson.com> <087e8148-b867-fcb5-0354-699498756fc2@code2001.com> Message-ID: <20220312103928.5ecd41ef@JRWUBU2> On Wed, 9 Mar 2022 10:57:26 -0800 Ken Whistler via Unicode wrote: > On 3/9/2022 10:38 AM, James Kass via Unicode wrote: > > > > Suppose a proposal works around any IP concerns, real or imaginary, > > by using generic character names along the lines of CJK ideographs. > > Such as: > > > > U+xxx01 FICTIONAL CONSCRIPT CHARACTER-XXX01 > > U+xxx02 FICTIONAL CONSCRIPT CHARACTER-XXX02 > > and so forth. > > > > The charts covering the ranges could be blank with a footnote > > explaining that the lack of glyphs is due to IP concerns. The > > proposal could refer to earlier proposals for usage examples and > > the proposed range need not mention any author's name or > > copyrighted brands. > > > > Would such a proposal have any chance of moving forward towards > > acceptance? > > Well, insofar as this is attempt to "encode" characters without > providing reference glyphs or names or any meaningful semantics, it > isn't much different from just using: > > U+F0001 > U+F0002 > ... > > I don't see the UTC going for this kind of pseudo-private-use > concept. The whole point of *standardizing* characters is to spell > out precisely what they are so that interchange is reliable. But by assigning characters, one can then add meaningful properties. For example, are the tehtar compulsorily ligated letters (2001 proposal) or combining marks (earlier, apparent encoding in the CSUR). The CSUR appears to lack an encoding for Tengwar - it has a provisional encoding, which is no better than a font-encoding. Richard. 
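The remark about properties can be illustrated with a minimal sketch, assuming Python's standard `unicodedata` module as a stand-in for a property lookup. The helper name `describe` and the choice of U+E000 as a sample private-use code point are illustrative assumptions, not taken from any proposal; the other two characters are ordinary encoded ones chosen only for contrast. The sketch shows that an encoded combining mark carries a name, a mark category and a combining class, while a private-use code point carries no name and only the generic private-use category.

```python
import unicodedata

def describe(ch):
    """Return (code point, name, General_Category, canonical combining class).

    `describe` is a hypothetical helper written for this illustration.
    """
    name = unicodedata.name(ch, "<no name>")  # PUA code points have no name
    return (f"U+{ord(ch):04X}", name,
            unicodedata.category(ch), unicodedata.combining(ch))

samples = [
    "\u0301",  # an encoded combining mark
    "\u0041",  # an encoded letter
    "\uE000",  # a private-use code point (illustrative choice only)
]

for ch in samples:
    print(describe(ch))
```

A renderer or collator has nothing to go on for the private-use entry beyond what a particular font or private agreement supplies, which is the interoperability gap being discussed.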
From doug at ewellic.org Sat Mar 12 10:57:39 2022 From: doug at ewellic.org (Doug Ewell) Date: Sat, 12 Mar 2022 09:57:39 -0700 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220312044956.05878d9d@JRWUBU2> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> Message-ID: <006201d83632$487a02d0$d96e0870$@ewellic.org> In separate messages, Richard Wordingham wrote: > Might there be a copyright protection on 20th century additions to the > Cyrillic script that prohibits their use to bring the Russian > government into disrepute? That depends on what the attorneys retained by the Estate of the Preslav Literary School have to say about it. I guess this was a facetious question, but other than making a comment about current events, I'm not sure what purpose it serves with relation to Tolkien's scripts. > For example, are the tehtar compulsorily ligated letters (2001 > proposal) or combining marks (earlier, apparent encoding in the CSUR). > The CSUR appears to lack an encoding for Tengwar - it has a > provisional encoding, which is no better than a font-encoding. The section "Modes" in the CSUR proposal is instructive here; it reconciles the fact that the tehtar are non-spacing characters with the dotted-line glyphs in the chart. The Tengwar proposal, like many CSUR proposals (but unlike most "real" Unicode proposals in recent years), lacks a list of Unicode properties in UnicodeData.txt format. But in general, the distinction between an "encoding" and a "provisional encoding" seems overly pedantic for CSUR, which was always a fun, part-time project, and on which most work ended almost 20 years ago. 
-- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org From dpk at nonceword.org Sat Mar 12 12:36:05 2022 From: dpk at nonceword.org (Daphne Preston-Kendal) Date: Sat, 12 Mar 2022 19:36:05 +0100 Subject: Tengwar on a general purpose translation site In-Reply-To: <719233525.1388915.1646981062656@mail.yahoo.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> Message-ID: <02996892-AFC7-4F93-B7FB-5E27C7A75E91@nonceword.org> I note by-the-by that in countries such as Canada where the copyright term is 50 years p.m.a., Tolkien's works enter the public domain on 1 January 2024 anyway. This doesn't help Unicode, which is an international standard published by a US-based consortium. Nor does it help against their rather stretched trademark claims, which are certainly an abuse of intellectual property law. But it will be interesting to see how they adapt to this new situation. Daphne On 11 Mar 2022, at 07:44, stas via Unicode wrote: > > https://www.tolkienestate.com/frequently-asked-questions-and-links/ > > Wow, these people need to relax. I understand they don't want to lose some revenue sources, but this is too much. > You can't name a park after Tolkien, really? > > " > Fan Fiction > > The Tolkien Estate has a duty to protect the integrity of Tolkien's original writings and artworks and takes copyright very seriously. This means that you cannot copy any part of Tolkien's writings or images, nor can you create materials which refer to the characters, stories, places, events or other elements contained in any of Tolkien's works. > " > > "has a duty" - such a bullshit, it's all about money. > I wonder what Tolkien himself would think about this. > > This is a good example of intellectual property rights stifling innovation. > > (and why the fuck they disabled text copying on their site? 
too much) From richard.wordingham at ntlworld.com Sat Mar 12 16:42:23 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 12 Mar 2022 22:42:23 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <006201d83632$487a02d0$d96e0870$@ewellic.org> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> Message-ID: <20220312224223.666b33e2@JRWUBU2> On Sat, 12 Mar 2022 09:57:39 -0700 Doug Ewell via Unicode wrote: > In separate messages, Richard Wordingham wrote: > > > Might there be a copyright protection on 20th century additions to > > the Cyrillic script that prohibits their use to bring the Russian > > government into disrepute? > > That depends on what the attorneys retained by the Estate of the > Preslav Literary School have to say about it. > > I guess this was a facetious question, but other than making a > comment about current events, I'm not sure what purpose it serves > with relation to Tolkien's scripts. It's a possible case where untrammelled permission to use new letters may not have been given. > > For example, are the tehtar compulsorily ligated letters (2001 > > proposal) or combining marks (earlier, apparent encoding in the > > CSUR). The CSUR appears to lack an encoding for Tengwar - it has a > > provisional encoding, which is no better than a font-encoding. > > The section "Modes" in the CSUR proposal is instructive here; it > reconciles the fact that the tehtar are non-spacing characters with > the dotted-line glyphs in the chart. The description at https://www.evertype.com/standards/csur/tengwar.html implies that tehta codepoints are applied to the previous consonant, which implies a visual order encoding, as opposed to the 2001 phonetic order encoding. 
While a phonetic order encoding seems appealing for a language with two modes mostly differing as CV v. VC ligaturing, the scheme does seem to need language tagging for tolerable rendering. Under the 2001 scheme, which proposes an encoding in the SMP, not in a PUA, the tehtar would merit being letters, just like the non-spacing letter U+0D4E MALAYALAM LETTER DOT REPH. > The Tengwar proposal, like many CSUR proposals (but unlike most > "real" Unicode proposals in recent years), lacks a list of Unicode > properties in UnicodeData.txt format. But in general, the distinction > between an "encoding" and a "provisional encoding" seems overly > pedantic for CSUR, which was always a fun, part-time project, and on > which most work ended almost 20 years ago. Nothing to do with interoperability, then? Richard. From mark at kli.org Sat Mar 12 18:23:57 2022 From: mark at kli.org (Mark E. Shoulson) Date: Sat, 12 Mar 2022 19:23:57 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: <719233525.1388915.1646981062656@mail.yahoo.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> Message-ID: On 3/11/22 01:44, stas via Unicode wrote: > "has a duty" - such a bullshit, it's all about money. > I wonder what Tolkien himself would think about this. That's one of the sadder aspects of this. Tolkien was trying to create a sort of underlying mythology, one which might give rise to later myths as we know them (e.g. Atalantë to Atlantis). He was trying to create a whole mythos. And he has succeeded so well, beyond any expectation he could have hoped to have. Aspects of his mythology (e.g. a race of Elves that's not tiny fairies, but long-lived or immortal approximately human-sized, extremely beautiful and skilled people) have become staples across so much of the fantasy literature landscape; pretty much the entire concept of Dungeons & Dragons (and 
RPGs spawned therefrom) sprang from the "A Journey in the Dark" chapter of FotR, dungeon-crawling through Moria. Even for some modern fiction that Tolkien maybe would have hated, he probably would have appreciated how thoroughly and utterly his mythos has been adopted. (see _Author of the Century_ by Tom Shippey, also _Leaf by Niggle_ by Tolkien, and his description of the creation of the Dwarves in The Silmarillion, both of which can be read as a sort of self-description.) The Tolkien Estate's attempted stranglehold on the concept is way too little and way too late to have really affected Tolkien's success in this regard, it's just sort of ironic. > This is a good example of intellectual property rights stifling > innovation. > > (and why the fuck they disabled text copying on their site? too much) OK, disabling text-copying is a little ridiculous. ~mark From jameskass at code2001.com Sat Mar 12 19:38:30 2022 From: jameskass at code2001.com (James Kass) Date: Sun, 13 Mar 2022 01:38:30 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> Message-ID: <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> Unicode's mission is to provide a standard encoding for the world's writing systems. Tengwar is one of those systems. Suggestions made earlier regarding working around the estate's bans aren't about fooling anybody. Rather the goal should be to get Tengwar encoded while honoring the estate's wishes. Such a blind encoding shouldn't be viewed as "pseudo-coding". As has been pointed out, Unicode does not encode glyphs, so Tolkien's glyphs aren't necessary. Chart glyphs could be control pictures along the lines of "last resort" fonts. If the naming convention for CJK ideographs and other encoded scripts isn't good enough for Tengwar, then name them something else. Like "FICTIONAL CONSCRIPT TT LETTER A", or whatever. 
As Richard Wordingham has pointed out, the encoding will assign properties to the characters so that applications can process them correctly. Collation and so forth aren't IP. The actual users of the script will know the score and non-users don't need to know. Maintaining the status quo until some future estate epiphany means that non-standard data will continue to proliferate. The current situation has some texts using ASCII-overlay fonts while other texts use CSUR encoding. It would be wonderful if the Unicode cognoscenti would use their considerable knowledge and skills to come up with a solution which would satisfy everybody instead of pointing out why this idea or that idea won't work. In the event of an eventual estate epiphany, proper charts could be added to the Standard along with a proper write-up for that range. The "alias" fields could be updated appropriately. Meanwhile, the encoding should avoid any mention of Tolkien, his works, his art, his glyphs, and his critters. From richard.wordingham at ntlworld.com Sun Mar 13 06:01:05 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sun, 13 Mar 2022 11:01:05 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> Message-ID: <20220313110105.6cea4941@JRWUBU2> On Sun, 13 Mar 2022 01:38:30 +0000 James Kass via Unicode wrote: > Unicode's mission is to provide a standard encoding for the world's > writing systems. Tengwar is one of those systems. Suggestions made > earlier regarding working around the estate's bans aren't about > fooling anybody. Rather the goal should be to get Tengwar encoded > while honoring the estate's wishes. Such a blind encoding shouldn't > be viewed as "pseudo-coding". 
As has been pointed out, Unicode does > not encode glyphs, so Tolkien's glyphs aren't necessary. Chart > glyphs could be control pictures along the lines of "last resort" > fonts. If the naming convention for CJK ideographs and other encoded > scripts isn't good enough for Tengwar, then name them something else. > Like "FICTIONAL CONSCRIPT TT LETTER A", or whatever. The script is already registered in ISO 363. Thus I can't see any objection in isolation to the concept of a character TENGWAR LETTER T for what may more commonly be known as 'tinco'. However, their arrangement (see remarks on collation below) might be another matter. > As Richard Wordingham has pointed out, the encoding will assign > properties to the characters so that applications can process them > correctly. Collation and so forth aren't IP. The actual users of > the script will know the score and non-users don't need to know. That depends on the collation. A collation based on the traditional tabulation of the tengwar might be protected by copyright. An underlying order 't', 'p', 'c', 'k' is original. Now, a collation based on transliteration wouldn't be protected, and has precedent in the default collation for the Lao script, which is based on mechanical transliteration to the Thai script. > Maintaining the status quo until some future estate epiphany means > that non-standard data will continue to proliferate. The current > situation has some texts using ASCII-overlay fonts while other texts > use CSUR encoding. The estate appears to be relying on copyright. That generally expires in 2044, on the 70th anniversary of Tolkien's death. Richard. 
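The dates quoted in this thread (1 January 2024 for a 50-year term, 2044 for a 70-year term) follow from simple arithmetic, sketched below under two stated assumptions: that the author died in 1973, and that a "life plus N years" term is counted to the end of the calendar year, so that a work enters the public domain on the following 1 January. The function name is hypothetical, and real copyright law has many exceptions this sketch ignores.

```python
def first_public_domain_year(death_year: int, term_years: int) -> int:
    """Year on whose 1 January a life+term copyright has lapsed.

    Assumes the term runs to the end of the calendar year that falls
    term_years after the year of death (a simplification).
    """
    return death_year + term_years + 1

DEATH_YEAR = 1973  # assumption: the year of Tolkien's death

for term in (50, 70):
    year = first_public_domain_year(DEATH_YEAR, term)
    print(f"life+{term}: public domain from 1 January {year}")
```

Under these assumptions the 70-year term ends at the close of 2043, so "expires in 2044" is best read as "is out of copyright from the start of 2044".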
From richard.wordingham at ntlworld.com Sun Mar 13 06:16:00 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sun, 13 Mar 2022 11:16:00 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220313110105.6cea4941@JRWUBU2> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> Message-ID: <20220313111600.61408762@JRWUBU2> On Sun, 13 Mar 2022 11:01:05 +0000 Richard Wordingham via Unicode wrote: > The script is already registered in ISO 363. Scrub that. It's in the ISO 15924 register, but I don't know how it got there. Richard. From doug at ewellic.org Sun Mar 13 18:41:20 2022 From: doug at ewellic.org (Doug Ewell) Date: Sun, 13 Mar 2022 17:41:20 -0600 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220312224223.666b33e2@JRWUBU2> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> Message-ID: <009601d83733$d808e130$881aa390$@ewellic.org> Richard Wordingham wrote: >>> Might there be a copyright protection on 20th century additions to >>> the Cyrillic script that prohibits their use to bring the Russian >>> government into disrepute? >> >> That depends on what the attorneys retained by the Estate of the >> Preslav Literary School have to say about it. >> >> I guess this was a facetious question, but other than making a >> comment about current events, I'm not sure what purpose it serves >> with relation to Tolkien's scripts. > > It's a possible case where untrammelled permission to use new letters > may not have been given. By whom? Nobody owns Cyrillic; nobody has claimed IP rights to it. It's used to write several dozen languages. 
Russia has no more claim to it than the US or UK has to the Latin script. > The description at https://www.evertype.com/standards/csur/tengwar.html > implies that tehta codepoints are applied to the previous > consonant, which implies a visual order encoding, as opposed to the > 2001 phonetic order encoding. While a phonetic order encoding seems > appealing for a language with two modes mostly differing as CV v. VC > ligaturing, the scheme does seem to need language tagging for > tolerable rendering. That seems clear enough. > Under the 2001 scheme, which proposes an encoding in the SMP, not in a > PUA, the tehtar would merit being letters, just like the non-spacing > letter U+0D4E MALAYALAM LETTER DOT REPH. The section "Rendering" in the 2001 document seems to me to make the same statements about modes and tehtar as the CSUR proposal. >> The Tengwar proposal, like many CSUR proposals (but unlike most "real" >> Unicode proposals in recent years), lacks a list of Unicode properties >> in UnicodeData.txt format. But in general, the distinction between an >> "encoding" and a "provisional encoding" seems overly pedantic for >> CSUR, which was always a fun, part-time project, and on which most >> work ended almost 20 years ago. > > Nothing to do with interoperability, then? All of us, including Unicode itself, have become more and more cognizant of the importance of properties to interoperability as time has passed and experience has been gained. CSUR originated in 1993, just a few years after Unicode 1.0 was released, and no, not all of the CSUR proposals cover as much ground with respect to interoperability as we would like, or as we would insist upon today. Remember how informal Unicode properties themselves were in the early days. I do have a Unicode properties listing for my conscript, but it doesn't appear in that script's CSUR proposal. 
Of course, as we know now, UnicodeData.txt format isn't ideal either, as it excludes several important properties and is hard for humans to read. -- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org From doug at ewellic.org Sun Mar 13 18:54:49 2022 From: doug at ewellic.org (Doug Ewell) Date: Sun, 13 Mar 2022 17:54:49 -0600 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220313111600.61408762@JRWUBU2> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> <20220313111600.61408762@JRWUBU2> Message-ID: <009701d83735$b9d359a0$2d7a0ce0$@ewellic.org> Richard Wordingham wrote: >> The script is already registered in ISO 363. > > Scrub that. It's in the ISO 15924 register, but I don't know how it > got there. I was wondering what textile machinery and accessories had to do with any of this. Both Tengwar and Cirth (and also Klingon) were listed in preliminary draft 15924 documents compiled by Michael Everson in 1997 and 1998. As with characters that were present in Unicode 1.0, there were probably no formal criteria or vetting process for inclusion in those documents of any script, thus no definitive way to know how any of them got there. -- Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org From prosfilaes at gmail.com Sun Mar 13 19:52:29 2022 From: prosfilaes at gmail.com (David Starner) Date: Sun, 13 Mar 2022 19:52:29 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220313110105.6cea4941@JRWUBU2> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> Message-ID: On Sun, Mar 13, 2022 at 6:03 AM Richard Wordingham via Unicode wrote: > The estate appears to be relying on copyright. 
That generally expires > in 2044, on the 70th anniversary of Tolkien's death. More than half the people in the world live in nations with differing copyright terms, including the three biggest (China, India and the US) and 12 out of the 20 biggest nations. China and many other nations are life+50, so in 2024; India and Bangladesh are life+60, so 2034, and the US is 95 years from publication*, so 2033 for anything in the Hobbit to 2050 for Return of the King. Mexico is life+100, so it looks like the Lord of the Rings will be under copyright there until 2074. * Yes, it's more complex, but that's the applicable rule. -- The standard is written in English . If you have trouble understanding a particular section, read it again and again and again . . . Sit up straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185 (1991) From richard.wordingham at ntlworld.com Mon Mar 14 16:04:03 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Mon, 14 Mar 2022 21:04:03 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> Message-ID: <20220314210403.6570d1eb@JRWUBU2> On Sun, 13 Mar 2022 19:52:29 -0500 David Starner via Unicode wrote: > On Sun, Mar 13, 2022 at 6:03 AM Richard Wordingham via Unicode > wrote: > > The estate appears to be relying on copyright. That generally > > expires in 2044, on the 70th anniversary of Tolkien's death. > > More than half the people in the world live in nations with differing > copyright terms, including the three biggest (China, India and the US) > and 12 out of the 20 biggest nations. China and many other nations are > life+50, so in 2024; India and Bangladesh are life+60, so 2034, and > the US is 95 years from publication*, so 2033 for anything in the > Hobbit to 2050 for Return of the King. 
Mexico is life+100, so it looks > like the Lord of the Rings will be under copyright there until 2074. > > * Yes, it's more complex, but that's the applicable rule. But do any of those later dates apply to works by a purely British author? The Berne convention does not extend the copyright beyond 2044, which is the rule for British authorship. What about the copyright in 'Peter Pan' and the King James Bible? (They're still in copyright in the UK.) Richard. From wjgo_10009 at btinternet.com Mon Mar 14 16:53:37 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 14 Mar 2022 21:53:37 +0000 (GMT) Subject: Tengwar on a general purpose translation site In-Reply-To: <2d41a95f.aa67.17f8a5e4f5b.Webtop.102@btinternet.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> <20220314210403.6570d1eb@JRWUBU2> <2d41a95f.aa67.17f8a5e4f5b.Webtop.102@btinternet.com> Message-ID: <479d1b0.aa94.17f8a6c26b4.Webtop.102@btinternet.com> Richard Wordingham wrote: > What about the copyright in 'Peter Pan' and the King James Bible? > (They're still in copyright in the UK.) Quote from https://www.gov.uk/government/publications/copyright-notice-duration-of-copyright-term/copyright-notice-duration-of-copyright-term > Peter Pan > In 1929, the author, JM Barrie, gifted the rights to his play, Peter > Pan, to the Great Ormond Street Hospital for Children. The copyright > in this work expired in 1987, 50 years after his death. However, in > 1988 Parliament introduced a perpetual right to royalties for the use > of the Peter Pan play, payable to the Great Ormond Street Hospital for > Children. 
I opine that it would be good if a law were introduced such that the copyright owner at the time, of any work of creative writing could, before the copyright expired, gift, by a registration process, a perpetual right to royalties with the proceeds going to provide additional funding for the United Kingdom National Health Service. William Overington Monday 14 March 2022 From prosfilaes at gmail.com Mon Mar 14 18:29:03 2022 From: prosfilaes at gmail.com (David Starner) Date: Mon, 14 Mar 2022 18:29:03 -0500 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220314210403.6570d1eb@JRWUBU2> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> <20220314210403.6570d1eb@JRWUBU2> Message-ID: On Mon, Mar 14, 2022 at 4:07 PM Richard Wordingham via Unicode wrote: > > On Sun, 13 Mar 2022 19:52:29 -0500 > David Starner via Unicode wrote: > > > More than half the people in the world live in nations with differing > > copyright terms, including the three biggest (China, India and the US) > > and 12 out of the 20 biggest nations. China and many other nations are > > life+50, so in 2024; India and Bangladesh are life+60, so 2034, and > > the US is 95 years from publication*, so 2033 for anything in the > > Hobbit to 2050 for Return of the King. Mexico is life+100, so it looks > > like the Lord of the Rings will be under copyright there until 2074. > > > > * Yes, it's more complex, but that's the applicable rule. > > But do any of those later dates apply to works by a purely British > author? The Berne convention does not extend the copyright beyond > 2044, which is the rule for British authorship. The Berne Convention is life+50, which is 2024. It permits the rule of the shorter term, but does not require it. 
The US definitely does not have the rule of the shorter term, so British authors are treated like Americans, and the Lord of the Rings will be in copyright there until 2049 and 2050. It doesn't look like Mexico has the rule of the shorter term either, so it looks like 2074 there. -- The standard is written in English . If you have trouble understanding a particular section, read it again and again and again . . . Sit up straight. Eat your vegetables. Do not mumble. -- _Pascal_, ISO 7185 (1991) From mark at kli.org Mon Mar 14 19:09:02 2022 From: mark at kli.org (Mark E. Shoulson) Date: Mon, 14 Mar 2022 20:09:02 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> Message-ID: <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> On 3/12/22 20:38, James Kass via Unicode wrote: > > Unicode's mission is to provide a standard encoding for the world's > writing systems. Tengwar is one of those systems. Suggestions made > earlier regarding working around the estate's bans aren't about > fooling anybody. Rather the goal should be to get Tengwar encoded > while honoring the estate's wishes. Such a blind encoding shouldn't > be viewed as "pseudo-coding". As has been pointed out, Unicode does > not encode glyphs, so Tolkien's glyphs aren't necessary. Chart glyphs > could be control pictures along the lines of "last resort" fonts. If > the naming convention for CJK ideographs and other encoded scripts > isn't good enough for Tengwar, then name them something else. Like > "FICTIONAL CONSCRIPT TT LETTER A", or whatever. OK, to be sure, CJK ideographs have meaningless names (mostly). But there is also information in the standard which specifies clearly exactly which graph is intended. 
That's part of the standard, or nobody would know what you meant. Like Ken said before, if all you say is "FICTIONAL CONSCRIPT TT LETTER A" and don't define what that means in the standard, it might as well be PUA. If you *do* define what it means in the standard, then it doesn't matter what the _name_ is, you still run the risk of annoying whomever you don't want to annoy. > Meanwhile, the encoding should avoid any mention of Tolkien, his > works, his art, his glyphs, and his critters. If you don't mention them, how are you describing what the character refers to? ~mark From jameskass at code2001.com Mon Mar 14 19:43:13 2022 From: jameskass at code2001.com (James Kass) Date: Tue, 15 Mar 2022 00:43:13 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> Message-ID: On 2022-03-15 12:09 AM, Mark E. Shoulson via Unicode wrote: > OK, to be sure, CJK ideographs have meaningless names (mostly). But > there is also information in the standard which specifies clearly > exactly which graph is intended. That's part of the standard, or > nobody would know what you meant. > > Like Ken said before, if all you say is "FICTIONAL CONSCRIPT TT LETTER > A" and don't define what that means in the standard, it might as well > be PUA. If you *do* define what it means in the standard, then it > doesn't matter what the _name_ is, you still run the risk of annoying > whomever you don't want to annoy. > >> Meanwhile, the encoding should avoid any mention of Tolkien, his >> works, his art, his glyphs, and his critters. > > If you don't mention them, how are you describing what the character > refers to? It's an abstract character with a unique name.
The suggestion was put forward with the idea of expunging any reference to anybody's intellectual property, thereby eliminating any risk of any estate getting sand in their knickers. Of course it is not an optimal solution, interim workarounds seldom are. Users who really, really need to see the glyphs can install an appropriate font and fire up Unibook or BabelMap. If the estate were presented with two proposals, Plan A and Plan B, where Plan A would be optimal and Plan B would be interim -- and the estate was advised that the optimal Plan A could not move forward without their release but that Plan B needed no such release -- the estate might do the right thing. Maybe the estate will become enlightened in ten or twenty years. Maybe it won't. Would the user community be better served by maintaining the status quo or by standardization? Tolkien didn't invent the concept of abstract characters. From richard.wordingham at ntlworld.com Mon Mar 14 20:19:16 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 15 Mar 2022 01:19:16 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <20220313110105.6cea4941@JRWUBU2> <20220314210403.6570d1eb@JRWUBU2> Message-ID: <20220315011916.468128c1@JRWUBU2> On Mon, 14 Mar 2022 18:29:03 -0500 David Starner via Unicode wrote: > On Mon, Mar 14, 2022 at 4:07 PM Richard Wordingham via Unicode > wrote: > > > > On Sun, 13 Mar 2022 19:52:29 -0500 > > David Starner via Unicode wrote: > > > > > More than half the people in the world live in nations with > > > differing copyright terms, including the three biggest (China, > > > India and the US) and 12 out of the 20 biggest nations.
China and > > > many other nations are life+50, so in 2024; India and Bangladesh > > > are life+60, so 2034, and the US is 95 years from publication*, > > > so 2033 for anything in the Hobbit to 2050 for Return of the > > > King. Mexico is life+100, so it looks like the Lord of the Rings > > > will be under copyright there until 2074. > > > > > > * Yes, it's more complex, but that's the applicable rule. > > > > But do any of those later dates apply to works by a purely British > > author? The Berne convention does not extend the copyright beyond > > 2044, which is the rule for British authorship. > > The Berne Convention is life+50, which is 2024. It permits the rule of > the shorter term, but does not require it. The US definitely does not > have the rule of the shorter term, so British authors are treated like > Americans, and the Lord of the Rings will be in copyright there until > 2049 and 2050. It doesn't look like Mexico has the rule of the shorter > term either, so it looks like 2074 there. The Hobbit is in copyright in the UK until 2044. Life+50 is in general a minimum requirement of the convention. Which copyright law is relevant to Unicode Incorporated and to ISO 10646? Richard. From richard.wordingham at ntlworld.com Mon Mar 14 20:26:47 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 15 Mar 2022 01:26:47 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> Message-ID: <20220315012647.45ce5c3d@JRWUBU2> On Tue, 15 Mar 2022 00:43:13 +0000 James Kass via Unicode wrote: > Users who really, really need to see the glyphs can install an > appropriate font and fire up Unibook or BabelMap. Won't the font be in breach of the alleged copyright? 
I presume the current English Wikipedia page on Tengwar is in breach of the alleged copyright. Richard. From mark at kli.org Mon Mar 14 20:37:15 2022 From: mark at kli.org (Mark E. Shoulson) Date: Mon, 14 Mar 2022 21:37:15 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> Message-ID: On 3/14/22 20:43, James Kass via Unicode wrote: > > > On 2022-03-15 12:09 AM, Mark E. Shoulson via Unicode wrote: >> >>> Meanwhile, the encoding should avoid any mention of Tolkien, his >>> works, his art, his glyphs, and his critters. >> >> If you don't mention them, how are you describing what the character >> refers to? > > It's an abstract character with a unique name. Yeah, but WHICH abstract character? It could be _anything_. Maybe it's TINCO, maybe it's PARMA, maybe CALMA. Maybe PIQAD LETTER A. Maybe SEUSSIAN LETTER WUM. It could be anything, so it's nothing. It only is something if people unofficially and informally agree that it's something. Which is exactly what the PUA is and does. If the standard doesn't say what it is, then as far as the standard is concerned, it could be anything, which doesn't get us very far. If the standard calls it LETTER 001 and mentions in the text that it corresponds to tengwar, that's another matter. Then the standard would be defining it and stating what it is and what it isn't. But then as far as the Tolkien people are concerned, it might as well be named TENGWAR LETTER TINCO. Or at least, as far as the Tolkien people might be concerned, in the Unicode consortium's mind and fears. We can't have it both ways. Anything sufficient to be not-PUA is enough to arouse corporate ire.
> The suggestion was put forward with the idea of expunging any > reference to anybody's intellectual property, thereby eliminating any > risk of any estate getting sand in their knickers. > > Of course it is not an optimal solution, interim workarounds seldom are. The only possible advantage I can see to this is that _someday_ the characters will be able to be encoded officially, and they'll already be in place. But with the wrong names. In the meantime, they're essentially private-use characters with unofficial mappings, like CSUR. Just that the CSUR-encoded text doesn't become obsolete when the official encodings happen. I don't know if that's enough of a plus to consider. > Tolkien didn't invent the concept of abstract characters. Nobody said he did. But he did invent a set of abstract characters, and that's the problem. ~mark From mark at kli.org Mon Mar 14 20:45:28 2022 From: mark at kli.org (Mark E. Shoulson) Date: Mon, 14 Mar 2022 21:45:28 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220315012647.45ce5c3d@JRWUBU2> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> <20220315012647.45ce5c3d@JRWUBU2> Message-ID: <937c64cd-87cd-e473-2639-dad0278ec154@shoulson.com> I dunno, it gets weird when you consider fonts and glyphs and stuff. You can't actually copyright a font, apparently. And even if you could, so Tolkien's tengwar shapes might be protected, but what about Tengwar Optime (http://www.peter-wiegel.de/TengwarOptime.html)? Or Elbic Caslon (http://www.peter-wiegel.de/ElbicCaslon.html)? Could they argue that these were derivative works or something? Probably. I don't know. And I don't think it matters. Unicode doesn't encode glyphs anyway. I think quibbling over fonts and glyphs is kind of a red herring to the actual problem.
~mark On 3/14/22 21:26, Richard Wordingham via Unicode wrote: > On Tue, 15 Mar 2022 00:43:13 +0000 > James Kass via Unicode wrote: > >> Users who really, really need to see the glyphs can install an >> appropriate font and fire up Unibook or BabelMap. > Won't the font be in breach of the alleged copyright? > > I presume the current English Wikipedia page on Tengwar is in breach of > the alleged copyright. > > Richard. From wjgo_10009 at btinternet.com Tue Mar 15 13:47:14 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Tue, 15 Mar 2022 18:47:14 +0000 (GMT) Subject: Tengwar on a general purpose translation site In-Reply-To: References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> Message-ID: <76c21e35.c608.17f8ee7dd02.Webtop.102@btinternet.com> It seems to me that the best available possibility is as follows. Prepare a document and submit it for consideration by The Unicode Technical Committee, as follows. ---- Set out the encoding part, with or without glyphs as you think best, but clearly for Tengwar. Include a note stating that in order to encode, permission needs to be obtained from the owners of the intellectual property rights. Include a section about a motion for the Unicode Technical Committee based on my suggestion that is included in the post that is archived as follows. https://corp.unicode.org/pipermail/unicode/2022-March/010024.html ---- Lots of possible reasons why it may not produce the desired result. 1. The idea might be dismissed and possibly ridiculed in this mailing list. 2. Maybe nobody actually produces a document. 3. The gatekeeper might not allow the document to be added to the UTC Document Register. 4. UTC might decide not to discuss it. 5. UTC might decide in its discussion not to consider the motion. 6. UTC might vote against passing the motion. 7. 
It might not be possible to find a pen and ink. 8. The Tolkien Estate might choose not to enter into a discussion. 9. The discussion might not lead to a satisfactory agreement. However, maybe none of those possibilities will happen, and there will be a very satisfactory outcome. And even if the first listed item happens, that need not stop a satisfactory result being achieved. It seems to me that the issue is as to whether the effort necessary to try to get a result by this method is worth making given that the effort may not lead to the desired result, yet it might. Please bear in mind that in the 1970s I wrote, from home, as an individual, to a broadcaster asking if they would broadcast some software on a teletext page as an experiment, even though there was at the time no equipment upon which to run the software. They invited me to visit them to discuss the idea. They made the broadcast. In time, the invention achieved success. So requests can sometimes get an amazingly good result. In order to make progress one needs to try to achieve. To aim high. Not every try succeeds. For example, my idea for a slide rule to multiply and divide units never got implemented. Win some lose some. William Overington Tuesday 15 March 2022 From richard.wordingham at ntlworld.com Tue Mar 15 14:33:18 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 15 Mar 2022 19:33:18 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <937c64cd-87cd-e473-2639-dad0278ec154@shoulson.com> References: <6756b851.510d.17f755245ac.Webtop.102@btinternet.com> <719233525.1388915.1646981062656@mail.yahoo.com> <4a7426c4-3b37-adff-ffd5-596e06098b78@code2001.com> <3e4eac6d-2e35-da0d-ee9d-3b4e56f8b807@shoulson.com> <20220315012647.45ce5c3d@JRWUBU2> <937c64cd-87cd-e473-2639-dad0278ec154@shoulson.com> Message-ID: <20220315193318.2686ca49@JRWUBU2> On Mon, 14 Mar 2022 21:45:28 -0400 "Mark E.
Shoulson via Unicode" wrote: > I dunno, it gets weird when you consider fonts and glyphs and stuff. > You can't actually copyright a font, apparently. Font files can in general be subject to copyright, even if the glyphs can't be. > And even if you > could, so Tolkien's tengwar shapes might be protected, but what about > Tengwar Optime (http://www.peter-wiegel.de/TengwarOptime.html)? Or > Elbic Caslon (http://www.peter-wiegel.de/ElbicCaslon.html)? Could > they argue that these were derivative works or something? Probably. > I don't know. And I don't think it matters. My strong suspicion is that they are, if the Tolkien Estate has its claimed copyright, in breach of it. When fonts have been licensed, commercial use has been prohibited, but these fonts are released under GPL with the font exception, and it states that in the name table. I don't know of any fonts that are still licensed. With this combination, only withholding the font can prevent commercial use. > Unicode doesn't encode > glyphs anyway. I think quibbling over fonts and glyphs is kind of a > red herring to the actual problem. The identifying reference would probably have to be a book. Richard. > On 3/14/22 21:26, Richard Wordingham via Unicode wrote: > > On Tue, 15 Mar 2022 00:43:13 +0000 > > James Kass via Unicode wrote: > > > >> Users who really, really need to see the glyphs can install an > >> appropriate font and fire up Unibook or BabelMap. > > Won't the font be in breach of the alleged copyright? > > > > I presume the current English Wikipedia page on Tengwar is in > > breach of the alleged copyright.
From richard.wordingham at ntlworld.com Tue Mar 15 16:07:32 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 15 Mar 2022 21:07:32 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: <009601d83733$d808e130$881aa390$@ewellic.org> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> Message-ID: <20220315210732.7f4da4b4@JRWUBU2> On Sun, 13 Mar 2022 17:41:20 -0600 Doug Ewell via Unicode wrote: > Richard Wordingham wrote: > > It's a possible case where untrammelled permission to use new > > letters may not have been given. > By whom? Nobody owns Cyrillic; nobody has claimed IP rights to it. > It's used to write several dozen languages. Russia has no more claim > to it than the US or UK has to the Latin script. By whoever or whatever added the *new* letters. > > The description at > > https://www.evertype.com/standards/csur/tengwar.html implies that > > that tehta codepoints are applied to the previous consonant, which > > implies a visual order encoding, as opposed to the 2001 phonetic > > order encoding. While a phonetic order encoding seems appealing > > for a language with two modes mostly differing as CV v. VC > > ligaturing, the scheme does seem to need language tagging for > > tolerable rendering. > > That seems clear enough. > > > Under the 2001 scheme, which proposes an encoding in the SMP, not > > in a PUA, the tehtar would merit being letters, just like the > > non-spacing letter U+0D4E MALAYALAM LETTER DOT REPH. > > The section "Rendering" in the 2001 document seems to me to make the > same statements about modes and tehtar as the CSUR proposal. Under the former, cons1-tehta-cons2 has tehta displayed on cons1.
In the 2001 proposal, a Sindarin font would display the tehta on cons2. > >> The Tengwar proposal, like many CSUR proposals (but unlike most > >> "real" Unicode proposals in recent years), lacks a list of Unicode > >> properties in UnicodeData.txt format. But in general, the > >> distinction between an "encoding" and a "provisional encoding" > >> seems overly pedantic for CSUR, which was always a fun, part-time > >> project, and on which most work ended almost 20 years ago. > > > > Nothing to do with interoperability, then? I was referring to the next best thing to proper encoding. If I encounter Ewellic text, and have a font that supports Ewellic, it should support the text. It's rather disappointing to see that the CSUR doesn't have a single mapping from codepoints to Tengwar characters. Richard. From mark at kli.org Tue Mar 15 16:51:04 2022 From: mark at kli.org (Mark E. Shoulson) Date: Tue, 15 Mar 2022 17:51:04 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: <20220315210732.7f4da4b4@JRWUBU2> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> Message-ID: On 3/15/22 17:07, Richard Wordingham via Unicode wrote: > On Sun, 13 Mar 2022 17:41:20 -0600 > Doug Ewell via Unicode wrote: > >> Richard Wordingham wrote: >>> Under the 2001 scheme, which proposes an encoding in the SMP, not >>> in a PUA, the tehtar would merit being letters, just like the >>> non-spacing letter U+0D4E MALAYALAM LETTER DOT REPH. >> The section "Rendering" in the 2001 document seems to me to make the >> same statements about modes and tehtar as the CSUR proposal. > Under the former, cons1-tehta-cons2 has tehta displayed on cons1.
In > the 2001 proposal, a Sindarin font would display the tehta on cons2. If you ask me, it's pretty clear that tehtar are/should be combining characters, like accents or Hebrew vowels. And yes, then Sindarin gets encoded with a non-obvious ordering. But really, in the context of all the various input-method pain people get put through for other scripts, is that really so terrible? Even Hebrew codes the furtive PATAH after the letter even though it's pronounced before it. (That's only one vowel, and not a very common one at that, but still.) But you didn't ask me (which was probably a smart move), and it's far too soon to be actually concerned about this anyway. Next time the proposal is updated for serious consideration we can drag this all out. ~mark From jameskass at code2001.com Tue Mar 15 20:35:24 2022 From: jameskass at code2001.com (James Kass) Date: Wed, 16 Mar 2022 01:35:24 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> Message-ID: <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> On 2022-03-15 9:51 PM, Mark E. Shoulson via Unicode wrote: > Next time the proposal is updated for serious consideration we can > drag this all out. Any updated proposal is rejected in advance due to IP concerns. So why would anyone spend resources updating the proposal? When conventional approaches don't get us what we want, we can either try a different approach or give it up and move on. From mark at kli.org Tue Mar 15 22:01:22 2022 From: mark at kli.org (Mark E.
Shoulson) Date: Tue, 15 Mar 2022 23:01:22 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> Message-ID: Yes, exactly. An updated proposal for serious consideration, then, could only happen after IP issues are resolved (if ever.) At which time (if any), fights like combining characters vs ligatures or whatever can be gleefully engaged in by enthusiastic proponents. ~mark On 3/15/22 21:35, James Kass via Unicode wrote: > > > On 2022-03-15 9:51 PM, Mark E. Shoulson via Unicode wrote: >> Next time the proposal is updated for serious consideration we can >> drag this all out. > > Any updated proposal is rejected in advance due to IP concerns. So why > would anyone spend resources updating the proposal? > > When conventional approaches don't get us what we want, we can either > try a different approach or give it up and move on.
From jameskass at code2001.com Wed Mar 16 00:06:57 2022 From: jameskass at code2001.com (James Kass) Date: Wed, 16 Mar 2022 05:06:57 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> Message-ID: <252137cd-791b-ec94-a8e3-d51068fec807@code2001.com> On 2022-03-16 3:01 AM, Mark E. Shoulson via Unicode wrote: > An updated proposal for serious consideration, then, could only happen > after IP issues are resolved (if ever.) Or until someone comes up with an unconventional approach and makes it palatable. Be aware, though, that anyone cheeky enough to suggest an unconventional approach will immediately get sniped at from all sides -- even from those who desire progress towards standardization -- simply *because* the approach is unconventional. From sdowney at gmail.com Wed Mar 16 00:46:25 2022 From: sdowney at gmail.com (Steve Downey) Date: Wed, 16 Mar 2022 01:46:25 -0400 Subject: Tengwar on a general purpose translation site In-Reply-To: <252137cd-791b-ec94-a8e3-d51068fec807@code2001.com> References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> <252137cd-791b-ec94-a8e3-d51068fec807@code2001.com> Message-ID: But standards are pretty much the definition of conventional.
We have plenty of solutions, even within the Unicode framework, for high quality private agreements. Doing a bad job of standardization just makes life worse for everyone. And probably still gets the Consortium sued, and losing, setting a terrible precedent. On Wed, Mar 16, 2022, 01:10 James Kass via Unicode wrote: > > > On 2022-03-16 3:01 AM, Mark E. Shoulson via Unicode wrote: > > An updated proposal for serious consideration, then, could only happen > > after IP issues are resolved (if ever.) > > Or until someone comes up with an unconventional approach and makes it > palatable. Be aware, though, that anyone cheeky enough to suggest an > unconventional approach will immediately get sniped at from all sides -- > even from those who desire progress towards standardization -- simply > *because* the approach is unconventional. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.wordingham at ntlworld.com Wed Mar 16 02:33:02 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 16 Mar 2022 07:33:02 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> Message-ID: <20220316073302.41126a58@JRWUBU2> On Tue, 15 Mar 2022 23:01:22 -0400 "Mark E. Shoulson via Unicode" wrote: > Yes, exactly. An updated proposal for serious consideration, then, > could only happen after IP issues are resolved (if ever.) At which > time (if any), fights like combining characters vs ligatures or > whatever can be gleefully engaged in by enthusiastic proponents.
And contemplate whether we need a bidi class TN (Tengwar Numeral). Implicit bidi won't work if Arabic script text incorporates bits of Tengwar with numerals in it. Richard. From richard.wordingham at ntlworld.com Wed Mar 16 02:37:16 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Wed, 16 Mar 2022 07:37:16 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> Message-ID: <20220316073716.273a1a6c@JRWUBU2> On Tue, 15 Mar 2022 17:51:04 -0400 "Mark E. Shoulson via Unicode" wrote: > On 3/15/22 17:07, Richard Wordingham via Unicode wrote: > > On Sun, 13 Mar 2022 17:41:20 -0600 > > Doug Ewell via Unicode wrote: > > > >> Richard Wordingham wrote: > >>> Under the 2001 scheme, which proposes an encoding in the SMP, not > >>> in a PUA, the tehtar would merit being letters, just like the > >>> non-spacing letter U+0D4E MALAYALAM LETTER DOT REPH. > >> The section "Rendering" in the 2001 document seems to me to make > >> the same statements about modes and tehtar as the CSUR proposal. > > Under the former, cons1-tehta-cons2 has tehta displayed on cons1. > > In the 2001 proposal, a Sindarin font would display the tehta on > > cons2. > > If you ask me, it's pretty clear that tehtar are/should be combining > characters, like accents or Hebrew vowels. And yes, then Sindarin > gets encoded with a non-obvious ordering. But really, in the context > of all the various input-method pain people get put through for other > scripts, is that really so terrible? Even Hebrew codes the furtive > PATAH after the letter even though it's pronounced before it.
> (That's only one vowel, and not a very common one at that, but still.) That's an oddity I couldn't find called out in TUS. I was considering asking about that. Apparently furtive pathah is sometimes written bottom right rather than below. But the Sindarin vowel positioning also applies to English, which may still be the language most often used in new Tengwar text. Richard. From jameskass at code2001.com Thu Mar 17 00:13:58 2022 From: jameskass at code2001.com (James Kass) Date: Thu, 17 Mar 2022 05:13:58 +0000 Subject: Tengwar on a general purpose translation site In-Reply-To: References: <000201d83246$680e5960$382b0c20$@ewellic.org> <1084485675.1117773.1646674711669@mail.yahoo.com> <265ecba2-54e9-0b6a-b262-0466ae04521f@code2001.com> <20220312044956.05878d9d@JRWUBU2> <006201d83632$487a02d0$d96e0870$@ewellic.org> <20220312224223.666b33e2@JRWUBU2> <009601d83733$d808e130$881aa390$@ewellic.org> <20220315210732.7f4da4b4@JRWUBU2> <13d766c6-9774-4dff-7451-73ffb37c6563@code2001.com> <252137cd-791b-ec94-a8e3-d51068fec807@code2001.com> Message-ID: On 2022-03-16 5:46 AM, Steve Downey via Unicode wrote: > But standards are pretty much the definition of conventional. It can be hard to find precedent for something which has never been done, such as standardizing a writing system for which IP protection is claimed. > We have plenty of solutions, even within the Unicode framework, for high > quality private agreements. > > Doing a bad job of standardization just makes life worse for everyone. And > probably still gets the Consortium sued, and losing, setting a terrible > precedent. The probability of the estate filing a frivolous lawsuit over an encoding that doesn't even mention any of the IP seems slim. The likelihood of the Consortium losing such a lawsuit seems even slimmer. But the only way to prove it either way would be to put it to the test. We can probably agree that the Consortium isn't eager to put it to the test.
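On the earlier question in this thread of whether tehtar would be combining marks (like HEBREW POINT PATAH) or letters (like U+0D4E MALAYALAM LETTER DOT REPH), the difference shows up in two character properties: General_Category and Canonical_Combining_Class. The following is only a sketch for looking those up with Python's standard unicodedata module; the helper name describe is made up for illustration, and nothing here says anything about how Tengwar itself would be encoded.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, int]:
    """Return (name, General_Category, Canonical_Combining_Class) for one character.

    `describe` is a hypothetical helper, not part of any library.
    """
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


if __name__ == "__main__":
    # U+05B7 is the Hebrew vowel point mentioned in the thread;
    # U+0D4E is the Malayalam character cited as a "non-spacing letter".
    for ch in ("\u05B7", "\u0D4E"):
        name, category, ccc = describe(ch)
        print(f"U+{ord(ch):04X} {name}: gc={category} ccc={ccc}")
```

A combining mark reports a category starting with "M" and usually a non-zero combining class, whereas a letter reports "L..." and class 0, which is what matters for normalization and default grapheme clustering.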
From costello at mitre.org Thu Mar 17 13:18:00 2022 From: costello at mitre.org (Roger L Costello) Date: Thu, 17 Mar 2022 18:18:00 +0000 Subject: Why is it called case "folding"? Message-ID: Hi Folks, I read [1] that this is what case folding is: Case folding is the process of making two texts which differ only in case identical for comparison purposes, that is, it is meant for the purpose of string matching. I understand the use of the word "case," as in uppercase and lowercase. I don't understand the use of the word "folding." Folding? Huh? I fold my towel. I fold my shirt. I don't fold my case (whatever that means). Why is it called case "folding"? Who came up with that term? /Roger [1] https://www.w3.org/TR/charmod-norm/#definitionCaseFolding From asmusf at ix.netcom.com Thu Mar 17 13:52:06 2022 From: asmusf at ix.netcom.com (Asmus Freytag) Date: Thu, 17 Mar 2022 11:52:06 -0700 Subject: Why is it called case "folding"? In-Reply-To: References: Message-ID: <16c2e18e-c49c-37b5-6fe7-4e02af783d70@ix.netcom.com> An HTML attachment was scrubbed... URL: From kenwhistler at sonic.net Thu Mar 17 14:07:43 2022 From: kenwhistler at sonic.net (Ken Whistler) Date: Thu, 17 Mar 2022 12:07:43 -0700 Subject: Why is it called case "folding"? In-Reply-To: References: Message-ID: <814bedc5-757e-8b25-339d-e0bdd21b18d4@sonic.net> Roger, "Folding" is essentially a mapping operation which is designed to reduce the number of elements in some systematic way. In Unicode, "case folding" is a mapping operation which is specified to map uppercase letters to their corresponding lowercase letters (with various exceptions all spelled out in detail in the actual mapping table that specifies the folding). When you apply this "folding" to a set of strings, you then end up with strings that can be compared without case differences for individual letters mattering. You'll find "folding" mentioned in many other contexts in IT processing. 
For example, HTML parsers do whitespace "folding", which maps down any number of adjoining spaces and/or tabs (and line breaks) into a single space, so that arbitrary differences in the spacing and breaking of lines in the HTML source doesn't result in differences in the formatted output for display. You can find lots of examples of "mapping" and "folding" (as special cases of the more generic sense of "function") in computer science literature. --Ken On 3/17/2022 11:18 AM, Roger L Costello via Unicode wrote: > Hi Folks, > > I read [1] that this is what case folding is: > > Case folding is the process of making two texts which differ only in case identical for comparison purposes, that is, it is meant for the purpose of string matching. > > I understand the use of the word "case," as in uppercase and lowercase. > > I don't understand the use of the word "folding." Folding? Huh? I fold my towel. I fold my shirt. I don't fold my case (whatever that means). > > Why is it called case "folding"? Who came up with that term? > > /Roger > > [1] https://www.w3.org/TR/charmod-norm/#definitionCaseFolding > From aprilop at freenet.de Thu Mar 17 14:56:34 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Thu, 17 Mar 2022 19:56:34 +0000 Subject: Why is it called case "folding"? In-Reply-To: <16c2e18e-c49c-37b5-6fe7-4e02af783d70@ix.netcom.com> References: <16c2e18e-c49c-37b5-6fe7-4e02af783d70@ix.netcom.com> Message-ID: <4E354E16-342F-42B3-88C0-88CF03D353DC@freenet.de> On 17 March 2022, Asmus Freytag wrote: > Content-Type: text/html > >

Wonderful!


From asmusf at ix.netcom.com  Thu Mar 17 17:10:46 2022
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 17 Mar 2022 15:10:46 -0700
Subject: Why is it called case "folding"?
In-Reply-To: <4E354E16-342F-42B3-88C0-88CF03D353DC@freenet.de>
References: 
 <16c2e18e-c49c-37b5-6fe7-4e02af783d70@ix.netcom.com>
 <4E354E16-342F-42B3-88C0-88CF03D353DC@freenet.de>
Message-ID: <205689e3-0e3c-c744-de6d-7f181281cc07@ix.netcom.com>

An HTML attachment was scrubbed...
URL: 

From doug at ewellic.org  Thu Mar 17 18:40:09 2022
From: doug at ewellic.org (Doug Ewell)
Date: Thu, 17 Mar 2022 17:40:09 -0600
Subject: Why is it called case "folding"?
In-Reply-To: 
References: 
Message-ID: <001f01d83a58$571b0030$05510090$@ewellic.org>

Roger L Costello wrote:

> Why is it called case "folding"?

If you think of a piece of paper with the uppercase alphabet written at the top of the page, and the lowercase alphabet at the bottom, and then folding the page in half so that the uppercase letters are on top of the lowercase letters (or vice versa), that's kind of the image.
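The comparison-oriented sense of "folding" described in this thread can be tried out directly. What follows is only a minimal sketch, assuming Python 3 and its built-in str.casefold(); the helper name same_ignoring_case is made up for illustration and is not taken from any specification.

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings after case folding both of them."""
    return a.casefold() == b.casefold()


# Both spellings fold onto the same string, so they compare equal.
print(same_ignoring_case("Unicode", "UNICODE"))

# Folding is not plain lowercasing: sharp s folds to "ss".
print(same_ignoring_case("Straße", "STRASSE"))
print("Straße".lower() == "STRASSE".lower())
```

The last two lines illustrate why a dedicated folding table exists: simple lowercasing leaves "ß" alone, so the two spellings would not match.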

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org



From aprilop at freenet.de  Fri Mar 18 02:48:14 2022
From: aprilop at freenet.de (Andreas Prilop)
Date: Fri, 18 Mar 2022 07:48:14 +0000
Subject: Why is it called case "folding"?
In-Reply-To: <205689e3-0e3c-c744-de6d-7f181281cc07@ix.netcom.com>
References: 
 <16c2e18e-c49c-37b5-6fe7-4e02af783d70@ix.netcom.com>
 <4E354E16-342F-42B3-88C0-88CF03D353DC@freenet.de>
 <205689e3-0e3c-c744-de6d-7f181281cc07@ix.netcom.com>
Message-ID: <9496AF4E-7A91-48B8-9538-23D91AAF7508@freenet.de>

On 17 March 2022, Asmus Freytag wrote:

>>> Content-Type: text/html
>>> 
 
>>
>>  Wonderful! 
> 
> And your point being?

You (or your user-agent) send messages as "text/html" with "
",
which prevents wrapping of lines. The reader has to scroll horizontally
or sees the message in a tiny font size.
See your own message at:

https://corp.unicode.org/pipermail/unicode/2022-March/010065.html

https://corp.unicode.org/pipermail/unicode/attachments/20220317/a8b80bc4/attachment.htm

Why are you sending text/html in the first place?
You do not have any formatting at all.


From textexin at xencraft.com  Fri Mar 18 04:04:48 2022
From: textexin at xencraft.com (Tex)
Date: Fri, 18 Mar 2022 02:04:48 -0700
Subject: Why is it called case "folding"?
In-Reply-To: <001f01d83a58$571b0030$05510090$@ewellic.org>
References: 
 <001f01d83a58$571b0030$05510090$@ewellic.org>
Message-ID: <000c01d83aa7$38b49da0$aa1dd8e0$@xencraft.com>

Doug, so it isn't the case (no pun intended) that without the extra characters (cards) you can't win, like in poker, and so you fold?

-----Original Message-----
From: Unicode [mailto:unicode-bounces at corp.unicode.org] On Behalf Of Doug Ewell via Unicode
Sent: Thursday, March 17, 2022 4:40 PM
To: 'Roger L Costello'; 'unicode at unicode.org'
Subject: RE: Why is it called case "folding"?

Roger L Costello wrote:

> Why is it called case "folding"?

If you think of a piece of paper with the uppercase alphabet written at the top of the page, and the lowercase alphabet at the bottom, and then folding the page in half so that the uppercase letters are on top of the lowercase letters (or vice versa), that's kind of the image.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org




From lyratelle at gmx.de  Sat Mar 19 04:59:13 2022
From: lyratelle at gmx.de (Dominikus Dittes Scherkl)
Date: Sat, 19 Mar 2022 10:59:13 +0100
Subject: Why is it called case "folding"?
In-Reply-To: <000c01d83aa7$38b49da0$aa1dd8e0$@xencraft.com>
References: 
 <001f01d83a58$571b0030$05510090$@ewellic.org>
 <000c01d83aa7$38b49da0$aa1dd8e0$@xencraft.com>
Message-ID: <2c5f0d63-8c08-e1b8-ebdf-2955781fa98e@gmx.de>

Am 18.03.22 um 10:04 schrieb Tex via Unicode:
> Doug, So it isn't the case (no pun intended)  without the extra characters (cards) you can't win, like in poker, and so you fold.

In card-games the term comes from "folding" the spread cards in your
hand to one block, because the distinction doesn't matter to you anymore
(as you have given up).


--
                                          Dominikus Dittes Scherkl



From richard.wordingham at ntlworld.com  Sun Mar 20 12:58:26 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 20 Mar 2022 17:58:26 +0000
Subject: Fault in Bidi Algorithm at BD16
Message-ID: <20220320175826.5ee0b834@JRWUBU2>

There is a fault in BD16, at least at Unicode 14.0:

The problem lies in this part of the algorithm:

"If an opening paired bracket is found and there is room in the stack,
push its Bidi_Paired_Bracket property value and its text position onto
the stack.

If an opening paired bracket is found and there is no room
in the stack, stop processing BD16 for the remainder of the isolating
run sequence.

If a closing paired bracket is found, do the following:

1.  Declare a variable that holds a reference to the current stack
    element and initialize it with the top element of the stack.

2.  Compare the closing paired bracket being inspected or its
    canonical equivalent to the bracket in the current stack element."

It was picked up by line 312 of BidiCharacterTests.txt:

0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6

This line primarily checks that U+2329 and U+3009 are identified as a
'bracket pair'.  bpb(U+2329) is U+232A, whose canonical decomposition
is U+3009.  However, the step *numbered* '2' is non-deterministic; it
contains the word 'or'.  The simple, robust solution is to change 'or
its canonical equivalent' to 'and its canonical equivalents'.  That
also avoids the risk of 'its canonical equivalent' being interpreted as
the result of the function to_NFC or to_NFD.

It feels simpler to work with the NFC or NFD equivalents of the
candidate opening and closing brackets at both the first and last of
the quoted steps.
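The suggestion above can be sketched as follows. This is an illustration only, under several assumptions: Python 3 with the standard unicodedata module, a tiny hand-written excerpt of the bracket table (the name OPEN_TO_CLOSE is hypothetical), and no attention to Bidi classes or isolating run sequences, which the real BD16 requires. The only point shown is that comparing NFD forms at both the push and the compare step pairs U+2329 with U+3009.

```python
import unicodedata

# Hypothetical excerpt of BidiBrackets.txt: opening bracket -> paired closing bracket.
OPEN_TO_CLOSE = {
    "(": ")",
    "[": "]",
    "{": "}",
    "\u2329": "\u232A",
    "\u3008": "\u3009",
}
CLOSING = set(OPEN_TO_CLOSE.values())
MAX_DEPTH = 63  # stack size given in BD16


def nfd(ch: str) -> str:
    return unicodedata.normalize("NFD", ch)


def bracket_pairs(text: str) -> list:
    """Return (open_pos, close_pos) pairs, sorted by opening position."""
    stack = []  # entries: (NFD of the expected closing bracket, position of the opener)
    pairs = []
    for pos, ch in enumerate(text):
        if ch in OPEN_TO_CLOSE:
            if len(stack) == MAX_DEPTH:
                break  # no room: stop processing for the rest of the sequence
            stack.append((nfd(OPEN_TO_CLOSE[ch]), pos))
        elif ch in CLOSING:
            want = nfd(ch)
            # Search from the top of the stack downwards for a matching opener.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == want:
                    pairs.append((stack[i][1], pos))
                    del stack[i:]  # pop through the matched element inclusively
                    break
    return sorted(pairs)


# The characters of the test line quoted above: 0061 0020 2329 0062 002E 0031 3009
print(bracket_pairs("a \u2329b.1\u3009"))
```

Normalizing both sides costs two lookups per bracket, which is the trade-off against a hard-coded special case.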

I admit that part of the problem was that I was using a tool that
assumed that canonically equivalent characters had the same Unicode
properties.

Richard.

From kenwhistler at sonic.net  Sun Mar 20 14:49:19 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Sun, 20 Mar 2022 12:49:19 -0700
Subject: Fault in Bidi Algorithm at BD16
In-Reply-To: <20220320175826.5ee0b834@JRWUBU2>
References: <20220320175826.5ee0b834@JRWUBU2>
Message-ID: <1040dbb3-ce80-caf1-c31d-b6974bec007a@sonic.net>

Richard,

On 3/20/2022 10:58 AM, Richard Wordingham via Unicode wrote:
> 2.  Compare the closing paired bracket being inspected or its
>      canonical equivalent to the bracket in the current stack element."
>
> It was picked up by line 312 of BidiCharacterTests.txt:
>
> 0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
>
> This line primarily checks that U+2329 and U+3009 are identified as a
> 'bracket pair'.  bpb(U+2329) is U+232A, whose canonical decomposition
> is U+3009.  However, the step *numbered* '2' is non-deterministic; it
> contains the word 'or'.

I'm not seeing it. The inclusion of an "or" there does not make this 
non-deterministic.

Yes, the text is not pedantically precise, I suppose, but most people 
have not had trouble interpreting what is intended. If your candidate 
closing bracket (or the canonical equivalent of your candidate closing 
bracket) matches the closing bracket match mapping detailed in 
BidiBrackets.txt for the opening bracket candidate on the stack, then 
you have a bracket match.

This affects precisely 2329 and 232A because those are the *only* 
brackets listed in BidiBrackets.txt that have canonical decomposition 
mappings. And it is vanishingly unlikely that the UTC is ever going to 
add more such paired brackets with canonical decomposition mappings.

>   The simple, robust solution is to change 'or
> its canonical equivalent' to 'and its canonical equivalents'.
I don't think that actually would clarify the text. And we shouldn't 
imply more of a requirement to import normalization into UBA than is 
actually needed.
>   That
> also avoids the risk of 'its canonical equivalent' being interpreted as
> the result of the function to_NFC or to_NFD.

I don't see the distinction here. The NFC *and* NFD form of 2329 are 
both 3008. The NFC *and* NFD form of 232A are both 3009. You could use 
either of those and still end up with the right result for the bracket 
match. But why bother?

The BidiReference code just does a hard-coded additional test (and 
explains why). For this particular edge case, that works just as well, 
is just as robust (see above assertion that UTC isn't going to add more 
exceptions to be dealt with), and would be *faster* than introducing a 
step to normalize the brackets:

        if ( ( bracketData.bracket == closingcp ) ||
             ( ( bracketData.bracket == 0x232A ) && ( closingcp == 0x3009 ) ) ||
             ( ( bracketData.bracket == 0x3009 ) && ( closingcp == 0x232A ) ) )

Note the logical OR's there. If condition_a OR condition_b OR 
condition_c then you have a match. That is completely deterministic in 
this case.
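For anyone who wants to experiment with that condition outside the reference implementation, here is a rough Python transliteration. It is a sketch, not BidiReference code: the function name closing_matches and the parameter name expected are invented for this example, and code points are passed as plain integers.

```python
def closing_matches(expected: int, closingcp: int) -> bool:
    """True if closingcp closes a bracket whose stored closing code point is expected.

    Mirrors the three-way test quoted above: exact equality, plus the two
    hard-coded cases that treat U+232A and U+3009 as interchangeable.
    """
    return (
        expected == closingcp
        or (expected == 0x232A and closingcp == 0x3009)
        or (expected == 0x3009 and closingcp == 0x232A)
    )


print(closing_matches(0x232A, 0x3009))  # the angle bracket case from the test line
print(closing_matches(0x0029, 0x005D))  # ")" expected, "]" found
```

The function is a pure predicate over two integers, so the same inputs always give the same answer.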

--Ken

>
> It feels simpler to work with the NFC or NFD equivalents of the
> candidate opening and closing brackets at both the first and last of
> the quoted steps.

From richard.wordingham at ntlworld.com  Sun Mar 20 15:43:38 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 20 Mar 2022 20:43:38 +0000
Subject: Fault in Bidi Algorithm at BD16
In-Reply-To: <1040dbb3-ce80-caf1-c31d-b6974bec007a@sonic.net>
References: <20220320175826.5ee0b834@JRWUBU2>
 <1040dbb3-ce80-caf1-c31d-b6974bec007a@sonic.net>
Message-ID: <20220320204338.4915e71f@JRWUBU2>

On Sun, 20 Mar 2022 12:49:19 -0700
Ken Whistler via Unicode  wrote:

> Richard,
> 
> On 3/20/2022 10:58 AM, Richard Wordingham via Unicode wrote:
> > 2.  Compare the closing paired bracket being inspected or its
> >      canonical equivalent to the bracket in the current stack
> > element."
> >
> > It was picked up by line 312 of BidiCharacterTests.txt:
> >
> > 0061 0020 2329 0062 002E 0031 3009;1;1;2 2 2 2 2 2 2;0 1 2 3 4 5 6
> >
> > This line primarily checks that U+2329 and U+3009 are identified as
> > a 'bracket pair'.  bpb(U+2329) is U+232A, whose canonical
> > decomposition is U+3009.  However, the step *numbered* '2' is
> > non-deterministic; it contains the word 'or'.  
> 
> I'm not seeing it. The inclusion of an "or" there does not make this 
> non-deterministic.

"Do A or B" is not deterministic.  In general, there may be several
different ways of achieving the same effect.

> Yes, the text is not pedantically precise, I suppose, but most people 
> have not had trouble interpreting what is intended. If your candidate 
> closing bracket (or the canonical equivalent of your candidate
> closing bracket) matches the closing bracket match mapping detailed
> in BidiBrackets.txt for the opening bracket candidate on the stack,
> then you have a bracket match.

How do you collect the statistics?  I would have thought you would have
been unlikely to know about such matters, for the errors should get
caught by the conformance tests.  At that point the penny drops.  And
with English, one needs to be careful with quantifiers like 'or'; it
seems clear to me that not even all native speakers interpret
combinations the same.

By the time one gets to N0, the intelligibility of the UBA is rapidly
falling off.  (I'm not confident that that's curable.)  And we know that
people do code up Unicode algorithms without understanding them.  The
UBA is one of the more complex algorithms, which is probably why it has
such a large set of tests.  The complexity has led to at least one
author leaving a curse in his public code.

> This affects precisely 2329 and 232A because those are the *only* 
> brackets listed in BidiBrackets.txt that have canonical decomposition 
> mappings. And it is vanishingly unlikely that the UTC is ever going
> to add more such paired brackets with canonical decomposition
> mappings.
> 
> >   The simple, robust solution is to change 'or
> > its canonical equivalent' to 'and its canonical equivalents'.  
> I don't think that actually would clarify the text. And we shouldn't 
> imply more of a requirement to import normalization into UBA than is 
> actually needed.
> >   That
> > also avoids the risk of 'its canonical equivalent' being
> > interpreted as the result of the function to_NFC or to_NFD.  
> 
> I don't see the distinction here. The NFC *and* NFD form of 2329 are 
> both 3008. The NFC *and* NFD form of 232A are both 3009. You could
> use either of those and still end up with the right result for the
> bracket match. But why bother?

U+232A is canonically equivalent to U+3009, but is neither
to_NFC(U+3009) nor to_NFD(U+3009).  Thus, it's not immediately obvious
that the 'canonical equivalent of U+3009' means U+232A.
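The asymmetry can be checked with a few lines of Python. This is a sketch assuming Python 3 and the standard unicodedata module, whose results depend on the Unicode version bundled with the interpreter; the helper canonically_equivalent is a name made up here.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for cp in (0x2329, 0x232A, 0x3008, 0x3009):
    ch = chr(cp)
    nfc = unicodedata.normalize("NFC", ch)
    nfd = unicodedata.normalize("NFD", ch)
    print(
        f"U+{cp:04X}: NFC=U+{ord(nfc):04X} NFD=U+{ord(nfd):04X} "
        f"decomposition={unicodedata.decomposition(ch)!r}"
    )

# Normalization maps U+232A to U+3009, never the other way round,
# yet the two are canonically equivalent in both directions.
print(canonically_equivalent("\u3009", "\u232A"))
```

So a reader who takes "its canonical equivalent" to mean "its normalized form" gets U+3009 from U+232A, but never gets U+232A from U+3009.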

> The BidiReference code just does a hard-coded additional test (and 
> explains why). For this particular edge case, that works just as
> well, is just as robust (see above assertion that UTC isn't going to
> add more exceptions to be dealt with), and would be *faster* than
> introducing a step to normalize the brackets:
> 
>         if ( ( bracketData.bracket == closingcp ) ||
>              ( ( bracketData.bracket == 0x232A ) && ( closingcp == 0x3009 ) ) ||
>              ( ( bracketData.bracket == 0x3009 ) && ( closingcp == 0x232A ) ) )
> 
> Note the logical OR's there. If condition_a OR condition_b OR 
> condition_c then you have a match. That is completely deterministic
> in this case.

The reference code is now in a place widely considered a threat to
networks!

Richard.


From public at khwilliamson.com  Wed Mar 23 11:01:24 2022
From: public at khwilliamson.com (Karl Williamson)
Date: Wed, 23 Mar 2022 10:01:24 -0600
Subject: Unicode in the news
Message-ID: <7d91cdec-eae4-6fd0-933a-8df5ebaa6b45@khwilliamson.com>

https://www.cbc.ca/news/canada/british-columbia/dakelh-indigenous-language-standard-syllabics-1.6392552

From sosipiuk at gmail.com  Thu Mar 24 13:09:23 2022
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Thu, 24 Mar 2022 14:09:23 -0400
Subject: Use of CANCEL TAG in emoji flags
Message-ID: <005e01d83faa$4b4da6c0$e1e8f440$@gmail.com>

Alexei Chimendez submitted a report last year about the problematic use of
CANCEL TAG for flag emojis:
https://www.unicode.org/L2/L2021/21127-edcom-rept-utc168.html

This was turned into an action item for Markus Scherer and the Properties
and Algorithms Group:
https://www.unicode.org/L2/L2021/21123.htm#168-A30

Is there any further information about this issue or the progress on it?

Thanks,
Sławomir Osipiuk



From markus.icu at gmail.com  Thu Mar 24 13:33:58 2022
From: markus.icu at gmail.com (Markus Scherer)
Date: Thu, 24 Mar 2022 11:33:58 -0700
Subject: Use of CANCEL TAG in emoji flags
In-Reply-To: <005e01d83faa$4b4da6c0$e1e8f440$@gmail.com>
References: <005e01d83faa$4b4da6c0$e1e8f440$@gmail.com>
Message-ID: 

On Thu, Mar 24, 2022 at 11:13 AM Sławomir Osipiuk via Unicode <
unicode at corp.unicode.org> wrote:

> Alexei Chimendez submitted a report last year about the problematic use of
> CANCEL TAG for flag emojis:
> https://www.unicode.org/L2/L2021/21127-edcom-rept-utc168.html
>
> This was turned into an action item for Markus Scherer and the Properties
> and Algorithms Group:
> https://www.unicode.org/L2/L2021/21123.htm#168-A30
>
> Is there any further information about this issue or the progress on it?
>

Thanks for the nudge :-}
I have added it to the agenda now...

markus

From don.hosek at gmail.com  Wed Mar 30 22:16:23 2022
From: don.hosek at gmail.com (Don Hosek)
Date: Wed, 30 Mar 2022 21:16:23 -0600
Subject: Clarification on Annex 29, GB12–13
Message-ID: 

Annex 29 says:
> Do not break within emoji flag sequences. That is, do not break between regional indicator (RI) symbols if there is an odd number of RI characters before the break point.
> GB12	sot (RI RI)* RI	×	RI
> GB13	[^RI] (RI RI)* RI	×	RI

This would seem to indicate that any even number of RI tags should be treated as a single grapheme so given, e.g., ?????? this should be a single grapheme rather than the expected three. There is no test in https://www.unicode.org/Public/14.0.0/ucd/auxiliary/GraphemeBreakTest.txt that would enforce this however. Or is this just a case of my misreading the spec and there is an implicit ÷ after each pair of RI characters? (if the latter, it might be helpful for future implementors to have a note to that effect).
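One reading of the two rules, offered as a sketch rather than an authoritative answer: GB12 and GB13 only suppress a break when an odd number of RI characters precede the break point. When the count is even, neither rule applies and the default rule GB999 (Any ÷ Any) supplies the break. The Python below models only that RI counting and treats every other character as its own cluster, which ignores the rest of Annex 29; the names is_ri and split_ri_run are hypothetical.

```python
def is_ri(ch: str) -> bool:
    """Regional indicator symbols occupy U+1F1E6..U+1F1FF."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def split_ri_run(text: str) -> list:
    """Split text into clusters using only the RI pairing of GB12/GB13."""
    clusters = []
    ri_before = 0  # consecutive RI characters immediately before this position
    for ch in text:
        if is_ri(ch) and ri_before % 2 == 1:
            clusters[-1] += ch   # odd count before the break point: no break
        else:
            clusters.append(ch)  # otherwise fall through to the default break
        ri_before = ri_before + 1 if is_ri(ch) else 0
    return clusters


# Six regional indicators in a row (spelling DE, FR, US).
six = "\U0001F1E9\U0001F1EA\U0001F1EB\U0001F1F7\U0001F1FA\U0001F1F8"
print(len(split_ri_run(six)))
```

Under this reading a run of six RI characters yields three clusters, and a run of five yields two pairs followed by a lone RI.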

-dh