From monicamerchant1 at gmail.com Sat Feb 5 00:28:27 2022
From: monicamerchant1 at gmail.com (Monica Merchant)
Date: Sat, 5 Feb 2022 19:28:27 +1300
Subject: Normalizer tool by Richard Ishida

Hello,

Where might I find Richard Ishida's normalizer tool and source code?
The links in [this post](https://r12a.github.io/blog/200901.html) no longer work.

Thank you,
mmerc

From abrahamgross at disroot.org Sat Feb 5 13:51:43 2022
From: abrahamgross at disroot.org (ag disroot)
Date: Sat, 5 Feb 2022 19:51:43 +0000 (UTC)
Subject: Normalizer tool by Richard Ishida
Message-ID: <6fbd6bdc-0b00-4e61-97a1-761e045ee980@disroot.org>

https://r12a.github.io/uniview/
https://github.com/r12a/uniview

From ishida at w3.org Mon Feb 7 07:20:11 2022
From: ishida at w3.org (r12a)
Date: Mon, 7 Feb 2022 13:20:11 +0000
Subject: Normalizer tool by Richard Ishida
Message-ID: <985bce2d-1f3f-3be7-440d-be59521efb36@w3.org>

I no longer maintain the JavaScript normalisation tool i wrote, since
JavaScript now provides the normalize() function, and i use that. See
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize

hth
ri

Fwiw, i also went through all the blog posts and changed rishida.net
links to point to r12a.github.io. I no longer own or have anything to
do with the rishida.net domain name, despite the fact that someone has
posted internationalisation-related content to it.

Monica Merchant via Unicode wrote on 05/02/2022 06:28:
> Hello,
>
> Where might I find Richard Ishida's normalizer tool and source code?
> The links in [this post](https://r12a.github.io/blog/200901.html) no
> longer work.
>
> Thank you,
>
> mmerc
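For anyone who wants to poke at the four normalization forms without the old tool: Python's standard library offers `unicodedata.normalize`, which plays the same role as the JavaScript normalize() mentioned above. What follows is only a minimal sketch; the helper name `all_forms` is made up for illustration and is not part of any tool discussed in this thread.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def all_forms(text):
    """Return {form: list of 'U+XXXX' strings} for each normalization form."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in FORMS
    }

if __name__ == "__main__":
    # Precomposed e-acute (U+00E9) followed by the "fi" ligature (U+FB01).
    for form, code_points in all_forms("\u00e9\ufb01").items():
        print(form, " ".join(code_points))
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) also replace it with plain "f" and "i".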
From wjgo_10009 at btinternet.com Thu Feb 10 06:45:47 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 10 Feb 2022 12:45:47 +0000 (GMT)
Subject: A multilingual sign that includes a language-independent glyph and a QR code
Message-ID: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>

A multilingual sign that includes a language-independent glyph and a QR code

https://forum.affinity.serif.com/index.php?/topic/157030-thank-you-for-visiting/

William Overington

Thursday 10 February 2022

From abrahamgross at disroot.org Thu Feb 10 09:29:20 2022
From: abrahamgross at disroot.org (ag disroot)
Date: Thu, 10 Feb 2022 15:29:20 +0000 (UTC)
Subject: A multilingual sign that includes a language-independent glyph and a QR code
In-Reply-To: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
Message-ID: <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>

You keep posting your "language-independent glyphs" here, but how is it
language-independent if no one understands what it means? In that case,
logographies like Chinese hanzi and Egyptian hieroglyphs are just as
language-independent (at least the pictographs (象形), ideographs (指事)
and compound ideographs (會意)), because they are symbols of real things,
so no language is necessary. Hanzi is at least legible to ~1.5 billion
people, and already has most ideas encoded in characters, with a very
easy way to extend it.

(if it sounds like I'm upset then I'm sorry, that wasn't the intention.
just curious what your reasoning is)

From wjgo_10009 at btinternet.com Thu Feb 10 12:07:03 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 10 Feb 2022 18:07:03 +0000 (GMT)
Subject: A multilingual sign that includes a language-independent glyph and a QR code
In-Reply-To: <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com> <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>
Message-ID: <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>

Hi

> You keep posting your "language-independent glyphs" here, but how is
> it language-independent if no one understands what it means?

It is language-independent even if nobody other than me knows the
meaning that I have assigned to it.

As a result of this thread, maybe a few more people will know what it
means if they see the glyph again some time.

Maybe as a work of art it will result in some people carrying out
thought experiments. So the artwork could be a catalyst for progress in
some way. Though maybe not. But epsilon of a chance is better than zero
of a chance.

William Overington

Thursday 10 February 2022

From lyratelle at gmx.de Thu Feb 10 16:58:57 2022
From: lyratelle at gmx.de (Dominikus Dittes Scherkl)
Date: Thu, 10 Feb 2022 23:58:57 +0100
Subject: A multilingual sign that includes a language-independent glyph and a QR code
In-Reply-To: <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com> <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org> <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>
Message-ID: <5b974265-9b2e-3d58-2174-29247ea5ce95@gmx.de>

On 10.02.22 at 19:07, William_J_G Overington via Unicode wrote:
> Hi
>
> > You keep posting your "language-independent glyphs" here, but how is
> > it language-independent if no one understands what it means?
>
> It is language-independent even if nobody other than me knows the
> meaning that I have assigned to it.

No, that's not language independence. It's just a new language (with
the additional disadvantage that nobody knows it).

--
Dominikus Dittes Scherkl

From johannes at bergerhausen.com Fri Feb 11 03:41:50 2022
From: johannes at bergerhausen.com (Johannes Bergerhausen)
Date: Fri, 11 Feb 2022 10:41:50 +0100
Subject: update WWS website
Message-ID: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>

Dear list,

fyi: we have updated the worldswritingsystems.org website to Unicode
14.0. Besides some corrections, there are also some new typographic
reference glyphs and a new FAQ page. If you spot a mistake, please send
us a correction.

By our count, there are currently 294 known scripts, living or
historical. 131 of them are not yet encoded in Unicode.

Many greetings,
Johannes (Hochschule Mainz, Germany), Deborah (SEI Berkeley, USA),
Thomas (ANRT Nancy, France)

From jk at koremail.com Fri Feb 11 04:30:23 2022
From: jk at koremail.com (jk at koremail.com)
Date: Fri, 11 Feb 2022 18:30:23 +0800
Subject: update WWS website
In-Reply-To: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>
References: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>
Message-ID: <9ce3e3849e22c04ea1dc6c45b8eda455@koremail.com>

The list seems to be rather inaccurate in places. It says, for example,
that the Zhuang Square script has not been encoded. However, whilst
there are still characters to be added, thousands of Zhuang square
characters have been encoded. Nor, for that matter, is it accurate to
describe it as historic.

Warm regards
John Knightley

On 2022-02-11 17:41, Johannes Bergerhausen via Unicode wrote:
> Dear list,
>
> fyi: we have updated the worldswritingsystems.org [1] website to
> Unicode 14.0. Besides some corrections, there are also some new
> typographic reference glyphs and a new FAQ page.
> If you spot a mistake, please send us a correction.
>
> By our count, there are currently 294 known scripts, living or
> historical. 131 of them are not yet encoded in Unicode.
>
> Many greetings,
> Johannes (Hochschule Mainz, Germany), Deborah (SEI Berkeley, USA),
> Thomas (ANRT Nancy, France)
>
> Links:
> ------
> [1] http://worldswritingsystems.org

From wjgo_10009 at btinternet.com Mon Feb 14 08:59:48 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 14 Feb 2022 14:59:48 +0000 (GMT)
Subject: Recording accurately a person's name
Message-ID: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>

There was recently a Public Review.

434 CLDR Person Name Formatting

I sent in a response. My response and the result of reviewing by the
subcommittee are available as follows.

https://unicode-org.atlassian.net/browse/CLDR-15263

However, it appears from the response that many of the issues that I
mentioned are matters for implementers of software that uses the
standard.

Some (though not all) people and organizations decide to use only the
first two initials of someone's given names. For example, given a name
with three initials before the surname, they use only the first two
when typing a letter from a longhand draft or replying to a letter.
This issue goes back to before the widespread use of computers that
exists today.

So I write here, to a mailing list that is read by many people who
implement software systems that include Unicode in some way, to ask
please that, when it comes to designing software, the widespread
concept of allowing for only one "middle initial" is discontinued, so
that people with more than two given names are listed according to
their name and not by some edited version of it that may, in fact, be
the name of another person.

It seems to me that an application program needs a field that will
accept more than one letter.
Also, when producing an address label, an insurance certificate, or
whatever, software should not assume that only the first character of
the given2 field needs to be printed.

Also, a related issue: please allow Name on Card for credit card and
debit card transactions to be entered manually, rather than deducing it
from name data and presenting it in a "greyed-out, cannot be altered"
field. Name on Card may or may not have an honorific, and may have a
combination of names in full and initials that cannot be reliably
deduced from the data.

With this new standard being produced, the opportunity exists to get
away from the widespread name truncation practice. Please take the
opportunity to do so.

Thank you.

William J. G. Overington

Monday 14 February 2022

From steffen at sdaoden.eu Mon Feb 14 11:08:19 2022
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Mon, 14 Feb 2022 18:08:19 +0100
Subject: Recording accurately a person's name
In-Reply-To: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>
References: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>
Message-ID: <20220214170819.Mtm8a%steffen@sdaoden.eu>

William_J_G Overington via Unicode wrote in
 <115916f1.b3e8.17ef8bf3958.Webtop.96 at btinternet.com>:
 |There was recently a Public Review.
 |
 |434 CLDR Person Name Formatting

While totally off-topic, i see CLDR and have long wanted to report that
_all_ messages of the German CLDR forum were classified as spam by
GMail (including those of their own fellows). I did not have one in my
regular mail folder. (All the ones i later reviewed in my spam folder
were in English, just to mention it.) I mean, maybe it is fun, as WWF
mails, and things like comp.lang.awk at googlegroups digests and such,
are spam, too. Hm.
--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From monicamerchant1 at gmail.com Thu Feb 17 06:18:56 2022
From: monicamerchant1 at gmail.com (Monica Merchant)
Date: Fri, 18 Feb 2022 01:18:56 +1300
Subject: Compatibility decomposables that are not compatibility characters

Hello,

I have a question about the last two examples at the bottom of page 27
of Section 2.3, Compatibility Characters:

*Example 1*

> By way of contrast, some compatibility decomposable characters, such
> as modifier letters used in phonetic orthographies, for example,
> U+02B0 modifier letter small h, are not considered to be
> compatibility characters. They would have been accepted for encoding
> in the standard on their own merits, regardless of their need for
> mapping to IPA. A large number of compatibility decomposable
> characters like this are actually distinct symbols used in
> specialized notations, whether phonetic or mathematical. In such
> cases, their compatibility mappings express their historical
> derivation from styled forms of standard letters.

*Example 2*

> Other compatibility decomposable characters are widely used
> characters serving essential functions. U+00A0 no-break space is one
> example. In these and similar cases, such as fixed-width space
> characters, the compatibility decompositions define possible fallback
> representations.

The first example illustrates the case where a *compatibility
decomposable character* is *not* a *compatibility character* (i.e.
a character that would not have been encoded except for round-tripping
with a source standard): The Spacing Modifier Letters (U+02B0-U+02FF)
and Mathematical Alphanumeric Symbols (U+1D400-U+1D7FF) are not
compatibility characters because, although they resemble rich text
variants of ordinary letters, they are actually distinct symbols and
therefore would have been accepted for encoding on their own merits (as
opposed to being encoded solely for round-tripping).

However, I'm confused by the second example. In particular, I'm not
sure if no-break space (*U+00A0*) and the fixed-width space characters
(*U+2000-U+200A*) are compatibility characters or not. They are
described as "serving essential functions", which I read as meaning
that they would have been encoded even if it weren't for
round-tripping, in which case they would not be considered as
compatibility characters. Is this correct? If so, are they essential
because they facilitate the typesetting of text-based markup like HTML
(where formatting must be specified in plain text)? No-break space is
also essential in that it is used to display standalone non-spacing
marks (pg 267).

I apologise if this is an obvious question and would be grateful for
any guidance, as most resources only mention compatibility characters
in passing.

Thank you,
Monica

From cate at cateee.net Thu Feb 17 07:52:32 2022
From: cate at cateee.net (Giacomo Catenazzi)
Date: Thu, 17 Feb 2022 14:52:32 +0100
Subject: Compatibility decomposables that are not compatibility characters

Hello Monica,

On 17.02.2022 13:18, Monica Merchant via Unicode wrote:
> However, I'm confused by the second example. In particular, I'm not sure
> if no-break space (*U+00A0*) and the fixed-width space characters
> (*U+2000-U+200A*) are compatibility characters or not.
> They are described as "serving essential functions", which I read as
> meaning that they would have been encoded even if it weren't for
> round-tripping, in which case they would not be considered as
> compatibility characters. Is this correct? If so, are they essential
> because they facilitate the typesetting of text-based markup like
> HTML (where formatting must be specified in plain text)? No-break
> space is also essential in that it is used to display standalone
> non-spacing marks (pg 267).

I read the section in this manner: the three examples before your
example 1 and example 2 describe the case of compatibility characters
that are not compatibility decomposable characters. Then the standard
describes two examples where we have a compatibility decomposition, but
without the characters being compatibility characters.

Note that on page 26 we have:

vvvv
There is no formal listing of all compatibility characters in the
Unicode Standard. This follows from the nature of the definition of
compatibility characters. It is a judgement call as to whether any
particular character would have been accepted for encoding if it had
not been required for interoperability with a particular standard.
Different participants in character encoding often disagree about the
appropriateness of encoding particular characters, and sometimes there
are multiple justifications for encoding a given character.
^^^^

So it depends on how you interpret U+00A0. As you write, you may
consider it an essential distinction in HTML, so it may not be a
compatibility character. On the other hand, a typesetter may interpret
U+00A0 as U+0020. Such a person will decide whether or not to break at
the space according to the context (he knows language rules and style,
e.g. not to break a number from its units, "Ms." from the name, etc.).
So the context, but not the character, makes the distinction.

But your extra cases are more interesting.
U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE).
These are not just decomposable characters; in my opinion they are also
just compatibility characters: they are exactly the same character
(they were included just because of an error/wrong interpretation of
existing documents). The same goes for U+2001.

I would consider U+2002 to U+200A without U+2007 also as compatibility
characters (and the Unicode Database considers them as compatibility
decomposable characters). Probably Unicode does the same, because they
have the type "<compat>".

It is just U+2007 (not just because, like U+00A0, it has a <noBreak>
instead of <compat>) that makes me think. For me, this is just a
decimal digit zero which is not printed, so it has its own merits: it
is not a separation, but a meaningful character (context: tables).
Different people may have different opinions.

giacomo

>
> Thank you,
>
> Monica

From asmusf at ix.netcom.com Thu Feb 17 13:33:22 2022
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 17 Feb 2022 11:33:22 -0800
Subject: Compatibility decomposables that are not compatibility characters
Message-ID: <748b4a9e-3cb3-66fc-a334-7e43a32fb662@ix.netcom.com>

An HTML attachment was scrubbed...

From kenwhistler at sonic.net Thu Feb 17 19:32:57 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Thu, 17 Feb 2022 17:32:57 -0800
Subject: Compatibility decomposables that are not compatibility characters
Message-ID: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>

In general, it is a good idea not to try to parse the discussion of
compatibility characters too closely. That whole section of the core
specification was written to help clarify the ambiguous, careless way
that people were tending to wave around the term "compatibility
character" in earlier days of the standard.
It is unfortunate that we ended up with the term "compatibility" used
for a specific set of decomposition types baked into the data files and
as a normative part of the Unicode Normalization Algorithm, but there
we are. It just means that people need to be careful now when they
invoke the *other* sense of "compatibility character" -- the shorthand
usage for which is approximately "useless dreck we didn't really want
to include in the standard but had to for one reason or another." That
second use overlaps a lot with characters that formally have
"compatibility decompositions", but the two sets are not the same --
hence the need for the explanation.

On 2/17/2022 5:52 AM, Giacomo Catenazzi via Unicode wrote:
> So it depends on how you interpret U+00A0. As you write, you may
> consider it an essential distinction in HTML, so it may not be a
> compatibility character. On the other hand, a typesetter may interpret
> U+00A0 as U+0020. Such a person will decide whether or not to break at
> the space according to the context (he knows language rules and style,
> e.g. not to break a number from its units, "Ms." from the name, etc.).
> So the context, but not the character, makes the distinction.

U+00A0 is a widely used, clearly necessary character. If it hadn't
already been in significant character sets incorporated into the
earliest drafts of the Unicode repertoire, the Unicode architects
almost certainly would have invented it and added it in.

Now, from a certain point of view, characters added to Unicode 1.0
because they were already encoded in ISO 8859-1 ("Latin-1") were added
"for compatibility" with that earlier character set. That seems pretty
obvious, because, for good reasons, U+0000..U+00FF were all added to
Unicode in the exact same order and code values as for Latin-1. You
don't get much more compatible than that!

But at the time, nobody was really arguing that those were
compatibility characters. It was assumed that we had to have all the
Latin-1 characters in the standard.
That was considered a no-brainer at the time. None were "useless
dreck". In fact, the big argument then was about the accented Latin
letters in the range U+00C0..U+00FF, which ended up with *canonical*
decompositions into their base letter + accent combinations. So those
were canonical decomposables, and not compatibility decomposables,
although quite arguably, they were encoded "for compatibility" with
Latin-1. See how slippery this gets?

By contrast, the archetypal examples at the time of "useless dreck"
that were added as "compatibility characters" were the various
ligatures in the Arabic Presentation Forms-A block and the Alphabetic
Presentation Forms block. Those were all considered "compatibility
characters" at the time, and were even quarantined in a range then
known as the "Compatibility Area" in the code space.

> But your extra cases are more interesting.
> U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE).
> These are not just decomposable characters; in my opinion they are
> also just compatibility characters: they are exactly the same
> character (they were included just because of an error/wrong
> interpretation of existing documents). The same goes for U+2001.
>
> I would consider U+2002 to U+200A without U+2007 also as compatibility
> characters (and the Unicode Database considers them as compatibility
> decomposable characters). Probably Unicode does the same, because they
> have the type "<compat>".
>
> It is just U+2007 (not just because, like U+00A0, it has a <noBreak>
> instead of <compat>) that makes me think. For me, this is just a
> decimal digit zero which is not printed, so it has its own merits: it
> is not a separation, but a meaningful character (context: tables).
> Different people may have different opinions.

The fixed-width spaces in the 2000 block of punctuation have their own
interesting history.
The fact that they were added in Unicode 1.0 means that they were not
part of the forced merger with the 10646 repertoire in 1992 that led to
the Arabic ligatures and the like. Instead, they derived largely from
the pre-existing XCCS (Xerox) character set, but some of them appeared
also in other early character sets.

In Unicode 1.0 they had no decompositions -- nothing did. The
decompositions were first added in Unicode 1.1, and at that point they
were all tagged as "<compat> [0020]". That was the beginning of the
realization that most of the fixed-width space characters didn't really
belong in plain text for interchange, but instead were artifacts of
printing technology.

The addition of the *canonical* decompositions for 2000 and 2001 was a
Unicode 2.0 innovation, when it became clear that nobody could come up
with a convincing distinction between an "EM QUAD" as a space character
and an "EM SPACE" as a space character.

Nowadays most people would agree that there would be little reason to
put any of those other than 200B ZWSP and 2007 FIGURE SPACE into a
plain text stream. The rest of the fixed-width space characters are
basically "useless dreck", but the interesting distinction here is that
they didn't start out being considered to be compatibility characters,
but rather graduated to that status as people came to appreciate the
fact that there weren't valid reasons to use them in modern Unicode
text representation. They aren't bad enough to be formally deprecated,
but they live in a kind of limbo of useless stuff you'd be better off
without, along with scads of other such artifacts in the standard.

--Ken

From hubaishan at outlook.sa Thu Feb 17 22:44:17 2022
From: hubaishan at outlook.sa (Saeed Hubaishan)
Date: Fri, 18 Feb 2022 04:44:17 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)

Hi,
"The Decomposition Type Mapping" of these ligature marks are wrong:

Code  Name                                                Decomposition
FC5E  Arabic Ligature Shadda With Dammatan Isolated Form  0020 064C 0651
FC5F  Arabic Ligature Shadda With Kasratan Isolated Form  0020 064D 0651
FC60  Arabic Ligature Shadda With Fatha Isolated Form     0020 064E 0651
FC61  Arabic Ligature Shadda With Damma Isolated Form     0020 064F 0651
FC62  Arabic Ligature Shadda With Kasra Isolated Form     0020 0650 0651
FCF2  Arabic Ligature Shadda With Fatha Medial Form       0640 064E 0651
FCF3  Arabic Ligature Shadda With Damma Medial Form       0640 064F 0651
FCF4  Arabic Ligature Shadda With Kasra Medial Form       0640 0650 0651

Arabic Shadda (0651) must come before the marks (064C, 064D, 064E,
064F, 0650).

From wjgo_10009 at btinternet.com Fri Feb 18 05:36:33 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 18 Feb 2022 11:36:33 +0000 (GMT)
Subject: Simulating the handsetting of metal type (from Re: Compatibility decomposables that are not compatibility characters)
In-Reply-To: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
Message-ID: <18640b1.26e2.17f0c9e946e.Webtop.96@btinternet.com>

https://forum.affinity.serif.com/index.php?/topic/157455-simulating-the-handsetting-of-metal-type/

William Overington

Friday 18 February 2022
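For reference in the fixed-width space discussion in this thread, the decomposition fields being argued about can be read straight out of the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `space_decompositions` is made up for illustration, and the output reflects whatever UCD version ships with the Python build in use.

```python
import unicodedata

def space_decompositions():
    """Map 'U+XXXX' to the raw decomposition field for NBSP and U+2000..U+200A."""
    code_points = [0x00A0] + list(range(0x2000, 0x200B))
    return {
        f"U+{cp:04X}": unicodedata.decomposition(chr(cp))
        for cp in code_points
    }

if __name__ == "__main__":
    for label, mapping in space_decompositions().items():
        name = unicodedata.name(chr(int(label[2:], 16)))
        # A mapping without a <tag> is canonical; <compat> and <noBreak>
        # mark compatibility decompositions.
        print(f"{label}  {name:<22} {mapping}")
```

A field without a tag in angle brackets (as for U+2000 and U+2001) is a canonical decomposition, so NFC and NFD both replace those characters, whereas the tagged ones are only touched by NFKC and NFKD.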
From sosipiuk at gmail.com Fri Feb 18 11:38:48 2022
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Fri, 18 Feb 2022 12:38:48 -0500
Subject: Compatibility decomposables that are not compatibility characters
In-Reply-To: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>

On Thu, Feb 17, 2022 at 8:36 PM Ken Whistler via Unicode wrote:
>
> The addition of the *canonical* decompositions for 2000 and 2001 was a
> Unicode 2.0 innovation, when it became clear that nobody could come up
> with a convincing distinction between an "EM QUAD" as a space character
> and an "EM SPACE" as a space character.

While following a different trail a couple of weeks ago I came upon
this proposal:
http://www.unicode.org/L2/L2019/19115-fwsp-usability.pdf

While the proposal itself is a non-starter due to stability reqs,
Marcel Schneider makes the case that the QUADs were originally meant to
allow line breaking, while the adjacent SPACE characters should have
been non-breaking. That would have been the "convincing distinction",
if it had been implemented that way.

Sławomir Osipiuk

From kenwhistler at sonic.net Fri Feb 18 13:44:09 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Fri, 18 Feb 2022 11:44:09 -0800
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)

On 2/17/2022 8:44 PM, Saeed Hubaishan via Unicode wrote:
> Hi,
> "The Decomposition Type Mapping" of these ligature marks are wrong:
> FC5E  Arabic Ligature Shadda With Dammatan Isolated Form  0020 064C 0651
> FC5F  Arabic Ligature Shadda With Kasratan Isolated Form  0020 064D 0651
>
> ...
> Arabic Shadda must be before the marks (064C, 064D, 064E, 064F, 0650)

Decompositions are immutable, constrained by normalization stability.
To see how such rendering should be handled, instead, please see
Unicode Technical Report #53, Unicode Arabic Mark Rendering, which
addresses the issue of the placement of shadda, along with many other
issues of ordering and placement of various tashkil, ijam, and other
marks:

https://www.unicode.org/reports/tr53/

--Ken

From mark at kli.org Fri Feb 18 13:46:13 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Fri, 18 Feb 2022 14:46:13 -0500
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters
References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
Message-ID: <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>

Perhaps relevant to this thread, I was just reading in
https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043,
proposal to encode Kirat Rai Script, where it remarks regarding the
vowels:

> These should all be encoded atomically. This is because linguistically
> these vowels are not composed of two separate characters, they are
> single vowels in their own right. It is true that the custom encoded
> Kirat Rai font uses decomposed vowel signs as a matter of expediency,
> but this decision should not influence the right way to encode the
> script. Because the glyph for some of the vowels (aa and e) are part of
> the shape of the last 3 vowels (ai, o, au) there should be canonical
> decompositions for the last 3 vowels. With these decompositions, Do
> Not Use tables are not necessary.

If the vowels are to be encoded atomically, and it sounds like they
should be, shouldn't we *not* want to have canonical decompositions for
them? I thought Unicode was trying to avoid precomposed characters at
this point. I guess it's too late to hope for "only one right way to
spell it" out of Unicode, but is that still something we try to
approach?
It almost seems to me that canonical decompositions also stem from
cases of "things that wouldn't be encoded if they were proposed now,"
and if so it would not really make sense to propose anything with a
canonical decomposition. Or am I misunderstanding the attitude towards
canonical decompositions, or the proposal's statement?

~mark

From richard.wordingham at ntlworld.com Fri Feb 18 13:48:28 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 18 Feb 2022 19:48:28 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
Message-ID: <20220218194828.26235c8d@JRWUBU2>

On Fri, 18 Feb 2022 04:44:17 +0000
Saeed Hubaishan via Unicode wrote:

> Hi,
> "The Decomposition Type Mapping" of these ligature marks are wrong:
> FC5E  Arabic Ligature Shadda With Dammatan Isolated Form  0020 064C 0651
> FC5F  Arabic Ligature Shadda With Kasratan Isolated Form  0020 064D 0651
> FC60  Arabic Ligature Shadda With Fatha Isolated Form     0020 064E 0651
> FC61  Arabic Ligature Shadda With Damma Isolated Form     0020 064F 0651
> FC62  Arabic Ligature Shadda With Kasra Isolated Form     0020 0650 0651
>
> FCF2  Arabic Ligature Shadda With Fatha Medial Form       0640 064E 0651
> FCF3  Arabic Ligature Shadda With Damma Medial Form       0640 064F 0651
> FCF4  Arabic Ligature Shadda With Kasra Medial Form       0640 0650 0651
>
> Arabic Shadda must be before the marks (064C, 064D, 064E, 064F, 0650)

But they and shadda have different non-zero canonical combining classes
(ccc), so their order is meant to make no difference. Shadda has the
higher ccc, so it comes last in canonical order. Putting it last makes
the decomposition table easier to use for conversion to form NFKD.

Richard.
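The point about combining classes can be checked with a few lines of Python. This is a minimal sketch using only the standard `unicodedata` module; the base letter BEH and the variable names are chosen here purely for illustration.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH, used here only as a base letter
FATHA = "\u064e"   # ARABIC FATHA
SHADDA = "\u0651"  # ARABIC SHADDA

shadda_first = BEH + SHADDA + FATHA  # the order the original poster asked for
fatha_first = BEH + FATHA + SHADDA   # the order used in the decomposition data

def ccc_values(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]

if __name__ == "__main__":
    print("ccc, shadda first:", ccc_values(shadda_first))
    print("ccc, fatha first: ", ccc_values(fatha_first))
    # Canonical ordering sorts adjacent marks with different non-zero ccc,
    # so both spellings normalize to the same sequence.
    same = (unicodedata.normalize("NFD", shadda_first)
            == unicodedata.normalize("NFD", fatha_first))
    print("canonically equivalent:", same)
    nfkd = unicodedata.normalize("NFKD", "\ufc60")
    print("NFKD of U+FC60:", " ".join(f"{ord(c):04X}" for c in nfkd))
```

Because the two marks have different non-zero combining classes, either input order ends up as fatha followed by shadda after normalization, which is also the order stored in the decomposition of U+FC60.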
From doug at ewellic.org Fri Feb 18 15:37:50 2022
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 18 Feb 2022 14:37:50 -0700
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
Message-ID: <007701d8250f$c774fe20$565efa60$@ewellic.org>

Saeed Hubaishan wrote:

> "The Decomposition Type Mapping" of these ligature marks are wrong:

Comments like these always make me wonder what motivated them.

The vast majority of characters in the Arabic Presentation Forms-A and
-B blocks should not be used. They exist for compatibility with older
platforms that did not implement proper Arabic shaping and
directionality. Instead, use normal Arabic letters from the regular
Arabic, Arabic Supplement, Extended-A, or Extended-B blocks.

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

From richard.wordingham at ntlworld.com Fri Feb 18 16:06:40 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 18 Feb 2022 22:06:40 +0000
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters
In-Reply-To: <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net> <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
Message-ID: <20220218220640.3dd190dc@JRWUBU2>

On Fri, 18 Feb 2022 14:46:13 -0500
"Mark E. Shoulson via Unicode" wrote:

> Perhaps relevant to this thread, I was just reading in
> https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043,
> proposal to encode Kirat Rai Script, where it remarks regarding the
> vowels:
>
> > These should all be encoded atomically. This is because
> > linguistically these vowels are not composed of two
> > separate characters, they are single vowels in their own right.
It > > is true that the custom encoded Kirat Rai font uses decomposed vowel > > signs as a matter of expediency, but this decision should not > > influence the right way to encode the script. Because the glyph for > > some of the vowels (aa and e) are part of the shape of the last 3 > > vowels (ai, o, au) there should be canonical decompositions for the > > last 3 vowels. With these decompositions, Do Not Use tables are not > > necessary. > If the vowels are to be encoded atomically, and it sounds like they > should be, shouldn't we *not* want to have canonical decompositions > for them? I thought Unicode was trying to avoid precomposed > characters at this point. I guess it's too late to hope for "only > one right way to spell it" out of Unicode, but is that still > something we try to approach? It almost seems to me that canonical > decompositions also stem from cases of "things that wouldn't be > encoded if they were proposed now," and if so it would not really > make sense to propose anything with a canonical decomposition. Or am > I misunderstanding the attitude towards canonical decompositions, or > the proposal's statement? X technology should obviously be opposed wherever possible. We should make it impossible to enter these vowel symbols at a single stroke when using a simple X keyboard or even an MSKLC keyboard creator. We must keep professional keyboard writers in work. Your wording is confusing. There are several different options: 1) Only allow encoding for single vowels (the Khmer model) 2) Do not encode visually compound vowels (the Myanmar model) 3) Allow visually compound vowels as sequences or as single characters (the south Indian model) The proposal argues for (3), which rather assumes that canonical equivalence will be taken seriously. At least we don't have the problem presented by doubled multipart south Indian vowels.
Model (1) calls forth a need for stop lists, and potential confusion when a compound vowel notation is later found to be needed. (From the Southern Thai point of view, there seems to be a vowel missing from the Khmer script which it would be very tempting to just encode as , though in *Khmer* usage it is arguably just a glyph variant of U+17BE KHMER VOWEL SIGN OE.) I think you're calling for (2), which with current technology seems to make keyboard creation unduly complicated or fragile if we want users to be able to treat KIRAT RAI VOWEL SIGN O as a single entity. (Do users have such a perception? We'll probably be told that it's not a user-perceived character.) Richard. From richard.wordingham at ntlworld.com Fri Feb 18 17:24:01 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Fri, 18 Feb 2022 23:24:01 +0000 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <007701d8250f$c774fe20$565efa60$@ewellic.org> References: <007701d8250f$c774fe20$565efa60$@ewellic.org> Message-ID: <20220218232401.4818e7e7@JRWUBU2> On Fri, 18 Feb 2022 14:37:50 -0700 Doug Ewell via Unicode wrote: > The vast majority of characters in the Arabic Presentation Forms-A > and -B blocks should not be used. They exist for compatibility with > older platforms that did not implement proper Arabic shaping and > directionality. Instead, use normal Arabic letters from the regular > Arabic, Arabic Supplement, Extended-A, or Extended-B blocks. Irritatingly, I had to use some of these characters just this week because the shaping in Arabic fonts for basic installations of Windows 10 and Ubuntu didn't include the ligatures we were discussing - in particular that of U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM. (The ligature was germane to the discussion.) Many of the ligatures are not essential for proper shaping. I've now found and lawfully installed a font that gives me the ligature from normal Arabic letters. Richard. 
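For readers who want to see how a presentation-form ligature relates to the ordinary letters, here is a small sketch using Python's standard unicodedata module. The function name fold_presentation_forms is hypothetical. Note that NFKC is a blunt tool: it folds every compatibility character in the text, not just Arabic presentation forms, so it is shown here for inspection rather than as a recommended conversion step.

```python
import unicodedata

LAM_HAH_INITIAL = "\uFCCA"  # ARABIC LIGATURE LAM WITH HAH INITIAL FORM


def fold_presentation_forms(text: str) -> str:
    """Replace compatibility characters by their NFKC equivalents.

    Assumption: lossy folding of *all* compatibility characters is
    acceptable for the text being processed.
    """
    return unicodedata.normalize("NFKC", text)


folded = fold_presentation_forms(LAM_HAH_INITIAL)

# List the ordinary letters that the ligature character folds to.
for ch in folded:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Whether the two ordinary letters are then drawn as a ligature is up to the font and the shaping engine, which is the situation described in the message above.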
From eliz at gnu.org Sat Feb 19 01:38:22 2022 From: eliz at gnu.org (Eli Zaretskii) Date: Sat, 19 Feb 2022 09:38:22 +0200 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <20220218232401.4818e7e7@JRWUBU2> (message from Richard Wordingham via Unicode on Fri, 18 Feb 2022 23:24:01 +0000) References: <007701d8250f$c774fe20$565efa60$@ewellic.org> <20220218232401.4818e7e7@JRWUBU2> Message-ID: <83sfsfz4pd.fsf@gnu.org> > Date: Fri, 18 Feb 2022 23:24:01 +0000 > From: Richard Wordingham via Unicode > > On Fri, 18 Feb 2022 14:37:50 -0700 > Doug Ewell via Unicode wrote: > > > The vast majority of characters in the Arabic Presentation Forms-A > > and -B blocks should not be used. They exist for compatibility with > > older platforms that did not implement proper Arabic shaping and > > directionality. Instead, use normal Arabic letters from the regular > > Arabic, Arabic Supplement, Extended-A, or Extended-B blocks. > > Irritatingly, I had to use some of these characters just this week > because the shaping in Arabic fonts for basic installations of Windows > 10 and Ubuntu didn't include the ligatures we were discussing - in > particular that of U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM. > (The ligature was germane to the discussion.) Many of the ligatures are > not essential for proper shaping. I've now found and lawfully installed > a font that gives me the ligature from normal Arabic letters. Which font is that, please? And does anyone here know why the Courier New font on Windows XP does produce the ligature from those two characters, but the same font on Windows 10 doesn't? Is this ligature somehow deemed inappropriate or problematic? I'm not asking about U+FCCA, I'm asking about the display of the two characters U+0644 and U+062D -- should it ligate or shouldn't it? Thanks. 
From hubaishan at outlook.sa Sat Feb 19 04:20:31 2022 From: hubaishan at outlook.sa (Saeed Hubaishan) Date: Sat, 19 Feb 2022 10:20:31 +0000 Subject: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <20220218194828.26235c8d@JRWUBU2> References: <20220218194828.26235c8d@JRWUBU2> Message-ID: But we have a problem with some programs that get their data from Unicode, like "MediaWiki" and "phpBB": they reorder ??? to ???, which may be rendered in some old Windows fonts like ???. You can try this with Wikipedia. ________________________________ From: Unicode on behalf of Richard Wordingham via Unicode Sent: 18 February 2022 10:48 PM To: unicode at corp.unicode.org Subject: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) On Fri, 18 Feb 2022 04:44:17 +0000 Saeed Hubaishan via Unicode wrote: > Hi, > "The Decomposition Type Mapping" of these ligature marks are wrong: > FC5E Arabic Ligature Shadda With Dammatan Isolated Form > 0020 064C 0651 > FC5F Arabic Ligature Shadda With Kasratan Isolated Form > 0020 064D 0651 > FC60 Arabic Ligature Shadda With Fatha Isolated Form > 0020 064E 0651 > FC61 Arabic Ligature Shadda With Damma Isolated Form > 0020 064F 0651 > FC62 Arabic Ligature Shadda With Kasra Isolated Form > 0020 0650 0651 > > FCF2 Arabic Ligature Shadda With Fatha Medial Form > 0640 064E 0651 > FCF3 Arabic Ligature Shadda With Damma Medial Form > 0640 064F 0651 > FCF4 Arabic Ligature Shadda With Kasra Medial Form > 0640 0650 0651 > Arabic Shadda must be before the marks (064C, 064D, 064E, > 064F, 0650) But they and shadda have different non-zero canonical combining classes (ccc), so their relative order should make no difference. Shadda has the higher ccc, so it comes last.
Putting it last makes the decomposition table easier to use for conversion to form NFKD. Richard. From richard.wordingham at ntlworld.com Sat Feb 19 06:52:37 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 19 Feb 2022 12:52:37 +0000 Subject: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: References: <20220218194828.26235c8d@JRWUBU2> Message-ID: <20220219125237.503aca31@JRWUBU2> On Sat, 19 Feb 2022 10:20:31 +0000 Saeed Hubaishan via Unicode wrote: > But we have a problem with some programs that get their data from > Unicode, like "MediaWiki" and "phpBB": they reorder ??? > to > ??? In codepoints, to . No process compliant with Unicode shall *deliberately* render them differently - the sequences are canonically equivalent. > which may be rendered in some old Windows fonts like > ??? > > You can try this with Wikipedia. This sequence is , which is not canonically normalised. Using the Naskh font Amiri, kasra is by default placed below lam. However, if I enable OpenType feature ss05, which for this font is described (unless the labels have been scrambled) as "Kasra is placed below Shadda instead of base glyph", the kasra is indeed placed immediately below the shadda. Unicode allows both renderings. I'm not sure that Unicode provides any plain text mechanism to distinguish the two renderings. In answer to Eli, the Amiri font is the one I downloaded to get LAM and HAH to automatically ligate; I got it from Ubuntu package fonts-hosny-amiri. The font is published under the SIL Open Font Licence. Richard.
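The canonical equivalence described above can be demonstrated directly. The exact sequences in the quoted messages were lost in archiving, so the sketch below assumes lam followed by shadda and kasra in either order; the function canonically_equivalent is an illustrative helper, not an API from any of the programs named in the thread.

```python
import unicodedata

LAM, KASRA, SHADDA = "\u0644", "\u0650", "\u0651"


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


shadda_first = LAM + SHADDA + KASRA  # order as a user might type it
kasra_first = LAM + KASRA + SHADDA   # order required by canonical ordering

print(canonically_equivalent(shadda_first, kasra_first))

# Any software that normalises its input will turn the first order
# into the second, which matches the reordering reported above.
normalised = unicodedata.normalize("NFC", shadda_first)
print([f"{ord(c):04X}" for c in normalised])
```

Since the two orders are the same text as far as Unicode is concerned, a normalising program is within its rights to store either one; any difference in display comes from the font or the rendering engine.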
From richard.wordingham at ntlworld.com Sat Feb 19 07:05:44 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 19 Feb 2022 13:05:44 +0000 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <83sfsfz4pd.fsf@gnu.org> References: <007701d8250f$c774fe20$565efa60$@ewellic.org> <20220218232401.4818e7e7@JRWUBU2> <83sfsfz4pd.fsf@gnu.org> Message-ID: <20220219130544.0e2d4cb7@JRWUBU2> On Sat, 19 Feb 2022 09:38:22 +0200 Eli Zaretskii via Unicode wrote: > > Date: Fri, 18 Feb 2022 23:24:01 +0000 > > From: Richard Wordingham via Unicode > > Irritatingly, I had to use some of these characters just this week > > because the shaping in Arabic fonts for basic installations of > > Windows 10 and Ubuntu didn't include the ligatures we were > > discussing - in particular that of U+FCCA ARABIC LIGATURE LAM WITH > > HAH INITIAL FORM. (The ligature was germane to the discussion.) > > Many of the ligatures are not essential for proper shaping. I've > > now found and lawfully installed a font that gives me the ligature > > from normal Arabic letters. > > Which font is that, please? Amiri. > And does anyone here know why the Courier New font on Windows XP does > produce the ligature from those two characters, but the same font on > Windows 10 doesn't? Is this ligature somehow deemed inappropriate or > problematic? I'm not asking about U+FCCA, I'm asking about the > display of the two characters U+0644 and U+062D -- should it ligate or > shouldn't it? Well, as Courier New is generally seen as a plain 'typewriter' font, such ligatures would seem out of place in a font of that name. One can find claims that the only compulsory ligature is lam-alif. Richard. 
From hubaishan at outlook.sa Sat Feb 19 07:30:49 2022 From: hubaishan at outlook.sa (Saeed Hubaishan) Date: Sat, 19 Feb 2022 13:30:49 +0000 Subject: Re: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <20220219125237.503aca31@JRWUBU2> References: <20220218194828.26235c8d@JRWUBU2> <20220219125237.503aca31@JRWUBU2> Message-ID: See how some fonts in Windows render FATHA + SHADDA in the attached picture. ________________________________ From: Unicode on behalf of Richard Wordingham via Unicode Sent: 19 February 2022 03:52 PM To: unicode at corp.unicode.org Subject: Re: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) On Sat, 19 Feb 2022 10:20:31 +0000 Saeed Hubaishan via Unicode wrote: > But we have a problem with some programs that get their data from > Unicode, like "MediaWiki" and "phpBB": they reorder ??? > to > ??? In codepoints, to . No process compliant with Unicode shall *deliberately* render them differently - the sequences are canonically equivalent. > which may be rendered in some old Windows fonts like > ??? > > You can try this with Wikipedia. This sequence is , which is not canonically normalised. Using the Naskh font Amiri, kasra is by default placed below lam. However, if I enable OpenType feature ss05, which for this font is described (unless the labels have been scrambled) as "Kasra is placed below Shadda instead of base glyph", the kasra is indeed placed immediately below the shadda. Unicode allows both renderings. I'm not sure that Unicode provides any plain text mechanism to distinguish the two renderings. In answer to Eli, the Amiri font is the one I downloaded to get LAM and HAH to automatically ligate; I got it from Ubuntu package fonts-hosny-amiri. The font is published under the SIL Open Font Licence. Richard.
-------------- next part -------------- A non-text attachment was scrubbed... Name: Fatha shddah.png Type: image/png Size: 28535 bytes Desc: Fatha shddah.png From richard.wordingham at ntlworld.com Sat Feb 19 11:01:41 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 19 Feb 2022 17:01:41 +0000 Subject: Re: Re: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: References: <20220218194828.26235c8d@JRWUBU2> <20220219125237.503aca31@JRWUBU2> Message-ID: <20220219170141.48adf7dc@JRWUBU2> On Sat, 19 Feb 2022 13:30:49 +0000 Saeed Hubaishan via Unicode wrote: > See how some fonts in Windows render FATHA + SHADDA in the attached picture. Well, the renderings are wrong. Whether the problem is in the application, the rendering engine or the font is less clear. Peter Constable recently opined that a font should work with all canonical equivalents, which is a bit harsh given that OpenType lookups were designed on the assumption that fonts would not have to reorder characters. Which application were you using, and what version of Windows? What fonts? Were they designed for Uniscribe/DirectWrite, or were they designed for HarfBuzz? As HarfBuzz expressly aims to render canonical equivalents the same, it is quite possible that the fonts used were designed expecting the rendering engine to do the AMTRA processing that Ken Whistler referred to earlier, and that they would work with the HarfBuzz renderer, which on Windows is used in MS Edge, Chrome, Firefox and LibreOffice. Richard.
From aprilop at freenet.de Sat Feb 19 11:38:11 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Sat, 19 Feb 2022 18:38:11 +0100 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4) In-Reply-To: <20220219125237.503aca31@JRWUBU2> References: <20220218194828.26235c8d@JRWUBU2> <20220219125237.503aca31@JRWUBU2> Message-ID: <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de> On 19 February 2022 13:52:37 CET, Richard Wordingham wrote: > This sequence is , which is not canonically > normalised. Using the Naskh font Amiri, kasra is by default placed below > lam. However, if I enable OpenType feature ss05, which for this font is > described (unless the labels have been scrambled) as "Kasra is placed > below Shadda instead of base glyph", the kasra is indeed placed > immediately below the shadda. Unicode allows both renderings. > I'm not sure that Unicode provides any plain text mechanism to > distinguish the two renderings. Write ZWNJ between shadda and kasra.

لّ‌ِ
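One property of the ZWNJ suggestion can be verified without any font: ZWNJ has combining class 0, so it acts as a barrier to canonical reordering and the typed order of shadda and kasra survives normalisation. The sketch below assumes the sequence is lam, shadda, ZWNJ, kasra; the helper nfc_codepoints is made up for illustration.

```python
import unicodedata

LAM, KASRA, SHADDA = "\u0644", "\u0650", "\u0651"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER, canonical combining class 0


def nfc_codepoints(text: str) -> list[str]:
    """Return the NFC form of text as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]


plain = LAM + SHADDA + KASRA
with_zwnj = LAM + SHADDA + ZWNJ + KASRA

# Without ZWNJ the marks are put into canonical order (kasra first).
print(nfc_codepoints(plain))

# With ZWNJ between them the marks cannot be exchanged.
print(nfc_codepoints(with_zwnj))
```

This only shows that a normaliser will leave the sequence alone. How a given font and rendering engine then position the kasra is a separate question that the script cannot answer.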

From aprilop at freenet.de Sat Feb 19 12:18:02 2022 From: aprilop at freenet.de (Andreas Prilop) Date: Sat, 19 Feb 2022 19:18:02 +0100 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)I In-Reply-To: <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de> References: <20220218194828.26235c8d@JRWUBU2> <20220219125237.503aca31@JRWUBU2> <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de> Message-ID: On 19 February 2022 18:38:11 CET, I wrote: > Write ZWNJ between shadda and kasra. > >

لّ‌ِ

It is strange that ?‌? disappeared on https://corp.unicode.org/pipermail/unicode/2022-February/009965.html From richard.wordingham at ntlworld.com Sat Feb 19 14:09:36 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Sat, 19 Feb 2022 20:09:36 +0000 Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)I In-Reply-To: References: <20220218194828.26235c8d@JRWUBU2> <20220219125237.503aca31@JRWUBU2> <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de> Message-ID: <20220219200936.104cc1ed@JRWUBU2> On Sat, 19 Feb 2022 19:18:02 +0100 Andreas Prilop via Unicode wrote: > On 19 February 2022 18:38:11 CET, I wrote: > > > Write ZWNJ between shadda and kasra. > > > >

لّ‌ِ

To achieve which rendering? For HarfBuzz with the Amiri font, feature ss05 still selects its form. On the other hand, for HarfBuzz with Firefox's default font, it selects kasra below lam, whereas without ZWNJ one gets kasra below shadda. Now, for HarfBuzz with the Amiri font, consistently gets kasra below lam, which is the font's default. I suspect each font goes its own way. > It is strange that ?‌? disappeared on > > https://corp.unicode.org/pipermail/unicode/2022-February/009965.html It's still there, but converted from character entity to entity. From wjgo_10009 at btinternet.com Mon Feb 21 06:43:04 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 21 Feb 2022 12:43:04 +0000 (GMT) Subject: International Mother Language Day 2022 Message-ID: <629a354c.689b.17f1c4e8cf8.Webtop.96@btinternet.com> https://en.unesco.org/commemorations/motherlanguageday William Overington Monday 21 February 2022 From wjgo_10009 at btinternet.com Mon Feb 21 06:58:41 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Mon, 21 Feb 2022 12:58:41 +0000 (GMT) Subject: Art produced using glyphs that were generated using the Alphabet Synthesis Machine Message-ID: <112ba10b.694e.17f1c5cd86f.Webtop.96@btinternet.com> Almost twenty years ago, in 2002, there was a post in this mailing list about the Alphabet Synthesis Machine. https://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0541.html Here is a link to a 2022 thread with art using glyphs from fonts that were produced at that time. https://forum.affinity.serif.com/index.php?/topic/157614-lady-reading-haiku-to-an-elephant/ William Overington Monday 21 February 2022 From mark at kli.org Tue Feb 22 08:00:29 2022 From: mark at kli.org (Mark E. 
Shoulson) Date: Tue, 22 Feb 2022 09:00:29 -0500 Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters In-Reply-To: <20220218220640.3dd190dc@JRWUBU2> References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net> <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com> <20220218220640.3dd190dc@JRWUBU2> Message-ID: On 2/18/22 17:06, Richard Wordingham via Unicode wrote: > On Fri, 18 Feb 2022 14:46:13 -0500 > "Mark E. Shoulson via Unicode" wrote: > >> Perhaps relevant to this thread, I was just reading in >> https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043, >> proposal to encode Kirai Rat Script, where it remarks regarding the >> vowels: >> >>> These should all be encoded atomically. This is because >>> linguistically these vowels are not composed of two >>> separatecharacters, they are single vowels in their own right. It >>> is true that the custom encoded Kirat Rai font uses decomposedvowel >>> signs as a matter of expediency, but this decision should not >>> influence the right way to encode the script.Because the glyph for >>> some of the vowels (aa and e) are part of the shape of the last 3 >>> vowels (ai, o, au) there shouldbe canonical decompositions for the >>> last 3 vowels. With these decompositions, Do Not Use tables are not >>> necessary. >> If the vowels are to be encoded atomically, and it sounds like they >> should be, shouldn't we *not* want to have canonical decompositions >> for them?? I thought Unicode was trying to avoid precomposed >> characters at this point.? I guess it's too late to hope for "only >> one right way to spell it" out of Unicode, but is that still >> something we try to approach?? It almost seems to me that canonical >> decompositions also stem from cases of "things that wouldn't be >> encoded if they were proposed now," and if so it would not really >> make sense to propose anything with a canonical decomposition.? 
Or am >> I misunderstanding the attitude towards canonical decompositions, or >> the proposal's statement? > X technology should obviously be opposed wherever possible. We should > make it impossible to enter these vowel symbols at a a single stroke > when using a simple X keyboard or even an MSKLC keyboard creator. We > must keep professional keyboard writers in work. > > Your wording is confusing. There are several different options: > > 1) Only allow encoding for single vowels (the Khmer model) > 2) Do not encode visually compound vowels (the Myanmar model) > 3) Allow visually compound vowels as sequences or as single characters > (the south Indian model) > > The proposal argues for (3), which rather assumes that canonical > equivalence will be taken seriously. At least we don't have the > problem presented by doubled multipart south Indian vowels. > > Model (1) calls forth a need for stop lists, and potential confusion > when a compound vowel notation is later found to be needed. (From > the Southern Thai point of view, there seems to be a vowel missing from > the Khmer script which it would be very tempting to just encode as > , though in *Khmer* usage it is arguably just a glyph > variant of U+17BE KHMER VOWEL SIGN OE.) > > I think you're calling for (2), which with current technology seems to > make keyboard creation unduly complicated or fragile if we want users > to be able to treat KIRAT RAI VOWEL SIGN O as a single entity. (Do > users have such a perception? We'll probably be told that it's not a > user-perceived character.) Sorry to have been confusing, and I'm not so much "calling for" one answer or another as asking what's more in line with what we do.? The text in the proposal says "These should all be encoded atomically. This is because linguistically these vowels are not composed of two separate characters, they are single vowels in their own right."? 
This would seem to me to be proposing that the seemingly-compound characters be encoded instead as single characters, because they are not viewed as being compound. And that makes sense to me, as well, albeit we also go in the other direction, in not encoding compound letters like "ll" or "ch" in Welsh as separate letters. But then the proposal goes on to say "Because the glyph for some of the vowels (aa and e) are part of the shape of the last 3 vowels (ai, o, au) there should be canonical decompositions for the last 3 vowels," which sounds to me like the atomic single "ai" vowel is to be given a canonical decomposition into its simpler components, i.e., "ai" is basically a precomposed character, like ?, which has atomic existence but is canonically equivalent to e + ??. As I understand it, that would be #3 in your list above. And I thought that was considered a Bad Thing these days, that we were trying to avoid, when possible, having too many ways to represent the "same" (canonically equivalent) text. Am I wrong about that, in general? I guess if I were to be "calling for" anything, it would be... um, now I'm finding your wording unclear. I think #1 in your list, by which I intend that aa and e and ai and o and au and everything would each be given its own code-point, and that none of those code-points would be canonically equivalent to a sequence of the others. #2 sounds like encoding only the vowel-signs which don't look like sequences of others, and ai and o and au could only be represented as sequences, which seems to run counter to the proposal (not that decisions can't be made counter to proposals), and #3 sounds like encoding each vowel as its own character, as in #1, *and* the "compound" vowels could be represented either by their own codepoints or by sequences of "simple" vowels, and the two representations would be canonically equivalent, and that situation, to me, seems undesirable. Am I making sense?
~mark From richard.wordingham at ntlworld.com Tue Feb 22 16:05:04 2022 From: richard.wordingham at ntlworld.com (Richard Wordingham) Date: Tue, 22 Feb 2022 22:05:04 +0000 Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters In-Reply-To: References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net> <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com> <20220218220640.3dd190dc@JRWUBU2> Message-ID: <20220222220504.704d46d6@JRWUBU2> On Tue, 22 Feb 2022 09:00:29 -0500 "Mark E. Shoulson via Unicode" wrote: > But then the proposal goes on to say "Because the glyph for some of > the vowels (aa and e) are part of the shape of the last 3 vowels (ai, > o, au) there should be canonical decompositions for the last 3 > vowels," which sounds to me like the atomic single "ai" vowel is to > be given a canonical decomposition into its simpler components, i.e., > "ai" is basically a precomposed character, like ?, which has atomic > existence but is canonically equivalent to e + ??.? As I understand > it, that would be #3 in your list above.? And I thought that was > considered a Bad Thing these days, that we were trying to avoid, when > possible, having too many ways to represent the "same" (canonically > equivalent) text.? Am I wrong about that, in general? What we want to avoid is canonically *inequivalent* ways of encoding the same thing. We are still encoding decomposable characters for Indic vowels. #3 doesn't introduce any new problems, and certainly none that don't affect most Western European languages. #3 is what is actually proposed, though it's not obvious from the descriptive text. The visually compound vowels are given canonical equivalents in the code chart. The only problem is that canonical equivalence continues to be badly supported. > I guess if I were to be "calling for" anything, it would be... um, > now I'm finding your wording unclear.? 
I think #1 in your list, by > which I intend that aa and e and ai and o and au and everything would > each be given its own code-point, and that none of those code-points > would be canonically equivalent to a sequence of the others. The problem with that people would still try to type the obvious decompositions, and they would work for at least a while. Indeed, for this script, the (dependent) vowels could be categorised as Lo. >?#2 > sounds like encoding only the vowel-signs which don't look like > sequences of others, and ai and o and au could only be represented as > sequences, which seems to run counter to the proposal (not that > decisions can't be made counter to proposals), and #3 sounds like > encoding each vowel as its own character, as in #1, *and* the > "compound" variables could be represented either by their own > codepoints or by sequences of "simple" vowels, and the two > representations would be canonically equivalent, and that situation, > to me, seems undesirable. > Am I making sense? Yes. Richard. From mark at kli.org Tue Feb 22 19:49:50 2022 From: mark at kli.org (Mark E. Shoulson) Date: Tue, 22 Feb 2022 20:49:50 -0500 Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters In-Reply-To: <20220222220504.704d46d6@JRWUBU2> References: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net> <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com> <20220218220640.3dd190dc@JRWUBU2> <20220222220504.704d46d6@JRWUBU2> Message-ID: <46f6c9ec-d3d3-ceae-80b7-2b18ad97b725@shoulson.com> On 2/22/22 17:05, Richard Wordingham via Unicode wrote: > On Tue, 22 Feb 2022 09:00:29 -0500 > "Mark E. 
Shoulson via Unicode" wrote: > >> But then the proposal goes on to say "Because the glyph for some of >> the vowels (aa and e) are part of the shape of the last 3 vowels (ai, >> o, au) there should be canonical decompositions for the last 3 >> vowels," which sounds to me like the atomic single "ai" vowel is to >> be given a canonical decomposition into its simpler components, i.e., >> "ai" is basically a precomposed character, like ?, which has atomic >> existence but is canonically equivalent to e + ??.? As I understand >> it, that would be #3 in your list above.? And I thought that was >> considered a Bad Thing these days, that we were trying to avoid, when >> possible, having too many ways to represent the "same" (canonically >> equivalent) text.? Am I wrong about that, in general? > What we want to avoid is canonically *inequivalent* ways of encoding the > same thing. We are still encoding decomposable characters for Indic > vowels. > > #3 doesn't introduce any new problems, and certainly none that don't > affect most Western European languages. #3 is what is actually > proposed, though it's not obvious from the descriptive text. The > visually compound vowels are given canonical equivalents in the code > chart. The only problem is that canonical equivalence continues to be > badly supported. OK.? I had been thinking that multiple canonically equivalent ways to encode it would just mean more hassles for NFC/NFD processing, and that it would be better to have just the atomic ones.? But as you point out: > The problem with that people would still try to type the > obvious decompositions, and they would work for at least a while. People _might_ view the characters as atomic, but then they _might_ not, and you aren't going to stop them by saying not to. OK.? I see now why encoding the atomic characters _and_ canonical equivalents makes sense.? Thank you. >> Am I making sense? > Yes. Thanks.? I need to be reassured of that from time to time! > Richard. 
~mark From sai at fiatfiendum.org Sat Feb 26 07:32:27 2022 From: sai at fiatfiendum.org (Sai) Date: Sat, 26 Feb 2022 13:32:27 +0000 Subject: E-inside-o / o-enclosing-e variant of German ö Message-ID: Hello all. Does Unicode have an existing way to encode the e-inside-o / o-enclosing-e* variant o-e ligature for German ö? See e.g.: * the ö in Vögeln on the cover of 1st edition of Konrad Lorenz's _Er redete mit dem Vieh, den Vögeln und den Fischen_ https://en.wikipedia.org/wiki/File:ErRedeteMitDemViehDenV%C3%B6gelnUndDenFischen.jpg - n.b. other editions have normal ö; I do not know if it's used inside the book in normal or heading texts, or just on the cover * the ö in Köln (English: Cologne) in the inscription of its cathedral's crypt https://commons.wikimedia.org/wiki/File:O_containing_E_ligature.jpg I do not know whether it is used in any language other than German, nor how widely used it is for German. There's a CC by-sa SVG of the capital version here: https://commons.wikimedia.org/wiki/File:Latin_capital_letter_O_containing_E.svg - but I don't know of a lower-case version. There exist Unicode: * Ⓔ U+24BA and ⓔ U+24D4 - circled latin capital/small letter e, in the Enclosed Alphanumerics block * Œ U+0152 and œ U+0153 - Latin capital/small ligature oe, in the Latin Extended-A block * ɶ U+0276 - Latin letter small capital oe, in the IPA Extensions block However, Ⓔ/ⓔ use a circle (not letter o), and don't decompose to Ö or ö; and I have not found something that does decompose to ö which would use the enclosed ligature. I don't know combining characters well enough to tell if there is a combining version of either o or e which would allow this. So... is this already a thing? Has it been proposed before? Ought it be added to Unicode?
Sincerely, Sai President, Fiat Fiendum, Inc., a 501(c)(3) * phrasing it both ways just so this discussion is easier to find by search From dpk at nonceword.org Sat Feb 26 10:21:47 2022 From: dpk at nonceword.org (Daphne Preston-Kendal) Date: Sat, 26 Feb 2022 17:21:47 +0100 Subject: E-inside-o / o-enclosing-e variant of German ö In-Reply-To: References: Message-ID: On 26 Feb 2022, at 14:32, Sai via Unicode wrote: > Hello all. > > Does Unicode have an existing way to encode the e-inside-o / > o-enclosing-e* variant o-e ligature for German ö? It could reasonably be considered a typographical variant of ö or of the combination o / O + U+0364 COMBINING LATIN SMALL LETTER E. -- dpk (Daphne Preston-Kendal) · 12107 Berlin, Germany · http://dpk.io/ “What’s the good of Mercator’s North Poles and Equators, Tropics, Zones, and Meridian Lines?” So the Bellman would cry: and the crew would reply “They are merely conventional signs!” - Carroll, Hunting of the Snark From sosipiuk at gmail.com Sat Feb 26 12:00:42 2022 From: sosipiuk at gmail.com (Sławomir Osipiuk) Date: Sat, 26 Feb 2022 13:00:42 -0500 Subject: Re: E-inside-o / o-enclosing-e variant of German ö In-Reply-To: References: Message-ID: This character doesn't currently exist, nor is there any apparent way to compose it, except in an ugly form using an enclosing circle. There is a combining letter e, but it gets placed above the previous character: oͤ "Unicode has done something similar before" seems to be a less-than ironclad argument; precedent is not a strong factor from what I've seen. That said, I cannot imagine how U+A66E MULTIOCULAR O (which had only one example) can be justified for inclusion while this e-inside-o isn't. The proposal which brought us ꙮ: http://unicode.org/wg2/docs/n3194.pdf Sławomir Osipiuk On Sat, Feb 26, 2022 at 11:05 AM Sai via Unicode wrote: > > Hello all.
> > Does Unicode have an existing way to encode the e-inside-o / > o-enclosing-e* variant o-e ligature for German ?? > > See e.g.: > * the ? in V?geln on the cover of 1st edition of Konrad Lorenz's _Er > redete mit dem Vieh, den V?geln und den Fischen_ > https://en.wikipedia.org/wiki/File:ErRedeteMitDemViehDenV%C3%B6gelnUndDenFischen.jpg > - n.b. other editions have normal ?; I do not know if it's used inside > the book in normal or heading texts, or just on the cover > * the ? in K?ln (English: Cologne) in the inscription of its > cathedral's crypt > https://commons.wikimedia.org/wiki/File:O_containing_E_ligature.jpg > > I do not know whether it is used in any language other than German, > nor how widely used it is for German. > > There's a CC by-sa SVG of the capital version here: > https://commons.wikimedia.org/wiki/File:Latin_capital_letter_O_containing_E.svg > ? but I don't know of a lower-case version. > > There exist Unicode: > * ? U+24BA and ? U+24D4 ? circled latin capital/small letter e, in the > Enclosed Alphanumerics block > * ? U+0152 and ? U+0153 ? Latin capital/small ligature oe, in the > Latin Extended-A block > * ? U+0276 ? Latin letter small capital oe, in the IPA Extensions block > > However, ?/? use a circle (not letter o), and don't decompose to ? or > ?; and I have not found something that does decompose to ? which would > use the enclosed ligature. > > I don't know combining characters well enough to tell if there is a > combining version of either o or e which would allow this. > > So? is this already a thing? Has it been proposed before? Ought it be > added to Unicode? 
> > Sincerely, > Sai > President, Fiat Fiendum, Inc., a 501(c)(3) > > * phrasing it both ways just so this discussion is easier to find by search > From wjgo_10009 at btinternet.com Sat Feb 26 10:45:08 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Sat, 26 Feb 2022 16:45:08 +0000 (GMT) Subject: =?UTF-8?Q?Re:_E-inside-o_/_o-enclosing-e_variant_of_German_=C3=B6?= In-Reply-To: References: Message-ID: <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com> Sai wrote: > Does Unicode have an existing way to encode the e-inside-o / > o-enclosing-e* variant o-e ligature for German ?? I do not know if it exists at present, but I think that it possibly could be formally encoded using ? followed by a Variation Selector character. If this becomes formally encoded, perhaps at the same time the version where the e is above the o could be encoded too? https://en.wikipedia.org/wiki/%C3%96#Typography William Overington Saturday 26 February 2022 -------------- next part -------------- An HTML attachment was scrubbed... URL: From duerst at it.aoyama.ac.jp Sat Feb 26 19:45:33 2022 From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J=2e_D=c3=bcrst?=) Date: Sun, 27 Feb 2022 10:45:33 +0900 Subject: =?UTF-8?Q?Re=3a_E-inside-o_/_o-enclosing-e_variant_of_German_=c3=b6?= In-Reply-To: References: Message-ID: I'd personally say this is just a font variant of ?. It's the book designer's/inscribers choice. It may look way different for outsiders, but people used to German will immediately understand what it is. Regards, Martin. On 2022-02-26 22:32, Sai via Unicode wrote: > Hello all. > > Does Unicode have an existing way to encode the e-inside-o / > o-enclosing-e* variant o-e ligature for German ?? > > See e.g.: > * the ? in V?geln on the cover of 1st edition of Konrad Lorenz's _Er > redete mit dem Vieh, den V?geln und den Fischen_ > https://en.wikipedia.org/wiki/File:ErRedeteMitDemViehDenV%C3%B6gelnUndDenFischen.jpg > - n.b. 
other editions have normal ?; I do not know if it's used inside > the book in normal or heading texts, or just on the cover > * the ? in K?ln (English: Cologne) in the inscription of its > cathedral's crypt > https://commons.wikimedia.org/wiki/File:O_containing_E_ligature.jpg > > I do not know whether it is used in any language other than German, > nor how widely used it is for German. > > There's a CC by-sa SVG of the capital version here: > https://commons.wikimedia.org/wiki/File:Latin_capital_letter_O_containing_E.svg > ? but I don't know of a lower-case version. > > There exist Unicode: > * ? U+24BA and ? U+24D4 ? circled latin capital/small letter e, in the > Enclosed Alphanumerics block > * ? U+0152 and ? U+0153 ? Latin capital/small ligature oe, in the > Latin Extended-A block > * ? U+0276 ? Latin letter small capital oe, in the IPA Extensions block > > However, ?/? use a circle (not letter o), and don't decompose to ? or > ?; and I have not found something that does decompose to ? which would > use the enclosed ligature. > > I don't know combining characters well enough to tell if there is a > combining version of either o or e which would allow this. > > So? is this already a thing? Has it been proposed before? Ought it be > added to Unicode? > > Sincerely, > Sai > President, Fiat Fiendum, Inc., a 501(c)(3) > > * phrasing it both ways just so this discussion is easier to find by search > From wjgo_10009 at btinternet.com Sat Feb 26 16:22:46 2022 From: wjgo_10009 at btinternet.com (William_J_G Overington) Date: Sat, 26 Feb 2022 22:22:46 +0000 (GMT) Subject: =?UTF-8?Q?Re:_E-inside-o_/_o-enclosing-e_variant_of_German_=C3=B6?= In-Reply-To: <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com> References: <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com> Message-ID: <2fffbb82.452a.17f38211796.Webtop.102@btinternet.com> I write to make a correction please. 
Earlier I wrote as follows:

> If this becomes formally encoded, perhaps at the same time the version
> where the e is above the o could be encoded too?

However, since then I have read the following.

Sławomir Osipiuk wrote:

> There is a combining letter e, but it gets placed above the previous
> character: oͤ

So the character oͤ is already encoded.

I have found the combining letter e at U+0364.

U+0364 COMBINING LATIN SMALL LETTER E
https://www.unicode.org/charts/PDF/U0300.pdf

William Overington
Saturday 26 February 2022

From jukkakk at gmail.com Sun Feb 27 06:00:51 2022
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Sun, 27 Feb 2022 14:00:51 +0200
Subject: Re: E-inside-o / o-enclosing-e variant of German ö
In-Reply-To:
References:
Message-ID:

Martin J. Dürst via Unicode (unicode at corp.unicode.org) wrote:

> I'd personally say this is just a font variant of ö. It's the book
> designer's/inscriber's choice. It may look way different for outsiders,
> but people used to German will immediately understand what it is.

With my limited (two years at school) understanding of German, I fully agree. The letter ö originates from an o with an e above it, and in German it is customary to replace ö by oe (a two-character combination, not the ligature œ) when needed, e.g. when the character repertoire is limited to that of ASCII. Since KOELN would be understood as KÖLN, so would KOLN with an E inside the O - a surprise perhaps if you never saw it before, but not a new character.

Things might be different if there were texts where both a normal ö and an o with an e inside appear within the same font. Even then, I would say it is a font variant of ö.

A font may well contain variant glyphs for a character. In order to justify encoding an o with an e inside, I think you would need to present evidence of texts showing 1) usage where it causes a difference in meaning with respect to ö, or 2) usage that is independent of the use of the letter ö in different human languages, such as use in some special phonetic or technical meaning.

Yucca, https://jkorpela.fi

From harjitmoe at outlook.com Sun Feb 27 06:44:21 2022
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 27 Feb 2022 12:44:21 +0000
Subject: Re: E-inside-o / o-enclosing-e variant of German ö
In-Reply-To:
References:
Message-ID:

> A font may well contain variant glyphs for a character. In order to
> justify encoding an o with an e inside, I think you would need present
> evidence of texts showing 1) usage where it causes a difference in
> meaning with respect to ö, or 2) usage that is independent of the use
> of the letter ö in different human languages, such as use in some
> special phonetic or technical meaning.

One thing I don't think I've seen mentioned yet is that ö is already a unification of O-diaeresis and O-umlaut, and while the glyph variant under discussion is a valid variant of O-umlaut (related to oͤ, ? and ? as other variants, where form acceptability and form preference varies between languages that use O-umlaut), it is not a valid variant of O-diaeresis.

-Har.

From alexander.lange at catrinity-font.de Sun Feb 27 03:13:24 2022
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Sun, 27 Feb 2022 10:13:24 +0100
Subject: Re: E-inside-o / o-enclosing-e variant of German ö
In-Reply-To:
References:
Message-ID: <0e3dd872-d388-534f-f87a-8a1a8da1f186@catrinity-font.de>

Hi, another German here.

I also think it is just a glyph variant of ö - or rather Ö.
I have only ever seen this in all-uppercase inscriptions where the line height is hardly bigger than the capital height. In both images Sai has linked to you can see that all lines would need to be higher just for the one umlaut in one of the lines if the standard glyph were used.

There are several strategies to achieve this:

 * Use a smaller variant of the base letter. This is commonly done on keyboards, see e.g. here: https://angelikasgerman.co.uk/what-does-a-german-keyboard-look-like/ and on license plates: https://en.wikipedia.org/wiki/FE-Schrift#/media/File:FE-Schrift.svg
 * Put dots or e inside O or U (doesn't work well with A)
 * Put one dot at each side of A or O (doesn't work well with U)
 * Use AE, OE or UE.

In normal text and especially on small letters, none of this is needed as you have enough space on top of the letters anyway.

Kind regards,
Alexander

From kent.b.karlsson at bahnhof.se Sun Feb 27 12:34:39 2022
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sun, 27 Feb 2022 19:34:39 +0100
Subject: Re: E-inside-o / o-enclosing-e variant of German ö
In-Reply-To:
References:
Message-ID:

> 26 feb. 2022 kl. 17:21 skrev Daphne Preston-Kendal via Unicode :
>
> On 26 Feb 2022, at 14:32, Sai via Unicode wrote:
>
>> Hello all.
>>
>> Does Unicode have an existing way to encode the e-inside-o /
>> o-enclosing-e* variant o-e ligature for German ö?
>
> It could reasonably be considered a typographical variant of ö or of the
> combination o / O + U+0364 COMBINING LATIN SMALL LETTER E.

It is most definitely NOT a glyph variant of ö. With quite a bit of a stretch it may be considered a glyph variant of oͤ (small or capital). After all, having the double dots inside of Ö (and similar) is considered a glyph variant of Ö
(see Alexander Lange's message in this thread), and even capitals or small capitals are sometimes considered variants of small letters; OpenType fonts can even have “feature tags” for that, and similarly for CSS: font-variant-caps: small-caps; and text-transform: uppercase;. (The latter is called a “transform”, but CSS is about styling, so it is actually a styling, not a transform; the stored text is not changed.)

/Kent K

From asmusf at ix.netcom.com Sun Feb 27 22:13:50 2022
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 27 Feb 2022 20:13:50 -0800
Subject: Re: E-inside-o / o-enclosing-e variant of German ö
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed... URL:

From richard.wordingham at ntlworld.com Mon Feb 28 15:09:42 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 28 Feb 2022 21:09:42 +0000
Subject: Bidi and Empty Parentheses
Message-ID: <20220228210942.0271a9e2@JRWUBU2>

At a right-to-left embedding level, in the absence of directional overrides, should the 4-character ASCII substring "x()y" render like "x()y" or like "y()x"?

Richard.

From kenwhistler at sonic.net Mon Feb 28 19:41:03 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 28 Feb 2022 17:41:03 -0800
Subject: Bidi and Empty Parentheses
In-Reply-To: <20220228210942.0271a9e2@JRWUBU2>
References: <20220228210942.0271a9e2@JRWUBU2>
Message-ID: <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net>

Richard,

"x()y"

More specifically, with an explicit LTR paragraph direction:

Trace: Entering br_Check
Current State: 20
  Text:        0078 0028 0029 0079
  Bidi_Class:     L    L    L    L
  Levels:        *0    0    0    0*
  Exp Levels:     0    0    0    0
  Runs:
  Order:       [0 1 2 3]
  Exp Order:   [0 1 2 3]

I.e. "x()y"

With an explicit RTL paragraph direction:

Trace: Entering br_Check
Current State: 20
  Text:        0078 0028 0029 0079
  Bidi_Class:     L    L    L    L
  Levels:        *2    2    2    2*
  Exp Levels:     2    2    2    2
  Runs:
  Order:       [0 1 2 3]
  Exp Order:   [0 1 2 3]

I.e. "x()y".

Note that the paragraph embedding level is 1, and the resolved levels are 2 (instead of 0), but the resolved display order of the string is identical in both cases.

--Ken

From eliz at gnu.org Mon Feb 28 21:31:48 2022
From: eliz at gnu.org (Eli Zaretskii)
Date: Tue, 01 Mar 2022 05:31:48 +0200
Subject: Bidi and Empty Parentheses
In-Reply-To: <20220228210942.0271a9e2@JRWUBU2> (message from Richard Wordingham via Unicode on Mon, 28 Feb 2022 21:09:42 +0000)
References: <20220228210942.0271a9e2@JRWUBU2>
Message-ID: <83tucil55n.fsf@gnu.org>

> Date: Mon, 28 Feb 2022 21:09:42 +0000
> From: Richard Wordingham via Unicode
>
> At a right-to-left embedding level, in the absence of directional
> overrides, should the 4-character ASCII substring "x()y" render like
> "x()y" or like "y()x"?

y()x, AFAIU.

From eliz at gnu.org Mon Feb 28 21:38:20 2022
From: eliz at gnu.org (Eli Zaretskii)
Date: Tue, 01 Mar 2022 05:38:20 +0200
Subject: Bidi and Empty Parentheses
In-Reply-To: <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> (message from Ken Whistler via Unicode on Mon, 28 Feb 2022 17:41:03 -0800)
References: <20220228210942.0271a9e2@JRWUBU2> <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net>
Message-ID: <83pmn6l4ur.fsf@gnu.org>

> Date: Mon, 28 Feb 2022 17:41:03 -0800
> Cc: unicode at corp.unicode.org
> From: Ken Whistler via Unicode
>
> Richard,
>
> "x()y"

Maybe there's a misunderstanding. Richard said "in a right-to-left embedding", so I tried RLE x ( ) y PDF and got "y()x" on display.
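For readers who want to reproduce the two test inputs being compared in this thread, here is a minimal sketch using Python's standard `unicodedata` module. The names `RLE`, `PDF`, `bidi_classes`, `plain` and `embedded` are illustrative assumptions. The sketch prints only the raw (unresolved) Bidi_Class values of each code point; it does not run the Unicode Bidirectional Algorithm and therefore does not settle which display order is correct.

```python
import unicodedata

RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def bidi_classes(text):
    """Return the raw Bidi_Class property value of each code point in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


plain = "x()y"                 # the bare four-character string
embedded = RLE + plain + PDF   # the same string wrapped in an explicit embedding

print(bidi_classes(plain))
print(bidi_classes(embedded))
print([f"U+{ord(ch):04X}" for ch in embedded])

# Both parentheses are mirrored characters, which is what makes them
# candidates for bracket-pair handling in the algorithm.
print(unicodedata.mirrored("("), unicodedata.mirrored(")"))
```

The letters are strong left-to-right (L) while the parentheses start out as neutrals (ON), so the displayed order depends entirely on how an implementation resolves those two neutrals inside the right-to-left context.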