From monicamerchant1 at gmail.com  Sat Feb  5 00:28:27 2022
From: monicamerchant1 at gmail.com (Monica Merchant)
Date: Sat, 5 Feb 2022 19:28:27 +1300
Subject: Normalizer tool by Richard Ishida
Message-ID: <CAGVe82RDZBE7Q5UQCu00+RXvS3qmbg7PNbFhi63qL+S+38_qpA@mail.gmail.com>

Hello,

Where might I find Richard Ishida's normalizer tool and source code? The
links in [this post](https://r12a.github.io/blog/200901.html) no longer
work.


Thank you,

mmerc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20220205/2d567327/attachment.htm>

From abrahamgross at disroot.org  Sat Feb  5 13:51:43 2022
From: abrahamgross at disroot.org (ag disroot)
Date: Sat, 5 Feb 2022 19:51:43 +0000 (UTC)
Subject: Normalizer tool by Richard Ishida
In-Reply-To: <CAGVe82RDZBE7Q5UQCu00+RXvS3qmbg7PNbFhi63qL+S+38_qpA@mail.gmail.com>
References: <CAGVe82RDZBE7Q5UQCu00+RXvS3qmbg7PNbFhi63qL+S+38_qpA@mail.gmail.com>
Message-ID: <6fbd6bdc-0b00-4e61-97a1-761e045ee980@disroot.org>

https://r12a.github.io/uniview/

https://github.com/r12a/uniview

From ishida at w3.org  Mon Feb  7 07:20:11 2022
From: ishida at w3.org (r12a)
Date: Mon, 7 Feb 2022 13:20:11 +0000
Subject: Normalizer tool by Richard Ishida
In-Reply-To: <CAGVe82RDZBE7Q5UQCu00+RXvS3qmbg7PNbFhi63qL+S+38_qpA@mail.gmail.com>
References: <CAGVe82RDZBE7Q5UQCu00+RXvS3qmbg7PNbFhi63qL+S+38_qpA@mail.gmail.com>
Message-ID: <985bce2d-1f3f-3be7-440d-be59521efb36@w3.org>

I no longer maintain the JavaScript normalisation tool i wrote, since 
JavaScript now provides the normalize() function, and i use that. See 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
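
For readers working outside the browser, most standard libraries offer the same four normalization forms. The sketch below uses Python's unicodedata.normalize as a stand-in for JavaScript's normalize(); the sample string is an assumption chosen only for illustration.

```python
import unicodedata

# Sample text (an illustrative assumption): precomposed "e with acute"
# followed by the "fi" ligature U+FB01.
sample = "\u00e9\ufb01"

forms = {form: unicodedata.normalize(form, sample)
         for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, text in forms.items():
    # Print each result as code points so the differences are visible.
    print(form, [f"U+{ord(ch):04X}" for ch in text])
```

The canonical forms (NFC, NFD) leave the ligature alone, while the compatibility forms (NFKC, NFKD) replace it with plain "f" and "i".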

hth
ri


Fwiw, i also went through all the blog posts and changed rishida.net 
links to point to r12a.github.io. I no longer own or have anything to 
do with the rishida.net domain name, despite the fact that someone has 
posted internationalisation-related content to it.


Monica Merchant via Unicode wrote on 05/02/2022 06:28:
> Hello,
>
> Where might I find Richard Ishida's normalizer tool and source code? 
> The links in [this post](https://r12a.github.io/blog/200901.html) no 
> longer work.
>
>
> Thank you,
>
> mmerc


From wjgo_10009 at btinternet.com  Thu Feb 10 06:45:47 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 10 Feb 2022 12:45:47 +0000 (GMT)
Subject: A multilingual sign that includes a language-independent glyph and
 a QR code
Message-ID: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>

A multilingual sign that includes a language-independent glyph and a QR 
code

https://forum.affinity.serif.com/index.php?/topic/157030-thank-you-for-visiting/

William Overington

Thursday 10 February 2022


From abrahamgross at disroot.org  Thu Feb 10 09:29:20 2022
From: abrahamgross at disroot.org (ag disroot)
Date: Thu, 10 Feb 2022 15:29:20 +0000 (UTC)
Subject: A multilingual sign that includes a language-independent glyph and
 a QR code
In-Reply-To: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
Message-ID: <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>

You keep posting your "language-independent glyphs" here, but how is it language independent if no one understands what it means?

In that case logographies like Chinese hanzi and Egyptian hieroglyphs are just as language independent (at least the pictographs, ideographs and compound ideographs) because they are symbols of real things, so no language is necessary. Hanzi is at least legible by ~1.5 billion people, and already has most ideas encoded in characters with a very easy way to extend it.

(if it sounds like I'm upset then I'm sorry, that wasn't the intention. just curious what your reasoning is)


From wjgo_10009 at btinternet.com  Thu Feb 10 12:07:03 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Thu, 10 Feb 2022 18:07:03 +0000 (GMT)
Subject: A multilingual sign that includes a language-independent glyph
 and a QR code
In-Reply-To: <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
 <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>
Message-ID: <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>


Hi

> You keep posting your "language-independent glyphs" here, but how is 
> it language independent if no one understands what it means?

It is language-independent even if nobody other than me knows the 
meaning that I have assigned to it. As a result of this thread, maybe a 
few more people will know what it means if they see the glyph again some 
time. Maybe as a work of art it will result in some people carrying out 
thought experiments. So the artwork could be a catalyst for progress in 
some way. Though maybe not. But epsilon of a chance is better than zero 
of a chance.

William Overington

Thursday 10 February 2022


From lyratelle at gmx.de  Thu Feb 10 16:58:57 2022
From: lyratelle at gmx.de (Dominikus Dittes Scherkl)
Date: Thu, 10 Feb 2022 23:58:57 +0100
Subject: A multilingual sign that includes a language-independent glyph
 and a QR code
In-Reply-To: <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>
References: <3444560a.4fd2.17ee3ab16ea.Webtop.96@btinternet.com>
 <84c33a74-6145-46b7-88d5-c84a64eb0f0a@disroot.org>
 <3ae4d3a3.5c46.17ee4d137e8.Webtop.96@btinternet.com>
Message-ID: <5b974265-9b2e-3d58-2174-29247ea5ce95@gmx.de>

Am 10.02.22 um 19:07 schrieb William_J_G Overington via Unicode:
> Hi
>
>
>  > You keep posting your "language-independent glyphs" here, but how is
> it language independent if no one understands what it means?
>
>
> It is language-independent even if nobody other than me knows the
> meaning that I have assigned to it.

No, that's not language independence. It's just a new language (with the
additional disadvantage that nobody knows it).


--
                                          Dominikus Dittes Scherkl


From johannes at bergerhausen.com  Fri Feb 11 03:41:50 2022
From: johannes at bergerhausen.com (Johannes Bergerhausen)
Date: Fri, 11 Feb 2022 10:41:50 +0100
Subject: update WWS website
Message-ID: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>

Dear list,

fyi: we have updated the worldswritingsystems.org <http://worldswritingsystems.org/> website to Unicode 14.0. Besides some corrections, there are also some new typographic reference glyphs and a new FAQ page. If you spot a mistake, please send us a correction.

By our count, there are currently 294 known scripts, living or historical. 131 of them are not yet encoded in Unicode.

Many greetings,
Johannes (Hochschule Mainz, Germany), Deborah (SEI Berkeley, USA), Thomas (ANRT Nancy, France)

From jk at koremail.com  Fri Feb 11 04:30:23 2022
From: jk at koremail.com (jk at koremail.com)
Date: Fri, 11 Feb 2022 18:30:23 +0800
Subject: update WWS website
In-Reply-To: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>
References: <10C2419B-570A-4F1D-B752-5F3C5549FBD3@bergerhausen.com>
Message-ID: <9ce3e3849e22c04ea1dc6c45b8eda455@koremail.com>


The list seems to be rather inaccurate in places.

It says, for example, that the Zhuang Square script has not been encoded. 
However, whilst there are still characters to be added, thousands of 
Zhuang square characters have been encoded. Nor, for that matter, is it 
accurate to describe it as historic.

Warm regards
John Knightley


On 2022-02-11 17:41, Johannes Bergerhausen via Unicode wrote:
> Dear list,
> 
> fyi: we have updated the worldswritingsystems.org [1] website to
> Unicode 14.0. Besides some corrections, there are also some new
> typographic reference glyphs and a new FAQ page. If you spot a
> mistake, please send us a correction.
> 
> By our count, there are currently 294 known scripts, living or
> historical. 131 of them are not yet encoded in Unicode.
> 
> Many greetings,
> Johannes (Hochschule Mainz, Germany), Deborah (SEI Berkeley, USA),
> Thomas (ANRT Nancy, France)
> 
> Links:
> ------
> [1] http://worldswritingsystems.org


From wjgo_10009 at btinternet.com  Mon Feb 14 08:59:48 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 14 Feb 2022 14:59:48 +0000 (GMT)
Subject: Recording accurately a person's name
Message-ID: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>

There was recently a Public Review.

434 CLDR Person Name Formatting

I sent in a response. My response and the result of reviewing by the 
subcommittee is available as follows.

https://unicode-org.atlassian.net/browse/CLDR-15263

However, it appears, from the response, that many of the issues that I 
mentioned are for implementers of software that use the standard.

The issue of some (though not all) people and organizations deciding to 
only use the first two initials of someone's given names, so, for 
example, with a name with three initials before the surname deciding to 
only use the first two when typing a letter from a longhand draft or 
replying to a letter goes back to before the widespread use of computers 
that exists today.

So, I write here, to a mailing list that is read by many people who 
implement software systems that include Unicode in some way, to ask 
please that when it comes to designing software that the widespread 
concept of only allowing for one "middle initial" is discontinued so 
that people with more than two given names are listed according to their 
name and not by some edited version of it that may, in fact, be the name 
of another person.

It seems to me that an application program needs a field that will 
accept more than one letter.

Also, when producing an address label, or an insurance certificate, or 
whatever, please do not assume that only the first character of the 
given2 field needs to be printed.
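
A minimal sketch of the request above, assuming a record with given, given2 and surname fields (given2 is the field name used in the message; the function name and the sample person are hypothetical). Every name held in given2 contributes its own initial, so nothing after the first letter of the field is silently dropped.

```python
def format_with_initials(given: str, given2: str, surname: str) -> str:
    """Abbreviate every given name to an initial without dropping any of them."""
    # given2 may hold several names ("Maria Louise"); keep one initial per name.
    names = [given, *given2.split()]
    initials = " ".join(f"{name[0]}." for name in names if name)
    return f"{initials} {surname}"


# Hypothetical person with three given names.
label = format_with_initials("Anna", "Maria Louise", "Example")
print(label)
```

A form that stores only a single "middle initial" character could not produce this label, which is the point the message makes about field design.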

Also, a related issue, please allow for Name on Card for credit card and 
debit card transactions to be entered manually rather than deducing it 
from name data and presenting it in a "greyed-out cannot be altered" 
field, because Name on Card may or may not have a honorific and may have 
a combination of names in full and initials that is not congruently 
deducible from the data.

With this new standard being produced, the opportunity to get away from 
the widespread name truncation practice exists, please take the 
opportunity to do so.

Thank you.

William J. G. Overington

Monday 14 February 2022


From steffen at sdaoden.eu  Mon Feb 14 11:08:19 2022
From: steffen at sdaoden.eu (Steffen Nurpmeso)
Date: Mon, 14 Feb 2022 18:08:19 +0100
Subject: Recording accurately a person's name
In-Reply-To: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>
References: <115916f1.b3e8.17ef8bf3958.Webtop.96@btinternet.com>
Message-ID: <20220214170819.Mtm8a%steffen@sdaoden.eu>

William_J_G Overington via Unicode wrote in
 <115916f1.b3e8.17ef8bf3958.Webtop.96 at btinternet.com>:
 |There was recently a Public Review.
 |
 |434 CLDR Person Name Formatting

While totally off-topic i see CLDR and long wanted to report that
_all_ messages of the German CLDR forum were classified as spam by
GMail (including those of their own fellows).  I did not have one
in my regular mail folder.  (All the ones i later reviewed in my
spam folder were in English, just to mention it.)
I mean maybe it is fun, as WWF mails, and things like
comp.lang.awk at googlegroups digests and such is spam, too.  Hm.

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

From monicamerchant1 at gmail.com  Thu Feb 17 06:18:56 2022
From: monicamerchant1 at gmail.com (Monica Merchant)
Date: Fri, 18 Feb 2022 01:18:56 +1300
Subject: Compatibility decomposables that are not compatibility characters
Message-ID: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>

Hello,

I have a question about the last two examples on the bottom of page 27
of Chapter
2.3 Compatibility Characters
<https://www.unicode.org/versions/Unicode14.0.0/ch02.pdf>:

*Example 1*

> By way of contrast, some compatibility decomposable characters, such as
> modifier letters used in phonetic orthographies, for example, U+02B0
> modifier letter small h, are not considered to be compatibility
> characters. They would have been accepted for encoding in the standard
> on their own merits, regardless of their need for mapping to IPA. A
> large number of compatibility decomposable characters like this are
> actually distinct symbols used in specialized notations, whether
> phonetic or mathematical. In such cases, their compatibility mappings
> express their historical derivation from styled forms of standard
> letters.


*Example 2*

> Other compatibility decomposable characters are widely used characters
> serving essential functions. U+00A0 no-break space is one example. In
> these and similar cases, such as fixed-width space characters, the
> compatibility decompositions define possible fallback representations.


The first example illustrates the case where a *compatibility decomposable
character* is *not* a *compatibility character* (i.e. a character that
would not have been encoded except for round-tripping with a source
standard): The Spacing Modifier Letters (U+02B0-U+02FF) and Mathematical
Alphanumeric Symbols (U+1D400-U+1D7FF) are not compatibility characters
because, although they resemble rich text variants of ordinary letters,
they are actually distinct symbols and therefore would have been accepted
for encoding on their own merits (as opposed to being encoded solely for
round-tripping).
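
The formal half of this distinction can be checked mechanically: the tag on a character's decomposition mapping is recorded in the Unicode Character Database. The sketch below reads it through Python's unicodedata module; the helper name decomposition_tag is hypothetical. Whether a character is a "compatibility character" in the informal sense is a judgement call that no such lookup can answer.

```python
import unicodedata


def decomposition_tag(ch: str) -> str:
    """Return the tag of a character's decomposition mapping.

    Gives "canonical" for an untagged mapping and "none" when there is no mapping.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    first = mapping.split()[0]
    return first if first.startswith("<") else "canonical"


for ch in ("\u02b0", "\U0001d400", "\u00a0", "\u2002"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), decomposition_tag(ch))
```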

However, I'm confused by the second example. In particular, I'm not
sure if no-break
space (*U+00A0*) and the fixed-width space characters (*U+2000-U+200A*) are
compatibility characters or not. They are described as "serving essential
functions", which I read as meaning that they would have been encoded even
if it weren't for round-tripping, in which case they would not be
considered as compatibility characters. Is this correct? If so, are they
essential because they facilitate the typesetting of text-based markup like
HTML (where formatting must be specified in plain text)? No-break space is
also essential in that it is used to display standalone non-spacing marks (pg
267 <https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf>).

I apologise if this is an obvious question and would be grateful for any
guidance, as most resources only mention compatibility characters in
passing.


Thank you,

Monica

From cate at cateee.net  Thu Feb 17 07:52:32 2022
From: cate at cateee.net (Giacomo Catenazzi)
Date: Thu, 17 Feb 2022 14:52:32 +0100
Subject: Compatibility decomposables that are not compatibility characters
In-Reply-To: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
Message-ID: <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>

Hello Monica,

On 17.02.2022 13:18, Monica Merchant via Unicode wrote:

> However, I'm confused by the second example. In particular, I'm not sure 
> if no-break space (*U+00A0*) and the fixed-width space characters 
> (*U+2000-U+200A*) are compatibility characters or not. They are 
> described as "serving essential functions", which I read as meaning that 
> they would have been encoded even if it weren't for round-tripping, in 
> which case they would not be considered as compatibility characters. Is 
> this correct? If so, are they essential because they facilitate the 
> typesetting of text-based markup like HTML (where formatting must be 
> specified in plain text)? No-break space is also essential in that it is 
> used to display standalone non-spacing marks (pg 267 
> <https://www.unicode.org/versions/Unicode14.0.0/ch06.pdf>).
> 

I read the section in this manner: the three examples before your 
example 1 and example 2 describe the case of compatibility characters 
that are not compatibility decomposable characters. Then the standard 
describes two examples where we have a compatibility decomposition, but 
the characters are not compatibility characters.

Note that on page 26 we have:

vvvv
There is no formal listing of all compatibility characters in the 
Unicode Standard. This follows from the nature of the definition of 
compatibility characters. It is a judgement call as to whether any 
particular character would have been accepted for encoding if it had not 
been required for interoperability with a particular standard. Different 
participants in character encoding often disagree about the 
appropriateness of encoding particular characters, and sometimes there 
are multiple justifications for encoding a given character.
^^^^

So it depends on how you interpret U+00A0. As you write, you may 
consider it an essential distinction in HTML, so it may not be a 
compatibility character. On the other hand, a typesetter may interpret 
U+00A0 as U+0020. Such a person will decide whether or not to break at 
the space according to the context (they know the language rules and 
style, e.g. not to separate a number from its unit, or "Ms." from the 
name). So the context, not the character, makes the distinction.

But your extra cases are more interesting.
U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE). These 
are not just compatibility decomposable characters; in my opinion they 
are also plain compatibility characters: they are exactly the same 
character (they were included only because of an error or wrong 
interpretation of existing documents). The same goes for U+2001.

I would also consider U+2002 to U+200A, except U+2007, to be 
compatibility characters (and the Unicode Character Database treats 
them as compatibility decomposable characters). Unicode probably does 
the same, because they have the type "<compat>".

It is just U+2007 that makes me think (and not only because, like 
U+00A0, it has <noBreak> instead of <compat>). For me, this is just a 
decimal digit zero that is not printed, so it has its own merits: it is 
not a separator, but a meaningful character (context: tables). Different 
people may have different opinions.
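
The mappings discussed in this message can be listed directly from the Unicode Character Database. The sketch below uses Python's unicodedata module; the output reflects whichever Unicode version the local Python build ships, though normalization stability means these mappings should not differ between versions.

```python
import unicodedata

# Decomposition mappings for the fixed-width spaces U+2000..U+200A.
mappings = {cp: unicodedata.decomposition(chr(cp))
            for cp in range(0x2000, 0x200B)}

for cp, mapping in mappings.items():
    # An untagged mapping is canonical; a tagged one is a compatibility mapping.
    print(f"U+{cp:04X} {unicodedata.name(chr(cp)):<20} {mapping}")
```

U+2000 and U+2001 carry untagged (canonical) mappings to U+2002 and U+2003, U+2007 carries <noBreak>, and the remaining characters in the range carry <compat>.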

giacomo


> 
> 
> Thank you,
> 
> Monica
> 
> 

From asmusf at ix.netcom.com  Thu Feb 17 13:33:22 2022
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Thu, 17 Feb 2022 11:33:22 -0800
Subject: Compatibility decomposables that are not compatibility characters
In-Reply-To: <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
Message-ID: <748b4a9e-3cb3-66fc-a334-7e43a32fb662@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20220217/c047e016/attachment.htm>

From kenwhistler at sonic.net  Thu Feb 17 19:32:57 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Thu, 17 Feb 2022 17:32:57 -0800
Subject: Compatibility decomposables that are not compatibility characters
In-Reply-To: <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
Message-ID: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>

In general, it is a good idea not to try to parse the discussion of 
compatibility characters too closely. That whole section of the core 
specification was written to help clarify the ambiguous, careless way 
that people were tending to wave around the term "compatibility 
character" in earlier days of the standard.

It is unfortunate that we ended up with the term "compatibility" used 
for a specific set of decomposition types baked into the data files and 
as a normative part of the Unicode Normalization Algorithm, but there we 
are. It just means that people need to be careful now when they evoke 
the *other* sense of "compatibility character" -- the shorthand usage 
for which is approximately "useless dreck we didn't really want to 
include in the standard but had to for one reason or another." That 
second use overlaps a lot with characters that formally have 
"compatibility decompositions", but the two sets are not the same -- 
hence the need for the explanation.

On 2/17/2022 5:52 AM, Giacomo Catenazzi via Unicode wrote:
> So it depends on how you interpret U+00A0. As you write, you may 
> consider it an essential distinction in HTML, so it may not be a 
> compatibility character. On the other hand, a typesetter may interpret 
> U+00A0 as U+0020. Such a person will decide whether or not to break at 
> the space according to the context (they know the language rules and 
> style, e.g. not to separate a number from its unit, or "Ms." from the 
> name). So the context, not the character, makes the distinction.

U+00A0 is a widely used, clearly necessary character. If it hadn't 
already been in significant character sets incorporated into the 
earliest drafts of the Unicode repertoire, the Unicode architects almost 
certainly would have invented it and added it in.

Now, from a certain point of view, characters added to Unicode 1.0 
because they were already encoded in ISO 8859-1 ("Latin-1") were added 
"for compatibility" with that earlier character set. That seems pretty 
obvious, because, for good reasons, U+0010..U+00FF were all added to 
Unicode in the exact same order and code values as for Latin-1. You 
don't get much more compatible than that! But at the time, nobody was 
really arguing that those were <airquote>compatibility 
characters</airquote>. It was assumed that we had to have all the 
Latin-1 characters in the standard. That was considered a no brainer at 
the time. None were "useless dreck". In fact, the big argument then was 
about the accented Latin letters in the range U+00C0..U+00FF, which 
ended up with *canonical* decompositions into their base letter + accent 
combinations. So those were canonical decomposables, and not 
compatibility decomposables, although quite arguably, they were encoded 
"for compatibility" with Latin-1.

See how slippery this gets?
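
The formal difference between the two kinds of mapping can be seen with two Latin-1 characters. This is a sketch using Python's unicodedata module, not part of the original message; the variable names are illustrative.

```python
import unicodedata

a_grave = "\u00c0"   # LATIN CAPITAL LETTER A WITH GRAVE, present in Latin-1
nbsp = "\u00a0"      # NO-BREAK SPACE, also present in Latin-1

# Canonical mapping: untagged, applied by NFD and undone again by NFC.
print(unicodedata.decomposition(a_grave))
decomposed = unicodedata.normalize("NFD", a_grave)
recomposed = unicodedata.normalize("NFC", decomposed)

# Compatibility mapping: tagged, ignored by NFD, applied only by NFKD/NFKC.
print(unicodedata.decomposition(nbsp))
nfd_nbsp = unicodedata.normalize("NFD", nbsp)
nfkd_nbsp = unicodedata.normalize("NFKD", nbsp)
```

Both characters arrived "for compatibility" with Latin-1 in the informal sense, yet only U+00A0 has a compatibility decomposition in the formal sense.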

By contrast, the archetypal examples at the time of "useless dreck" that 
were added as "compatibility characters" were the various ligatures in 
the Arabic Presentation Forms-A block and the Alphabetic Presentation 
Forms block. Those were all considered "compatibility characters" at the 
time, and were even quarantined in a range then known as the 
"Compatibility Area" in the code space.

>
> But your extra cases are more interesting.
> U+2000 is canonically equivalent to U+2002 (EN QUAD vs EN SPACE). These 
> are not just compatibility decomposable characters; in my opinion they 
> are also plain compatibility characters: they are exactly the same 
> character (they were included only because of an error or wrong 
> interpretation of existing documents). The same goes for U+2001.
>
> I would also consider U+2002 to U+200A, except U+2007, to be 
> compatibility characters (and the Unicode Character Database treats 
> them as compatibility decomposable characters). Unicode probably does 
> the same, because they have the type "<compat>".
>
> It is just U+2007 that makes me think (and not only because, like 
> U+00A0, it has <noBreak> instead of <compat>). For me, this is just a 
> decimal digit zero that is not printed, so it has its own merits: it is 
> not a separator, but a meaningful character (context: tables). 
> Different people may have different opinions.

The fixed-width spaces in the 2000 block of punctuation have their own 
interesting history. The fact that they were added in Unicode 1.0 means 
that they were not part of the forced merger with 10646 repertoire in 
1992 that led to the Arabic ligatures and the like. Instead, they 
derived largely from the pre-existing XCCS (Xerox) character set, but 
some of them appeared also in other early character sets. In Unicode 1.0 
they had no decompositions -- nothing did. The decompositions were first 
added in Unicode 1.1, and at that point they were all tagged as "<font 
variant> [0020]". That was the beginning of the realization that most of 
the fixed-width space characters didn't really belong in plain text for 
interchange, but instead were artifacts of printing technology.

The addition of the *canonical* decompositions for 2000 and 2001 was a 
Unicode 2.0 innovation, when it became clear that nobody could come up 
with a convincing distinction between an "EM QUAD" as a space character 
and an "EM SPACE" as a space character.
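
A practical consequence of that canonical mapping, sketched with Python's unicodedata module (the variable names are illustrative): a canonical decomposition to a single character cannot be recomposed, so EN QUAD does not survive any of the four normalization forms.

```python
import unicodedata

en_quad, en_space = "\u2000", "\u2002"

# Normalize EN QUAD under every form and report the resulting code point.
results = {form: unicodedata.normalize(form, en_quad)
           for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, text in results.items():
    print(form, f"U+{ord(text):04X}")
```

The canonical forms turn U+2000 into U+2002, and the compatibility forms go one step further to U+0020.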

Nowadays most people would agree that there would be little reason to 
put any of those other than 200B ZWSP and 2007 FIGURE SPACE into a plain 
text stream. The rest of the fixed width space characters are basically 
"useless dreck", but the interesting distinction here is that they 
didn't start out being considered to be compatibility characters, but 
rather graduated to that status as people came to appreciate the fact 
that there weren't valid reasons to use them in modern Unicode text 
representation. They aren't bad enough to be formally deprecated, but 
they live in a kind of limbo of useless stuff you'd be better off 
without, along with scads of other such artifacts in the standard.

--Ken

>
>

From hubaishan at outlook.sa  Thu Feb 17 22:44:17 2022
From: hubaishan at outlook.sa (Saeed Hubaishan)
Date: Fri, 18 Feb 2022 04:44:17 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
Message-ID: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>

Hi,
The "Decomposition Type Mapping" values of these ligature marks are wrong:
FC5E    Arabic Ligature Shadda With Dammatan Isolated Form
        <isolated> 0020 064C 0651
FC5F    Arabic Ligature Shadda With Kasratan Isolated Form
        <isolated> 0020 064D 0651
FC60    Arabic Ligature Shadda With Fatha Isolated Form
        <isolated> 0020 064E 0651
FC61    Arabic Ligature Shadda With Damma Isolated Form
        <isolated> 0020 064F 0651
FC62    Arabic Ligature Shadda With Kasra Isolated Form
        <isolated> 0020 0650 0651

FCF2    Arabic Ligature Shadda With Fatha Medial Form
        <medial> 0640 064E 0651
FCF3    Arabic Ligature Shadda With Damma Medial Form
        <medial> 0640 064F 0651
FCF4    Arabic Ligature Shadda With Kasra Medial Form
        <medial> 0640 0650 0651
Arabic Shadda must come before the marks (064C, 064D, 064E, 064F, 0650).



From wjgo_10009 at btinternet.com  Fri Feb 18 05:36:33 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Fri, 18 Feb 2022 11:36:33 +0000 (GMT)
Subject: Simulating the handsetting of metal type (from Re: Compatibility
 decomposables that are not compatibility characters)
In-Reply-To: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
Message-ID: <18640b1.26e2.17f0c9e946e.Webtop.96@btinternet.com>


https://forum.affinity.serif.com/index.php?/topic/157455-simulating-the-handsetting-of-metal-type/

William Overington

Friday 18 February 2022



From sosipiuk at gmail.com  Fri Feb 18 11:38:48 2022
From: sosipiuk at gmail.com (Sławomir Osipiuk)
Date: Fri, 18 Feb 2022 12:38:48 -0500
Subject: Compatibility decomposables that are not compatibility characters
In-Reply-To: <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
Message-ID: <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>

On Thu, Feb 17, 2022 at 8:36 PM Ken Whistler via Unicode
<unicode at corp.unicode.org> wrote:
>
> The addition of the *canonical* decompositions for 2000 and 2001 was a
> Unicode 2.0 innovation, when it became clear that nobody could come up
> with a convincing distinction between an "EM QUAD" as a space character
> and an "EM SPACE" as a space character.

While following a different trail a couple of weeks ago I came upon
this proposal:
http://www.unicode.org/L2/L2019/19115-fwsp-usability.pdf

While the proposal itself is a non-starter due to stability reqs,
Marcel Schneider makes the case that the QUADs were originally meant
to allow line breaking, while the adjacent SPACE characters should
have been non-breaking. That would have been the "convincing
distinction", if it had been implemented that way.

Sławomir Osipiuk


From kenwhistler at sonic.net  Fri Feb 18 13:44:09 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Fri, 18 Feb 2022 11:44:09 -0800
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
Message-ID: <bb87d53f-3e2b-a827-d51b-60044701728a@sonic.net>


On 2/17/2022 8:44 PM, Saeed Hubaishan via Unicode wrote:
> Hi,
> "The Decomposition Type Mapping" of these ligature marks are wrong:
> FC5E    Arabic Ligature Shadda With Dammatan Isolated Form
>         <isolated> 0020 064C 0651
> FC5F    Arabic Ligature Shadda With Kasratan Isolated Form
>         <isolated> 0020 064D 0651
>
>
...
> Arabic Shadda must be before the marks (064C, 064D, 064E, 064F, 
> 0650)
>
Decompositions are immutable, constrained by normalization stability.
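
A related point can be checked with Python's unicodedata module (a sketch; the variable names and the sample base letter are illustrative assumptions). In normalized text the shadda follows the vowel mark in any case, because canonical ordering sorts combining marks by their Canonical_Combining_Class, and shadda has a higher class than dammatan.

```python
import unicodedata

SHADDA, DAMMATAN = "\u0651", "\u064c"

# The published mapping for U+FC5E lists dammatan before shadda.
print(unicodedata.decomposition("\ufc5e"))

# Canonical_Combining_Class values decide the order of marks in normalized text.
ccc_dammatan = unicodedata.combining(DAMMATAN)
ccc_shadda = unicodedata.combining(SHADDA)
print(ccc_dammatan, ccc_shadda)

# Entering shadda first does not survive normalization: the marks are reordered.
typed = "\u0628" + SHADDA + DAMMATAN          # beh + shadda + dammatan
normalized = unicodedata.normalize("NFC", typed)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

The stored order is therefore not a statement about visual placement of the marks.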


To see how such rendering should be handled, instead, please see Unicode 
Technical Report #53, Unicode Arabic Mark Rendering, which addresses the 
issue of the placement of shadda, along with many other issues of 
ordering and placement of various tashkil, ijam, and other marks:


https://www.unicode.org/reports/tr53/


--Ken

From mark at kli.org  Fri Feb 18 13:46:13 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Fri, 18 Feb 2022 14:46:13 -0500
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables that
 are not compatibility characters
In-Reply-To: <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
 <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
Message-ID: <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>

Perhaps relevant to this thread, I was just reading in
https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043, proposal
to encode Kirat Rai Script, where it remarks regarding the vowels:

> These should all be encoded atomically. This is because linguistically 
> these vowels are not composed of two separate characters, they are 
> single vowels in their own right. It is true that the custom encoded 
> Kirat Rai font uses decomposed vowel signs as a matter of expediency, 
> but this decision should not influence the right way to encode the 
> script. Because the glyph for some of the vowels (aa and e) are part of 
> the shape of the last 3 vowels (ai, o, au) there should be canonical 
> decompositions for the last 3 vowels. With these decompositions, Do 
> Not Use tables are not necessary.
If the vowels are to be encoded atomically, and it sounds like they 
should be, shouldn't we *not* want to have canonical decompositions for 
them? I thought Unicode was trying to avoid precomposed characters at 
this point. I guess it's too late to hope for "only one right way to 
spell it" out of Unicode, but is that still something we try to 
approach? It almost seems to me that canonical decompositions also stem 
from cases of "things that wouldn't be encoded if they were proposed 
now," and if so it would not really make sense to propose anything with 
a canonical decomposition. Or am I misunderstanding the attitude 
towards canonical decompositions, or the proposal's statement?

~mark

From richard.wordingham at ntlworld.com  Fri Feb 18 13:48:28 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 18 Feb 2022 19:48:28 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
Message-ID: <20220218194828.26235c8d@JRWUBU2>

On Fri, 18 Feb 2022 04:44:17 +0000
Saeed Hubaishan via Unicode <unicode at corp.unicode.org> wrote:

> Hi,
> "The Decomposition Type Mapping" of these ligature marks is wrong:
> FC5E    Arabic Ligature Shadda With Dammatan Isolated Form
>         <isolated> 0020 064C 0651
> FC5F    Arabic Ligature Shadda With Kasratan Isolated Form
>         <isolated> 0020 064D 0651
> FC60    Arabic Ligature Shadda With Fatha Isolated Form
>         <isolated> 0020 064E 0651
> FC61    Arabic Ligature Shadda With Damma Isolated Form
>         <isolated> 0020 064F 0651
> FC62    Arabic Ligature Shadda With Kasra Isolated Form
>         <isolated> 0020 0650 0651
> 
> FCF2    Arabic Ligature Shadda With Fatha Medial Form
>         <medial> 0640 064E 0651
> FCF3    Arabic Ligature Shadda With Damma Medial Form
>         <medial> 0640 064F 0651
> FCF4    Arabic Ligature Shadda With Kasra Medial Form
>         <medial> 0640 0650 0651
> Arabic Shadda must be before the marks (064C, 064D, 064E, 064F,
> 0650)

But they and shadda have different non-zero canonical combining classes
(ccc), so their relative order makes no difference: either order is
canonically equivalent to the other.  Shadda has the higher ccc, so it
comes last in canonical order.  Putting it last makes the decomposition
table easier to use for conversion to form NFKD.
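
For anyone who wants to check this, here is a minimal Python sketch using the standard `unicodedata` module. The helper names `ccc_table` and `nfkd_codepoints` are made up for illustration, and the values printed depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# The marks discussed in this thread.
MARKS = {
    "DAMMATAN": "\u064C",
    "KASRATAN": "\u064D",
    "FATHA": "\u064E",
    "DAMMA": "\u064F",
    "KASRA": "\u0650",
    "SHADDA": "\u0651",
}


def ccc_table():
    """Map each mark name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in MARKS.items()}


def nfkd_codepoints(text):
    """Return the NFKD form of text as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", text)]


for name, ccc in ccc_table().items():
    print(f"{name}: ccc={ccc}")

# Shadda already comes last in the compatibility mappings, so NFKD needs no
# further reordering of the marks.
print(nfkd_codepoints("\uFC60"))  # isolated shadda with fatha
print(nfkd_codepoints("\uFCF2"))  # medial shadda with fatha
```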

Richard.


From doug at ewellic.org  Fri Feb 18 15:37:50 2022
From: doug at ewellic.org (Doug Ewell)
Date: Fri, 18 Feb 2022 14:37:50 -0700
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
Message-ID: <007701d8250f$c774fe20$565efa60$@ewellic.org>

Saeed Hubaishan wrote:

> "The Decomposition Type Mapping" of these ligature marks is wrong:

Comments like these always make me wonder what motivated them.

The vast majority of characters in the Arabic Presentation Forms-A and -B blocks should not be used. They exist for compatibility with older platforms that did not implement proper Arabic shaping and directionality. Instead, use normal Arabic letters from the regular Arabic, Arabic Supplement, Extended-A, or Extended-B blocks.
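
As a hedged sketch of what such a replacement could look like in Python: the function below swaps any character whose compatibility decomposition is tagged as a positional form for its NFKD mapping. The name `replace_presentation_forms` is hypothetical, and whether such a replacement is appropriate depends on the data; characters in those blocks without a decomposition are left alone.

```python
import unicodedata

# Decomposition tags that mark positional (presentation) forms.
PRESENTATION_TAGS = ("<isolated>", "<initial>", "<medial>", "<final>")


def replace_presentation_forms(text):
    """Replace positional presentation forms with their NFKD mappings."""
    out = []
    for ch in text:
        if unicodedata.decomposition(ch).startswith(PRESENTATION_TAGS):
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)  # ordinary letters, other compatibility characters
    return "".join(out)


# U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM becomes LAM + HAH.
result = replace_presentation_forms("\uFCCA")
print([f"U+{ord(c):04X}" for c in result])
```

A font with suitable shaping rules can then form the ligature from the plain letters.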

--
Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org


From richard.wordingham at ntlworld.com  Fri Feb 18 16:06:40 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 18 Feb 2022 22:06:40 +0000
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables
 that are not compatibility characters
In-Reply-To: <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
 <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
 <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
Message-ID: <20220218220640.3dd190dc@JRWUBU2>

On Fri, 18 Feb 2022 14:46:13 -0500
"Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:

> Perhaps relevant to this thread, I was just reading in
> https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043,
> proposal to encode Kirat Rai Script, where it remarks regarding the
> vowels:
> 
> > These should all be encoded atomically. This is because
> > linguistically these vowels are not composed of two separate
> > characters, they are single vowels in their own right. It is true
> > that the custom encoded Kirat Rai font uses decomposed vowel signs
> > as a matter of expediency, but this decision should not influence
> > the right way to encode the script. Because the glyph for some of
> > the vowels (aa and e) are part of the shape of the last 3 vowels
> > (ai, o, au) there should be canonical decompositions for the last
> > 3 vowels. With these decompositions, Do Not Use tables are not
> > necessary.
> If the vowels are to be encoded atomically, and it sounds like they
> should be, shouldn't we *not* want to have canonical decompositions
> for them? I thought Unicode was trying to avoid precomposed
> characters at this point. I guess it's too late to hope for "only
> one right way to spell it" out of Unicode, but is that still
> something we try to approach? It almost seems to me that canonical
> decompositions also stem from cases of "things that wouldn't be
> encoded if they were proposed now," and if so it would not really
> make sense to propose anything with a canonical decomposition. Or am
> I misunderstanding the attitude towards canonical decompositions, or
> the proposal's statement?

X technology should obviously be opposed wherever possible.  We should
make it impossible to enter these vowel symbols at a single stroke
when using a simple X keyboard or even an MSKLC keyboard creator.  We
must keep professional keyboard writers in work. 

Your wording is confusing.  There are several different options:

1) Only allow encoding for single vowels (the Khmer model)
2) Do not encode visually compound vowels (the Myanmar model)
3) Allow visually compound vowels as sequences or as single characters
(the south Indian model)

The proposal argues for (3), which rather assumes that canonical
equivalence will be taken seriously.  At least we don't have the
problem presented by doubled multipart south Indian vowels.
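
To make the three models concrete, here is a small Python sketch using existing encoded scripts, since Kirat Rai itself is not assumed to be in the Unicode data bundled with Python. The helper name `canonical_parts` is made up for illustration.

```python
import unicodedata


def canonical_parts(ch):
    """Return the canonical decomposition of ch as U+XXXX strings, or []."""
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return []  # no decomposition, or only a compatibility one
    return [f"U+{cp}" for cp in mapping.split()]


# Model 3 (south Indian): TAMIL VOWEL SIGN O is a single character that is
# canonically equivalent to a sequence of two vowel signs.
print(canonical_parts("\u0BCA"))
print(unicodedata.normalize("NFC", "\u0BC6\u0BBE") == "\u0BCA")

# Model 1 (Khmer): KHMER VOWEL SIGN OE is atomic, with no decomposition.
print(canonical_parts("\u17BE"))

# Model 2 (Myanmar): a compound vowel is written only as a sequence, here
# VOWEL SIGN E + VOWEL SIGN AA, and normalization leaves the sequence alone.
myanmar_o = "\u1031\u102C"
print(unicodedata.normalize("NFC", myanmar_o) == myanmar_o)
```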

Model (1) calls forth a need for stop lists, and potential confusion
when a compound vowel notation is later found to be needed.  (From
the Southern Thai point of view, there seems to be a vowel missing from
the Khmer script which it would be very tempting to just encode as
<U+17C1, U+17B7>, though in *Khmer* usage it is arguably just a glyph
variant of U+17BE KHMER VOWEL SIGN OE.) 

I think you're calling for (2), which with current technology seems to
make keyboard creation unduly complicated or fragile if we want users
to be able to treat KIRAT RAI VOWEL SIGN O as a single entity.  (Do
users have such a perception?  We'll probably be told that it's not a
user-perceived character.)

Richard.


From richard.wordingham at ntlworld.com  Fri Feb 18 17:24:01 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Fri, 18 Feb 2022 23:24:01 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <007701d8250f$c774fe20$565efa60$@ewellic.org>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <007701d8250f$c774fe20$565efa60$@ewellic.org>
Message-ID: <20220218232401.4818e7e7@JRWUBU2>

On Fri, 18 Feb 2022 14:37:50 -0700
Doug Ewell via Unicode <unicode at corp.unicode.org> wrote:

> The vast majority of characters in the Arabic Presentation Forms-A
> and -B blocks should not be used. They exist for compatibility with
> older platforms that did not implement proper Arabic shaping and
> directionality. Instead, use normal Arabic letters from the regular
> Arabic, Arabic Supplement, Extended-A, or Extended-B blocks.

Irritatingly, I had to use some of these characters just this week
because the shaping in Arabic fonts for basic installations of Windows
10 and Ubuntu didn't include the ligatures we were discussing - in
particular that of U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM.
(The ligature was germane to the discussion.)  Many of the ligatures are
not essential for proper shaping. I've now found and lawfully installed
a font that gives me the ligature from normal Arabic letters.

Richard.

From eliz at gnu.org  Sat Feb 19 01:38:22 2022
From: eliz at gnu.org (Eli Zaretskii)
Date: Sat, 19 Feb 2022 09:38:22 +0200
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <20220218232401.4818e7e7@JRWUBU2> (message from Richard
 Wordingham via Unicode on Fri, 18 Feb 2022 23:24:01 +0000)
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <007701d8250f$c774fe20$565efa60$@ewellic.org>
 <20220218232401.4818e7e7@JRWUBU2>
Message-ID: <83sfsfz4pd.fsf@gnu.org>

> Date: Fri, 18 Feb 2022 23:24:01 +0000
> From: Richard Wordingham via Unicode <unicode at corp.unicode.org>
> 
> On Fri, 18 Feb 2022 14:37:50 -0700
> Doug Ewell via Unicode <unicode at corp.unicode.org> wrote:
> 
> > The vast majority of characters in the Arabic Presentation Forms-A
> > and -B blocks should not be used. They exist for compatibility with
> > older platforms that did not implement proper Arabic shaping and
> > directionality. Instead, use normal Arabic letters from the regular
> > Arabic, Arabic Supplement, Extended-A, or Extended-B blocks.
> 
> Irritatingly, I had to use some of these characters just this week
> because the shaping in Arabic fonts for basic installations of Windows
> 10 and Ubuntu didn't include the ligatures we were discussing - in
> particular that of U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM.
> (The ligature was germane to the discussion.)  Many of the ligatures are
> not essential for proper shaping. I've now found and lawfully installed
> a font that gives me the ligature from normal Arabic letters.

Which font is that, please?

And does anyone here know why the Courier New font on Windows XP does
produce the ligature from those two characters, but the same font on
Windows 10 doesn't?  Is this ligature somehow deemed inappropriate or
problematic?  I'm not asking about U+FCCA, I'm asking about the
display of the two characters U+0644 and U+062D -- should it ligate or
shouldn't it?

Thanks.

From hubaishan at outlook.sa  Sat Feb 19 04:20:31 2022
From: hubaishan at outlook.sa (Saeed Hubaishan)
Date: Sat, 19 Feb 2022 10:20:31 +0000
Subject: =?utf-8?B?2LHYrzogV3Jvbmcgc2VxdWVuY2UgZm9yIEFyYWJpYyBsaWdhdHVyZSBtYXJr?=
 =?utf-8?Q?s(FC5E-FC62,_FCF2-FCF4)?=
In-Reply-To: <20220218194828.26235c8d@JRWUBU2>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
Message-ID: <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>

But we have a problem with some programs that get their data from Unicode, like "MediaWiki" and "phpBB": they reorder
<U+0644, U+0651, U+064E>
to
<U+0644, U+064E, U+0651>
which may be rendered in some old Windows fonts like
<U+0644, U+0651, U+0650>

You can try this on Wikipedia.

From richard.wordingham at ntlworld.com  Sat Feb 19 06:52:37 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 19 Feb 2022 12:52:37 +0000
Subject: =?UTF-8?B?2LHYrzo=?= Wrong sequence for Arabic ligature
 marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
Message-ID: <20220219125237.503aca31@JRWUBU2>

On Sat, 19 Feb 2022 10:20:31 +0000
Saeed Hubaishan via Unicode <unicode at corp.unicode.org> wrote:

> But we have a problem with some programs that get their data from
> Unicode, like "MediaWiki" and "phpBB": they reorder
> <U+0644, U+0651, U+064E>
> to
> <U+0644, U+064E, U+0651>
In codepoints, <U+0644 LAM, U+0651 SHADDA, U+064E FATHA> to <U+0644,
U+064E, U+0651>. No process compliant with Unicode shall *deliberately*
render them differently - the sequences are canonically equivalent.
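
A quick way to confirm the equivalence, sketched in Python with the standard `unicodedata` module (the function name `canonically_equivalent` is just for illustration):

```python
import unicodedata


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


lam_shadda_fatha = "\u0644\u0651\u064E"  # order as typed
lam_fatha_shadda = "\u0644\u064E\u0651"  # order after normalization

print(canonically_equivalent(lam_shadda_fatha, lam_fatha_shadda))
# NFC also puts the marks in canonical order, which is the reordering that
# normalizing software performs.
print(unicodedata.normalize("NFC", lam_shadda_fatha) == lam_fatha_shadda)
```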

> which may be rendered in some old Windows fonts like
> <U+0644, U+0651, U+0650>
> 
> You can try this on Wikipedia.

This sequence is <U+0644, U+0651, U+0650 KASRA>, which is not canonically
normalised. Using the Naskh font Amiri, kasra is by default placed below
lam. However, if I enable OpenType feature ss05, which for this font is
described (unless the labels have been scrambled) as "Kasra is placed
below Shadda instead of base glyph", the kasra is indeed placed
immediately below the shadda. Unicode allows both renderings.
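
The normalization claim can be checked with a short Python sketch (this assumes Python 3.8 or later, where `unicodedata.is_normalized` is available):

```python
import unicodedata

lam_shadda_kasra = "\u0644\u0651\u0650"

# Kasra (ccc 32) sorts before shadda (ccc 33), so this order is not normalized.
print(unicodedata.is_normalized("NFC", lam_shadda_kasra))

normalized = unicodedata.normalize("NFC", lam_shadda_kasra)
print([f"U+{ord(c):04X}" for c in normalized])
```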

I'm not sure that Unicode provides any plain text mechanism to
distinguish the two renderings.

In answer to Eli, the Amiri font is the one I downloaded to
get LAM and HAH to automatically ligate; I got it from Ubuntu package
fonts-hosny-amiri.  The font is published under the SIL Open Font
Licence.

Richard.


From richard.wordingham at ntlworld.com  Sat Feb 19 07:05:44 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 19 Feb 2022 13:05:44 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <83sfsfz4pd.fsf@gnu.org>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <007701d8250f$c774fe20$565efa60$@ewellic.org>
 <20220218232401.4818e7e7@JRWUBU2> <83sfsfz4pd.fsf@gnu.org>
Message-ID: <20220219130544.0e2d4cb7@JRWUBU2>

On Sat, 19 Feb 2022 09:38:22 +0200
Eli Zaretskii via Unicode <unicode at corp.unicode.org> wrote:

> > Date: Fri, 18 Feb 2022 23:24:01 +0000
> > From: Richard Wordingham via Unicode <unicode at corp.unicode.org>

> > Irritatingly, I had to use some of these characters just this week
> > because the shaping in Arabic fonts for basic installations of
> > Windows 10 and Ubuntu didn't include the ligatures we were
> > discussing - in particular that of U+FCCA ARABIC LIGATURE LAM WITH
> > HAH INITIAL FORM. (The ligature was germane to the discussion.)
> > Many of the ligatures are not essential for proper shaping. I've
> > now found and lawfully installed a font that gives me the ligature
> > from normal Arabic letters.  
> 
> Which font is that, please?

Amiri.

> And does anyone here know why the Courier New font on Windows XP does
> produce the ligature from those two characters, but the same font on
> Windows 10 doesn't?  Is this ligature somehow deemed inappropriate or
> problematic?  I'm not asking about U+FCCA, I'm asking about the
> display of the two characters U+0644 and U+062D -- should it ligate or
> shouldn't it?

Well, as Courier New is generally seen as a plain 'typewriter' font,
such ligatures would seem out of place in a font of that name.  One can
find claims that the only compulsory ligature is lam-alif.

Richard.

From hubaishan at outlook.sa  Sat Feb 19 07:30:49 2022
From: hubaishan at outlook.sa (Saeed Hubaishan)
Date: Sat, 19 Feb 2022 13:30:49 +0000
Subject: =?windows-1256?Q?=D1=CF:_=D1=CF:_Wrong_sequence_for_Arabic_ligature_marks?=
 =?windows-1256?Q?(FC5E-FC62,_FCF2-FCF4)?=
In-Reply-To: <20220219125237.503aca31@JRWUBU2>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220219125237.503aca31@JRWUBU2>
Message-ID: <BY5PR02MB6962B526A930EC945C149BD4CC389@BY5PR02MB6962.namprd02.prod.outlook.com>

See the attached picture for how some fonts on Windows render FATHA + SHADDA.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Fatha shddah.png
Type: image/png
Size: 28535 bytes
Desc: Fatha shddah.png
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20220219/83963e44/attachment.png>

From richard.wordingham at ntlworld.com  Sat Feb 19 11:01:41 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 19 Feb 2022 17:01:41 +0000
Subject: =?UTF-8?B?2LHYrzog2LHYrzo=?= Wrong sequence for Arabic ligature
 marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <BY5PR02MB6962B526A930EC945C149BD4CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220219125237.503aca31@JRWUBU2>
 <BY5PR02MB6962B526A930EC945C149BD4CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
Message-ID: <20220219170141.48adf7dc@JRWUBU2>

On Sat, 19 Feb 2022 13:30:49 +0000
Saeed Hubaishan via Unicode <unicode at corp.unicode.org> wrote:

> See the attached picture for how some fonts on Windows render FATHA +
> SHADDA.

Well, the renderings are wrong.  Whether the problem is in the
application, the rendering engine or the font is less clear.  Peter
Constable recently opined that a font should work with all canonical
equivalents, which is a bit harsh given that OpenType lookups were
designed on the assumption that fonts would not have to reorder
characters.

Which application were you using, and what version of Windows?  What
fonts?  Were they designed for Uniscribe/DirectWrite, or were they
designed for HarfBuzz?  As HarfBuzz expressly aims to render canonical
equivalents the same, it is quite possible that the fonts used were
designed expecting the rendering engine to do the AMTRA processing that
Ken Whistler referred to earlier, and that they would work with the
HarfBuzz renderer, which on Windows is used in MS Edge, Chrome, Firefox
and LibreOffice.

Richard.


From aprilop at freenet.de  Sat Feb 19 11:38:11 2022
From: aprilop at freenet.de (Andreas Prilop)
Date: Sat, 19 Feb 2022 18:38:11 +0100
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)
In-Reply-To: <20220219125237.503aca31@JRWUBU2>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220219125237.503aca31@JRWUBU2>
Message-ID: <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de>

On 19 February 2022 13:52:37 CET, Richard Wordingham wrote:

> This sequence is <U+0644, U+0651, U+0650 KASRA>, which is not canonically
> normalised. Using the Naskh font Amiri, kasra is by default placed below
> lam. However, if I enable OpenType feature ss05, which for this font is
> described (unless the labels have been scrambled) as "Kasra is placed
> below Shadda instead of base glyph", the kasra is indeed placed
> immediately below the shadda. Unicode allows both renderings.
> I'm not sure that Unicode provides any plain text mechanism to
> distinguish the two renderings.

Write ZWNJ between shadda and kasra.

 <h1>&#x644;&#x651;&zwnj;&#x650;</h1>
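
A small Python sketch of what the ZWNJ does at the character level, leaving aside how any particular font then renders it: ZWNJ has canonical combining class 0, so normalization cannot move kasra across it. The variable names are only for illustration.

```python
import html
import unicodedata

# Decode the character references used in the markup above.
with_zwnj = html.unescape("&#x644;&#x651;&zwnj;&#x650;")
print([f"U+{ord(c):04X}" for c in with_zwnj])

# ZWNJ is a starter (ccc 0), so it blocks canonical reordering of the marks.
print(unicodedata.combining("\u200C"))
print(unicodedata.normalize("NFC", with_zwnj) == with_zwnj)

# Without ZWNJ, normalization swaps shadda and kasra.
without_zwnj = with_zwnj.replace("\u200C", "")
print(unicodedata.normalize("NFC", without_zwnj) == "\u0644\u0650\u0651")
```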


From aprilop at freenet.de  Sat Feb 19 12:18:02 2022
From: aprilop at freenet.de (Andreas Prilop)
Date: Sat, 19 Feb 2022 19:18:02 +0100
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)I
In-Reply-To: <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220219125237.503aca31@JRWUBU2>
 <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de>
Message-ID: <E4E26714-B9EF-4FF3-8AF1-11313BD679EB@freenet.de>

On 19 February 2022 18:38:11 CET, I wrote:

> Write ZWNJ between shadda and kasra.
> 
>  <h1>&#x644;&#x651;&zwnj;&#x650;</h1>

It is strange that "&zwnj;" disappeared on

https://corp.unicode.org/pipermail/unicode/2022-February/009965.html


From richard.wordingham at ntlworld.com  Sat Feb 19 14:09:36 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sat, 19 Feb 2022 20:09:36 +0000
Subject: Wrong sequence for Arabic ligature marks(FC5E-FC62, FCF2-FCF4)I
In-Reply-To: <E4E26714-B9EF-4FF3-8AF1-11313BD679EB@freenet.de>
References: <BY5PR02MB6962BE312A1B0952D273BF5FCC379@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220218194828.26235c8d@JRWUBU2>
 <BY5PR02MB6962983FE0B531BD469E42A8CC389@BY5PR02MB6962.namprd02.prod.outlook.com>
 <20220219125237.503aca31@JRWUBU2>
 <25E9B4E3-CFB0-4B05-B88D-1621CF507171@freenet.de>
 <E4E26714-B9EF-4FF3-8AF1-11313BD679EB@freenet.de>
Message-ID: <20220219200936.104cc1ed@JRWUBU2>

On Sat, 19 Feb 2022 19:18:02 +0100
Andreas Prilop via Unicode <unicode at corp.unicode.org> wrote:

> On 19 February 2022 18:38:11 CET, I wrote:
> 
> > Write ZWNJ between shadda and kasra.
> > 
> >  <h1>&#x644;&#x651;&zwnj;&#x650;</h1>  

To achieve which rendering?  For HarfBuzz with the Amiri font, feature
ss05 still selects its form.  On the other hand, for HarfBuzz with
Firefox's default font, it selects kasra below lam, whereas without ZWNJ
one gets kasra below shadda.  Now, for HarfBuzz with the Amiri font,
<U+0644, U+0650 KASRA, U+200C ZWNJ, U+0651 SHADDA> consistently gets
kasra below lam, which is the font's default.

I suspect each font goes its own way.


> It is strange that "&zwnj;" disappeared on
> 
> https://corp.unicode.org/pipermail/unicode/2022-February/009965.html

It's still there, but converted from a character entity reference to the
character itself.


From wjgo_10009 at btinternet.com  Mon Feb 21 06:43:04 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 21 Feb 2022 12:43:04 +0000 (GMT)
Subject: International Mother Language Day 2022
Message-ID: <629a354c.689b.17f1c4e8cf8.Webtop.96@btinternet.com>

https://en.unesco.org/commemorations/motherlanguageday

William Overington

Monday 21 February 2022


From wjgo_10009 at btinternet.com  Mon Feb 21 06:58:41 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Mon, 21 Feb 2022 12:58:41 +0000 (GMT)
Subject: Art produced using glyphs that were generated using the Alphabet
 Synthesis Machine
Message-ID: <112ba10b.694e.17f1c5cd86f.Webtop.96@btinternet.com>

Almost twenty years ago, in 2002, there was a post in this mailing list 
about the Alphabet Synthesis Machine.

https://www.unicode.org/mail-arch/unicode-ml/y2002-m02/0541.html

Here is a link to a 2022 thread with art using glyphs from fonts that 
were produced at that time.

https://forum.affinity.serif.com/index.php?/topic/157614-lady-reading-haiku-to-an-elephant/

William Overington

Monday 21 February 2022


From mark at kli.org  Tue Feb 22 08:00:29 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 22 Feb 2022 09:00:29 -0500
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables
 that are not compatibility characters
In-Reply-To: <20220218220640.3dd190dc@JRWUBU2>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
 <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
 <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
 <20220218220640.3dd190dc@JRWUBU2>
Message-ID: <ad837fea-5a76-4cbd-0e8a-820b67035245@shoulson.com>


On 2/18/22 17:06, Richard Wordingham via Unicode wrote:
> On Fri, 18 Feb 2022 14:46:13 -0500
> "Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:
>
>> Perhaps relevant to this thread, I was just reading in
>> https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043,
>> proposal to encode Kirat Rai Script, where it remarks regarding the
>> vowels:
>>
>>> These should all be encoded atomically. This is because
>>> linguistically these vowels are not composed of two separate
>>> characters, they are single vowels in their own right. It is true
>>> that the custom encoded Kirat Rai font uses decomposed vowel signs
>>> as a matter of expediency, but this decision should not influence
>>> the right way to encode the script. Because the glyph for some of
>>> the vowels (aa and e) are part of the shape of the last 3 vowels
>>> (ai, o, au) there should be canonical decompositions for the last
>>> 3 vowels. With these decompositions, Do Not Use tables are not
>>> necessary.
>> If the vowels are to be encoded atomically, and it sounds like they
>> should be, shouldn't we *not* want to have canonical decompositions
>> for them? I thought Unicode was trying to avoid precomposed
>> characters at this point. I guess it's too late to hope for "only
>> one right way to spell it" out of Unicode, but is that still
>> something we try to approach? It almost seems to me that canonical
>> decompositions also stem from cases of "things that wouldn't be
>> encoded if they were proposed now," and if so it would not really
>> make sense to propose anything with a canonical decomposition. Or am
>> I misunderstanding the attitude towards canonical decompositions, or
>> the proposal's statement?
> X technology should obviously be opposed wherever possible.  We should
> make it impossible to enter these vowel symbols at a single stroke
> when using a simple X keyboard or even an MSKLC keyboard creator.  We
> must keep professional keyboard writers in work.
>
> Your wording is confusing.  There are several different options:
>
> 1) Only allow encoding for single vowels (the Khmer model)
> 2) Do not encode visually compound vowels (the Myanmar model)
> 3) Allow visually compound vowels as sequences or as single characters
> (the south Indian model)
>
> The proposal argues for (3), which rather assumes that canonical
> equivalence will be taken seriously.  At least we don't have the
> problem presented by doubled multipart south Indian vowels.
>
> Model (1) calls forth a need for stop lists, and potential confusion
> when a compound vowel notation is later found to be needed.  (From
> the Southern Thai point of view, there seems to be a vowel missing from
> the Khmer script which it would be very tempting to just encode as
> <U+17C1, U+17B7>, though in *Khmer* usage it is arguably just a glyph
> variant of U+17BE KHMER VOWEL SIGN OE.)
>
> I think you're calling for (2), which with current technology seems to
> make keyboard creation unduly complicated or fragile if we want users
> to be able to treat KIRAT RAI VOWEL SIGN O as a single entity.  (Do
> users have such a perception?  We'll probably be told that it's not a
> user-perceived character.)

Sorry to have been confusing, and I'm not so much "calling for" one 
answer or another as asking what's more in line with what we do. The 
text in the proposal says "These should all be encoded atomically. This 
is because linguistically these vowels are not composed of two separate 
characters, they are single vowels in their own right." This would seem 
to me to be proposing that the seemingly-compound characters be encoded 
instead as single characters, because they are not viewed as being 
compound. And that makes sense to me, as well, albeit we also go in the 
other direction, in not encoding compound letters like "ll" or "ch" in 
Welsh as separate letters.

But then the proposal goes on to say "Because the glyph for some of the 
vowels (aa and e) are part of the shape of the last 3 vowels (ai, o, au) 
there should be canonical decompositions for the last 3 vowels," which 
sounds to me like the atomic single "ai" vowel is to be given a 
canonical decomposition into its simpler components, i.e., "ai" is 
basically a precomposed character, like ?, which has atomic existence 
but is canonically equivalent to e + ??.  As I understand it, that would
be #3 in your list above.  And I thought that was considered a Bad Thing
these days, that we were trying to avoid, when possible, having too many
ways to represent the "same" (canonically equivalent) text.  Am I wrong
about that, in general?

I guess if I were to be "calling for" anything, it would be... um, now 
I'm finding your wording unclear.  I think #1 in your list, by which I
intend that aa and e and ai and o and au and everything would each be
given its own code-point, and that none of those code-points would be
canonically equivalent to a sequence of the others.  #2 sounds like
encoding only the vowel-signs which don't look like sequences of others, 
and ai and o and au could only be represented as sequences, which seems 
to run counter to the proposal (not that decisions can't be made counter 
to proposals), and #3 sounds like encoding each vowel as its own 
character, as in #1, *and* the "compound" variables could be represented 
either by their own codepoints or by sequences of "simple" vowels, and 
the two representations would be canonically equivalent, and that 
situation, to me, seems undesirable.

Am I making sense?

~mark

From richard.wordingham at ntlworld.com  Tue Feb 22 16:05:04 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Tue, 22 Feb 2022 22:05:04 +0000
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables
 that are not compatibility characters
In-Reply-To: <ad837fea-5a76-4cbd-0e8a-820b67035245@shoulson.com>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
 <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
 <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
 <20220218220640.3dd190dc@JRWUBU2>
 <ad837fea-5a76-4cbd-0e8a-820b67035245@shoulson.com>
Message-ID: <20220222220504.704d46d6@JRWUBU2>

On Tue, 22 Feb 2022 09:00:29 -0500
"Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:

> But then the proposal goes on to say "Because the glyph for some of
> the vowels (aa and e) are part of the shape of the last 3 vowels (ai,
> o, au) there should be canonical decompositions for the last 3
> vowels," which sounds to me like the atomic single "ai" vowel is to
> be given a canonical decomposition into its simpler components, i.e.,
> "ai" is basically a precomposed character, like ?, which has atomic
> existence but is canonically equivalent to e + ??.  As I understand
> it, that would be #3 in your list above.  And I thought that was
> considered a Bad Thing these days, that we were trying to avoid, when
> possible, having too many ways to represent the "same" (canonically
> equivalent) text.  Am I wrong about that, in general?

What we want to avoid is canonically *inequivalent* ways of encoding the
same thing.  We are still encoding decomposable characters for Indic
vowels.

#3 doesn't introduce any new problems, and certainly none that don't
affect most Western European languages.  #3 is what is actually
proposed, though it's not obvious from the descriptive text.  The
visually compound vowels are given canonical equivalents in the code
chart.  The only problem is that canonical equivalence continues to be
badly supported.
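
The notion of canonical equivalence discussed above can be illustrated
with a minimal Python sketch. It uses only the standard unicodedata
module. The two sample characters are illustrative choices, not taken
from the proposal: U+00E9 stands in for a Western European precomposed
letter and U+0BCA TAMIL VOWEL SIGN O for a south Indian two-part vowel.
The helper name canonically_equivalent is hypothetical.

```python
import unicodedata

# Illustrative samples (assumptions, not from the Kirat Rai proposal):
# a precomposed character paired with its canonical decomposition.
SAMPLES = {
    "\u00e9": "e\u0301",        # e with acute  ==  e + combining acute
    "\u0bca": "\u0bc6\u0bbe",   # Tamil vowel sign O == sign E + sign AA
}

def canonically_equivalent(a: str, b: str) -> bool:
    """Strings are canonically equivalent when their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

for atomic, sequence in SAMPLES.items():
    # A raw comparison sees two different code point sequences ...
    print(atomic == sequence)
    # ... while comparison after normalization treats them as the same text.
    print(canonically_equivalent(atomic, sequence))
```

Software that compares the raw code points, rather than normalized
forms, is what "badly supported" looks like in practice: the two
spellings fail to match in search, sorting, or keying.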

> I guess if I were to be "calling for" anything, it would be... um,
> now I'm finding your wording unclear.  I think #1 in your list, by
> which I intend that aa and e and ai and o and au and everything would
> each be given its own code-point, and that none of those code-points
> would be canonically equivalent to a sequence of the others.

The problem with that is that people would still try to type the
obvious decompositions, and they would work for at least a while.
Indeed, for this script, the (dependent) vowels could be categorised as
Lo.

> #2
> sounds like encoding only the vowel-signs which don't look like
> sequences of others, and ai and o and au could only be represented as
> sequences, which seems to run counter to the proposal (not that
> decisions can't be made counter to proposals), and #3 sounds like
> encoding each vowel as its own character, as in #1, *and* the
> "compound" variables could be represented either by their own
> codepoints or by sequences of "simple" vowels, and the two
> representations would be canonically equivalent, and that situation,
> to me, seems undesirable.

> Am I making sense?

Yes.

Richard.


From mark at kli.org  Tue Feb 22 19:49:50 2022
From: mark at kli.org (Mark E. Shoulson)
Date: Tue, 22 Feb 2022 20:49:50 -0500
Subject: Kirai Rat Decompositions, was Re: Compatibility decomposables
 that are not compatibility characters
In-Reply-To: <20220222220504.704d46d6@JRWUBU2>
References: <CAGVe82TPwM-DeO+sKw-jhpVxUxjC3QC+eGNX+SE2B+BUixcH1g@mail.gmail.com>
 <e90e5bb2-967e-b843-8919-5a547fa53944@cateee.net>
 <703ad57b-3d7f-4b56-8221-a1c8876ad061@sonic.net>
 <CAM+ijLgccfSaQx6mnr27J5WygrJN3RH8dumHjKZaYo2M5NdQdA@mail.gmail.com>
 <34c1a873-25c2-8c66-6f59-278d95341eb1@shoulson.com>
 <20220218220640.3dd190dc@JRWUBU2>
 <ad837fea-5a76-4cbd-0e8a-820b67035245@shoulson.com>
 <20220222220504.704d46d6@JRWUBU2>
Message-ID: <46f6c9ec-d3d3-ceae-80b7-2b18ad97b725@shoulson.com>


On 2/22/22 17:05, Richard Wordingham via Unicode wrote:
> On Tue, 22 Feb 2022 09:00:29 -0500
> "Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:
>
>> But then the proposal goes on to say "Because the glyph for some of
>> the vowels (aa and e) are part of the shape of the last 3 vowels (ai,
>> o, au) there should be canonical decompositions for the last 3
>> vowels," which sounds to me like the atomic single "ai" vowel is to
>> be given a canonical decomposition into its simpler components, i.e.,
>> "ai" is basically a precomposed character, like ?, which has atomic
>> existence but is canonically equivalent to e + ??.  As I understand
>> it, that would be #3 in your list above.  And I thought that was
>> considered a Bad Thing these days, that we were trying to avoid, when
>> possible, having too many ways to represent the "same" (canonically
>> equivalent) text.  Am I wrong about that, in general?
> What we want to avoid is canonically *inequivalent* ways of encoding the
> same thing.  We are still encoding decomposable characters for Indic
> vowels.
>
> #3 doesn't introduce any new problems, and certainly none that don't
> affect most Western European languages.  #3 is what is actually
> proposed, though it's not obvious from the descriptive text.  The
> visually compound vowels are given canonical equivalents in the code
> chart.  The only problem is that canonical equivalence continues to be
> badly supported.

OK.  I had been thinking that multiple canonically equivalent ways to
encode it would just mean more hassles for NFC/NFD processing, and that
it would be better to have just the atomic ones.  But as you point out:

> The problem with that is that people would still try to type the
> obvious decompositions, and they would work for at least a while.

People _might_ view the characters as atomic, but then they _might_ not, 
and you aren't going to stop them by saying not to. OK.  I see now why
encoding the atomic characters _and_ canonical equivalents makes sense.
Thank you.

>> Am I making sense?
> Yes.
Thanks.  I need to be reassured of that from time to time!
> Richard.
~mark

From sai at fiatfiendum.org  Sat Feb 26 07:32:27 2022
From: sai at fiatfiendum.org (Sai)
Date: Sat, 26 Feb 2022 13:32:27 +0000
Subject: =?UTF-8?Q?E=2Dinside=2Do_=2F_o=2Denclosing=2De_variant_of_German_=C3=B6?=
Message-ID: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>

Hello all.

Does Unicode have an existing way to encode the e-inside-o /
o-enclosing-e* variant o-e ligature for German ö?

See e.g.:
* the ö in Vögeln on the cover of 1st edition of Konrad Lorenz's _Er
redete mit dem Vieh, den Vögeln und den Fischen_
https://en.wikipedia.org/wiki/File:ErRedeteMitDemViehDenV%C3%B6gelnUndDenFischen.jpg
- n.b. other editions have normal ö; I do not know if it's used inside
the book in normal or heading texts, or just on the cover
* the ö in Köln (English: Cologne) in the inscription of its
cathedral's crypt
https://commons.wikimedia.org/wiki/File:O_containing_E_ligature.jpg

I do not know whether it is used in any language other than German,
nor how widely used it is for German.

There's a CC by-sa SVG of the capital version here:
https://commons.wikimedia.org/wiki/File:Latin_capital_letter_O_containing_E.svg
- but I don't know of a lower-case version.

There exist Unicode:
* Ⓔ U+24BA and ⓔ U+24D4 - circled latin capital/small letter e, in the
Enclosed Alphanumerics block
* Œ U+0152 and œ U+0153 - Latin capital/small ligature oe, in the
Latin Extended-A block
* ɶ U+0276 - Latin letter small capital oe, in the IPA Extensions block

However, Ⓔ/ⓔ use a circle (not letter o), and don't decompose to ? or
?; and I have not found something that does decompose to ? which would
use the enclosed ligature.

I don't know combining characters well enough to tell if there is a
combining version of either o or e which would allow this.

So? is this already a thing? Has it been proposed before? Ought it be
added to Unicode?

Sincerely,
Sai
President, Fiat Fiendum, Inc., a 501(c)(3)

* phrasing it both ways just so this discussion is easier to find by search


From dpk at nonceword.org  Sat Feb 26 10:21:47 2022
From: dpk at nonceword.org (Daphne Preston-Kendal)
Date: Sat, 26 Feb 2022 17:21:47 +0100
Subject: E-inside-o / o-enclosing-e variant of German =?utf-8?q?=C3=B6?=
In-Reply-To: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
Message-ID: <A9E085C2-CBB1-4D74-9BD1-CC63F437956C@nonceword.org>

On 26 Feb 2022, at 14:32, Sai via Unicode wrote:

> Hello all.
>
> Does Unicode have an existing way to encode the e-inside-o /
> o-enclosing-e* variant o-e ligature for German ö?


It could reasonably be considered a typographical variant of ö or of the
combination o / O + U+0364 COMBINING LATIN SMALL LETTER E.


-- 
dpk (Daphne Preston-Kendal) ?? 12107 Berlin, Germany ?? http://dpk.io/
"What's the good of Mercator's North Poles and Equators,
   Tropics, Zones, and Meridian Lines?"
 So the Bellman would cry: and the crew would reply
  "They are merely conventional signs!" - Carroll, Hunting of the Snark

From sosipiuk at gmail.com  Sat Feb 26 12:00:42 2022
From: sosipiuk at gmail.com (=?UTF-8?Q?S=C5=82awomir_Osipiuk?=)
Date: Sat, 26 Feb 2022 13:00:42 -0500
Subject: =?UTF-8?Q?Re=3A_E=2Dinside=2Do_=2F_o=2Denclosing=2De_variant_of_German_=C3=B6?=
In-Reply-To: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
Message-ID: <CAM+ijLhnXkLd7cXPjAQEdzf8nkxGFYaAkBcMOj4V2LKNy4xMFQ@mail.gmail.com>

This character doesn't currently exist, nor is there any apparent way
to compose it, except in an ugly form using an enclosing circle.

There is a combining letter e, but it gets placed above the previous
character: oͤ
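
As a small illustration of that sequence, here is a sketch using
Python's standard unicodedata module. It only inspects the character
properties; how the mark is drawn is up to the font and renderer. The
names COMBINING_E and sequence are illustrative.

```python
import unicodedata

COMBINING_E = "\u0364"        # the combining letter e mentioned above
sequence = "o" + COMBINING_E  # base letter followed by the combining mark

print(unicodedata.name(COMBINING_E))
# Canonical combining class 230 means "above", matching the placement
# described in the message.
print(unicodedata.combining(COMBINING_E))
# No precomposed character exists for this pair, so NFC leaves the
# sequence as two code points.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", sequence)])
```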

"Unicode has done something similar before" seems to be a less-than
ironclad argument; precedent is not a strong factor from what I've
seen. That said, I cannot imagine how U+A66E MULTIOCULAR O (which had
only one example) can be justified for inclusion while this e-inside-o
isn't.

The proposal which brought us ꙮ: http://unicode.org/wg2/docs/n3194.pdf

Sławomir Osipiuk



From wjgo_10009 at btinternet.com  Sat Feb 26 10:45:08 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 26 Feb 2022 16:45:08 +0000 (GMT)
Subject: =?UTF-8?Q?Re:_E-inside-o_/_o-enclosing-e_variant_of_German_=C3=B6?=
In-Reply-To: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
Message-ID: <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com>


Sai wrote:

> Does Unicode have an existing way to encode the e-inside-o /
> o-enclosing-e* variant o-e ligature for German ö?

I do not know if it exists at present, but I think that it possibly
could be formally encoded using ö followed by a Variation Selector
character.

If this becomes formally encoded, perhaps at the same time the version 
where the e is above the o could be encoded too?

https://en.wikipedia.org/wiki/%C3%96#Typography

William Overington

Saturday 26 February 2022


From duerst at it.aoyama.ac.jp  Sat Feb 26 19:45:33 2022
From: duerst at it.aoyama.ac.jp (=?UTF-8?Q?Martin_J=2e_D=c3=bcrst?=)
Date: Sun, 27 Feb 2022 10:45:33 +0900
Subject: =?UTF-8?Q?Re=3a_E-inside-o_/_o-enclosing-e_variant_of_German_=c3=b6?=
In-Reply-To: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
Message-ID: <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>

I'd personally say this is just a font variant of ö. It's the book
designer's/inscriber's choice. It may look way different for outsiders,
but people used to German will immediately understand what it is.

Regards,   Martin.




From wjgo_10009 at btinternet.com  Sat Feb 26 16:22:46 2022
From: wjgo_10009 at btinternet.com (William_J_G Overington)
Date: Sat, 26 Feb 2022 22:22:46 +0000 (GMT)
Subject: =?UTF-8?Q?Re:_E-inside-o_/_o-enclosing-e_variant_of_German_=C3=B6?=
In-Reply-To: <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <1ab0925c.411a.17f36ebf677.Webtop.102@btinternet.com>
Message-ID: <2fffbb82.452a.17f38211796.Webtop.102@btinternet.com>


I write to make a correction please.

Earlier I wrote as follows:

> If this becomes formally encoded, perhaps at the same time the version 
> where the e is above the o could be encoded too?

However, since then I have read the following.

Sławomir Osipiuk wrote:

> There is a combining letter e, but it gets placed above the previous
> character: oͤ

So the character oͤ is already encoded.

I have found the combining letter e at U+0364.

U+0364 COMBINING LATIN SMALL LETTER E

https://www.unicode.org/charts/PDF/U0300.pdf


William Overington

Saturday 26 February 2022



From jukkakk at gmail.com  Sun Feb 27 06:00:51 2022
From: jukkakk at gmail.com (Jukka K. Korpela)
Date: Sun, 27 Feb 2022 14:00:51 +0200
Subject: =?UTF-8?Q?Re=3A_E=2Dinside=2Do_=2F_o=2Denclosing=2De_variant_of_German_=C3=B6?=
In-Reply-To: <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
Message-ID: <CAGHxYa7gFDKvRV_-7Zg6ads1zcApDiRCqs1gg=W2rk3T7qCfiA@mail.gmail.com>

Martin J. Dürst via Unicode (unicode at corp.unicode.org) wrote:

> I'd personally say this is just a font variant of ö. It's the book
> designer's/inscriber's choice. It may look way different for outsiders,
> but people used to German will immediately understand what it is.


With my limited (two years at school) understanding of German, I fully
agree.

The letter ö originates from an o with an e above it, and in German it is
customary to replace ö by oe (a two-character combination, not the ligature
œ) when needed, e.g. when the character repertoire is limited to that of
Ascii. Since KOELN would be understood as KÖLN, so would KOLN with an E
inside the O - a surprise perhaps if you never saw it before, but not a new
character.
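
The customary ASCII replacement described above can be sketched in a
few lines of Python. This is a simplified illustration, not an
established library routine: the table and the function name
to_ascii_fallback are hypothetical, capital umlauts are always mapped
to two capitals (so "Öl" would become "OEl" rather than "Oel"), and
the eszett mapping is added only for completeness.

```python
import unicodedata

# Hypothetical fallback table for the German umlaut letters.
ASCII_FALLBACK = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "ß": "ss",
}

def to_ascii_fallback(text: str) -> str:
    """Replace German umlaut letters with their two-letter spellings."""
    # Normalize first so that o + U+0308 is handled like precomposed ö.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ASCII_FALLBACK.get(ch, ch) for ch in composed)

print(to_ascii_fallback("KÖLN"))
print(to_ascii_fallback("Vo\u0308geln"))
```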

Things might be different if there were texts where both a normal ö and an
o with an e inside both appear within the same font. Even then, I would say
it is a font variant of ö. A font may well contain variant glyphs for a
character. In order to justify encoding an o with an e inside, I think you
would need to present evidence of texts showing 1) usage where it causes a
difference in meaning with respect to ö, or 2) usage that is independent of
the use of the letter ö in different human languages, such as use in some
special phonetic or technical meaning.

Yucca, https://jkorpela.fi

From harjitmoe at outlook.com  Sun Feb 27 06:44:21 2022
From: harjitmoe at outlook.com (Harriet Riddle)
Date: Sun, 27 Feb 2022 12:44:21 +0000
Subject: =?UTF-8?Q?Re:_E-inside-o_/_o-enclosing-e_variant_of_German_=c3=b6?=
In-Reply-To: <CAGHxYa7gFDKvRV_-7Zg6ads1zcApDiRCqs1gg=W2rk3T7qCfiA@mail.gmail.com>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
 <CAGHxYa7gFDKvRV_-7Zg6ads1zcApDiRCqs1gg=W2rk3T7qCfiA@mail.gmail.com>
Message-ID: <VI1PR07MB57124999387BC423C27A2519B7009@VI1PR07MB5712.eurprd07.prod.outlook.com>


> A font may well contain variant glyphs for a character. In order to
> justify encoding an o with an e inside, I think you would need to present
> evidence of texts showing 1) usage where it causes a difference in
> meaning with respect to ö, or 2) usage that is independent of the use
> of the letter ö in different human languages, such as use in some
> special phonetic or technical meaning.


One thing I don't think I've seen mentioned yet is that ö is already a
unification of O-diaeresis and O-umlaut, and while the glyph variant
under discussion is a valid variant of O-umlaut (related to oͤ, ? and ?
as other variants, where form acceptability and form preference varies
between languages that use O-umlaut), it is not a valid variant of
O-diaeresis.

--Har.


From alexander.lange at catrinity-font.de  Sun Feb 27 03:13:24 2022
From: alexander.lange at catrinity-font.de (Alexander Lange)
Date: Sun, 27 Feb 2022 10:13:24 +0100
Subject: =?UTF-8?Q?Re=3a_E-inside-o_/_o-enclosing-e_variant_of_German_=c3=b6?=
In-Reply-To: <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
Message-ID: <0e3dd872-d388-534f-f87a-8a1a8da1f186@catrinity-font.de>

Hi,

another German here. I also think it is just a glyph variant of ö - or
rather Ö. I have only ever seen this in all-uppercase inscriptions where
the line height is hardly bigger than the capital height. In both images 
Sai has linked to you can see that all lines would need to be higher 
just for the one umlaut in one of the lines if the standard glyph were 
used.

There are several strategies to achieve this:

 * Use a smaller variant of the base letter. This is commonly done on
   keyboards, see e.g. here:
   https://angelikasgerman.co.uk/what-does-a-german-keyboard-look-like/
   and on license plates:
   https://en.wikipedia.org/wiki/FE-Schrift#/media/File:FE-Schrift.svg
 * Put dots or e inside O or U (doesn't work well with A)
 * Put one dot at each side of A or O (doesn't work well with U)
 * Use AE, OE or UE.

In normal text and especially on small letters, none of this is needed 
as you have enough space on top of the letters anyway.

Kind regards,
Alexander


From kent.b.karlsson at bahnhof.se  Sun Feb 27 12:34:39 2022
From: kent.b.karlsson at bahnhof.se (Kent Karlsson)
Date: Sun, 27 Feb 2022 19:34:39 +0100
Subject: =?utf-8?Q?Re=3A_E-inside-o_/_o-enclosing-e_variant_of_German_?=
 =?utf-8?Q?=C3=B6?=
In-Reply-To: <A9E085C2-CBB1-4D74-9BD1-CC63F437956C@nonceword.org>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <A9E085C2-CBB1-4D74-9BD1-CC63F437956C@nonceword.org>
Message-ID: <D1A03BDC-8D5C-49A3-A815-762840DDD1E3@bahnhof.se>


> 26 feb. 2022 kl. 17:21 skrev Daphne Preston-Kendal via Unicode <unicode at corp.unicode.org>:
> 
> On 26 Feb 2022, at 14:32, Sai via Unicode wrote:
> 
>> Hello all.
>> 
>> Does Unicode have an existing way to encode the e-inside-o /
>> o-enclosing-e* variant o-e ligature for German ö?
> 
> 
> It could reasonably be considered a typographical variant of ö or of the
> combination o / O + U+0364 COMBINING LATIN SMALL LETTER E.

It is most definitely NOT a glyph variant of ö. With quite a bit of
stretch it may be considered a glyph variant of oͤ (small or capital).

After all, having the double dots inside of Ö (and similar) is
considered a glyph variant of Ö (see Alexander Lange's message in this
thread), and even capitals or small capitals are sometimes considered
variants of small letters. OpenType fonts can even have "feature tags"
for that, and similarly for CSS: font-variant-caps: small-caps; and
text-transform: uppercase;. (The latter is called a "transform", but CSS
is about styling, so it is actually styling, not a transform; the stored
text is not changed.)

/Kent K


From asmusf at ix.netcom.com  Sun Feb 27 22:13:50 2022
From: asmusf at ix.netcom.com (Asmus Freytag)
Date: Sun, 27 Feb 2022 20:13:50 -0800
Subject: =?UTF-8?Q?Re=3a_E-inside-o_/_o-enclosing-e_variant_of_German_=c3=b6?=
In-Reply-To: <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
References: <CAHs-R5xNgCVHFYH2Bx=swgxaxcok-UedMJr2ePjCfdE0Afo45Q@mail.gmail.com>
 <bccd889e-a714-1d39-0c67-1a37aa74d399@it.aoyama.ac.jp>
Message-ID: <f5e0c31a-9883-855f-3a9f-df73defc481a@ix.netcom.com>

An HTML attachment was scrubbed...
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20220227/8260ff05/attachment.htm>

From richard.wordingham at ntlworld.com  Mon Feb 28 15:09:42 2022
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Mon, 28 Feb 2022 21:09:42 +0000
Subject: Bidi and Empty Parentheses
Message-ID: <20220228210942.0271a9e2@JRWUBU2>

At a right-to-left embedding level, in the absence of directional
overrides, should the 4-character ASCII substring "x()y" render like
"x()y" or like "y()x"?

Richard.  

From kenwhistler at sonic.net  Mon Feb 28 19:41:03 2022
From: kenwhistler at sonic.net (Ken Whistler)
Date: Mon, 28 Feb 2022 17:41:03 -0800
Subject: Bidi and Empty Parentheses
In-Reply-To: <20220228210942.0271a9e2@JRWUBU2>
References: <20220228210942.0271a9e2@JRWUBU2>
Message-ID: <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net>

Richard,

"x()y"

More specifically, with an explicit LTR paragraph direction:

Trace: Entering br_Check
Current State: 20
   Text:        0078 0028 0029 0079
   Bidi_Class:     L    L    L    L
   Levels:        *0    0    0    0*
   Exp Levels:     0    0    0    0
   Runs:        <L---------------L>

   Order:      [0 1 2 3]
   Exp Order:  [0 1 2 3]
I.e. "x()y"

With an explicit RTL paragraph direction:

Trace: Entering br_Check
Current State: 20
   Text:        0078 0028 0029 0079
   Bidi_Class:     L    L    L    L
   Levels:        *2    2    2    2*
   Exp Levels:     2    2    2    2
   Runs:        <R---------------R>

   Order:      [0 1 2 3]
   Exp Order:  [0 1 2 3]

I.e. "x()y". Note that the paragraph embedding level is 1, and the 
resolved levels are 2 (instead of 0), but the resolved display order of 
the string is identical in both cases.

--Ken

On 2/28/2022 1:09 PM, Richard Wordingham via Unicode wrote:
> At a right-to-left embedding level, in the absence of directional
> overrides, should the 4-character ASCII substring "x()y" render like
> "x()y" or like "y()x"?
>
> Richard.
>

From eliz at gnu.org  Mon Feb 28 21:31:48 2022
From: eliz at gnu.org (Eli Zaretskii)
Date: Tue, 01 Mar 2022 05:31:48 +0200
Subject: Bidi and Empty Parentheses
In-Reply-To: <20220228210942.0271a9e2@JRWUBU2> (message from Richard
 Wordingham via Unicode on Mon, 28 Feb 2022 21:09:42 +0000)
References: <20220228210942.0271a9e2@JRWUBU2>
Message-ID: <83tucil55n.fsf@gnu.org>

> Date: Mon, 28 Feb 2022 21:09:42 +0000
> From: Richard Wordingham via Unicode <unicode at corp.unicode.org>
> 
> At a right-to-left embedding level, in the absence of directional
> overrides, should the 4-character ASCII substring "x()y" render like
> "x()y" or like "y()x"?

y()x, AFAIU.

From eliz at gnu.org  Mon Feb 28 21:38:20 2022
From: eliz at gnu.org (Eli Zaretskii)
Date: Tue, 01 Mar 2022 05:38:20 +0200
Subject: Bidi and Empty Parentheses
In-Reply-To: <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net> (message from
 Ken Whistler via Unicode on Mon, 28 Feb 2022 17:41:03 -0800)
References: <20220228210942.0271a9e2@JRWUBU2>
 <3655c0eb-b355-4762-6a9f-11687562b525@sonic.net>
Message-ID: <83pmn6l4ur.fsf@gnu.org>

> Date: Mon, 28 Feb 2022 17:41:03 -0800
> Cc: unicode at corp.unicode.org
> From: Ken Whistler via Unicode <unicode at corp.unicode.org>
> 
> Richard,
> 
> "x()y"

Maybe there's a misunderstanding.  Richard said "in a right-to-left
embedding", so I tried

  RLE x ( ) y PDF

and got "y()x" on display.
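
For anyone wanting to reproduce the two test inputs, here is a minimal
Python sketch. It only builds the string with explicit embedding
controls and lists the Bidi_Class values from the Unicode Character
Database via the standard unicodedata module. The standard library does
not implement the bidirectional algorithm itself, so this does not
settle which display order is correct. The variable names are
illustrative.

```python
import unicodedata

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

plain = "x()y"                 # input as traced with a paragraph direction
embedded = RLE + plain + PDF   # input with an explicit RTL embedding

for ch in embedded:
    # bidirectional() gives the raw Bidi_Class property; mirrored() is 1
    # for characters such as parentheses that have the Bidi_Mirrored property.
    print(f"U+{ord(ch):04X} "
          f"{unicodedata.bidirectional(ch):>3} "
          f"mirrored={unicodedata.mirrored(ch)}")
```

The raw property values show the parentheses as neutrals (ON) between
two strong left-to-right letters, which is the starting point for the
resolution steps in either test.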