Kirat Rai Decompositions, was Re: Compatibility decomposables that are not compatibility characters
Mark E. Shoulson
mark at kli.org
Tue Feb 22 19:49:50 CST 2022
On 2/22/22 17:05, Richard Wordingham via Unicode wrote:
> On Tue, 22 Feb 2022 09:00:29 -0500
> "Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:
>
>> But then the proposal goes on to say "Because the glyph for some of
>> the vowels (aa and e) are part of the shape of the last 3 vowels (ai,
>> o, au) there should be canonical decompositions for the last 3
>> vowels," which sounds to me like the atomic single "ai" vowel is to
>> be given a canonical decomposition into its simpler components, i.e.,
>> "ai" is basically a precomposed character, like é, which has atomic
>> existence but is canonically equivalent to e + ◌́. As I understand
>> it, that would be #3 in your list above. And I thought that was
>> considered a Bad Thing these days, that we were trying to avoid, when
>> possible, having too many ways to represent the "same" (canonically
>> equivalent) text. Am I wrong about that, in general?
> What we want to avoid is canonically *inequivalent* ways of encoding the
> same thing. We are still encoding decomposable characters for Indic
> vowels.
>
> #3 doesn't introduce any new problems, and certainly none that don't
> affect most Western European languages. #3 is what is actually
> proposed, though it's not obvious from the descriptive text. The
> visually compound vowels are given canonical equivalents in the code
> chart. The only problem is that canonical equivalence continues to be
> badly supported.
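The following minimal Python sketch makes the é comparison concrete. It assumes only the standard `unicodedata` module. It shows that a precomposed character and its decomposed spelling differ code point by code point, yet compare equal once both are normalized. The helper `canonically_equal` is a hypothetical name used for illustration. Code that compares raw strings without normalizing them is one common way canonical equivalence ends up badly supported.

```python
import unicodedata

# U+00E9 LATIN SMALL LETTER E WITH ACUTE, stored as a single code point
precomposed = "\u00e9"
# U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(precomposed == decomposed)                   # False: the code points differ
print(canonically_equal(precomposed, decomposed))  # True: same text once normalized
```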
OK. I had been thinking that multiple canonically equivalent ways to
encode it would just mean more hassles for NFC/NFD processing, and that
it would be better to have just the atomic ones. But as you point out:
> The problem with that is that people would still try to type the
> obvious decompositions, and they would work for at least a while.
People _might_ view the characters as atomic, but then they _might_ not,
and you aren't going to stop them by saying not to. OK. I see now why
encoding the atomic characters _and_ canonical equivalents makes sense.
Thank you.
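The same idea applies to a visually compound Indic vowel. The proposed Kirat Rai characters are not used in this sketch, since they are not assumed to be in Python's `unicodedata` tables. Instead, the already-encoded Bengali vowel sign O (U+09CB) stands in as an assumed analogue. It is encoded atomically and also has a canonical decomposition, so whichever spelling a user types, NFC or NFD normalization brings both to the same sequence. The names `ATOMIC_O` and `TYPED_O` are illustrative only.

```python
import unicodedata

# Bengali stands in for the proposed Kirat Rai vowels (an assumption for illustration):
# U+09CB BENGALI VOWEL SIGN O is drawn as U+09C7 (vowel sign E) before the
# consonant and U+09BE (vowel sign AA) after it.
ATOMIC_O = "\u09cb"
TYPED_O = "\u09c7\u09be"  # the "obvious" two-part spelling a user might type

# The character database records the canonical decomposition.
print(unicodedata.decomposition(ATOMIC_O))  # 09C7 09BE

# Either spelling normalizes to the same sequence in both normalization forms.
for form in ("NFC", "NFD"):
    assert unicodedata.normalize(form, ATOMIC_O) == unicodedata.normalize(form, TYPED_O)

print(unicodedata.normalize("NFC", TYPED_O) == ATOMIC_O)  # True: NFC recomposes
```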
>> Am I making sense?
> Yes.
Thanks. I need to be reassured of that from time to time!
> Richard.
~mark