Kirat Rai Decompositions, was Re: Compatibility decomposables that are not compatibility characters

Mark E. Shoulson mark at kli.org
Tue Feb 22 19:49:50 CST 2022


On 2/22/22 17:05, Richard Wordingham via Unicode wrote:
> On Tue, 22 Feb 2022 09:00:29 -0500
> "Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:
>
>> But then the proposal goes on to say "Because the glyph for some of
>> the vowels (aa and e) are part of the shape of the last 3 vowels (ai,
>> o, au) there should be canonical decompositions for the last 3
>> vowels," which sounds to me like the atomic single "ai" vowel is to
>> be given a canonical decomposition into its simpler components, i.e.,
>> "ai" is basically a precomposed character, like é, which has atomic
>> existence but is canonically equivalent to e + ◌́.  As I understand
>> it, that would be #3 in your list above.  And I thought that was
>> considered a Bad Thing these days, that we were trying to avoid, when
>> possible, having too many ways to represent the "same" (canonically
>> equivalent) text.  Am I wrong about that, in general?
> What we want to avoid is canonically *inequivalent* ways of encoding the
> same thing.  We are still encoding decomposable characters for Indic
> vowels.
>
> #3 doesn't introduce any new problems, and certainly none that don't
> affect most Western European languages.  #3 is what is actually
> proposed, though it's not obvious from the descriptive text.  The
> visually compound vowels are given canonical equivalents in the code
> chart.  The only problem is that canonical equivalence continues to be
> badly supported.

OK.  I had been thinking that multiple canonically equivalent ways to 
encode it would just mean more hassles for NFC/NFD processing, and that 
it would be better to have just the atomic ones.  But as you point out:

> The problem with that is that people would still try to type the
> obvious decompositions, and they would work for at least a while.

People _might_ view the characters as atomic, but then they _might_ not, 
and you aren't going to stop them by saying not to. OK.  I see now why 
encoding the atomic characters _and_ canonical equivalents makes sense.  
Thank you.
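
For anyone following along, here is a minimal sketch of what "canonically 
equivalent" means in practice, assuming Python's standard unicodedata 
module. The helper name canonically_equivalent is mine, made up for 
illustration; the two examples are é and the existing Bengali vowel sign 
O, which already has a two-part canonical decomposition of the kind being 
proposed here.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD (fully decomposed) forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Precomposed é versus e + COMBINING ACUTE ACCENT.
precomposed = "\u00e9"
decomposed = "e\u0301"

# A plain code-point comparison sees two different strings...
print(precomposed == decomposed)
# ...but comparison after normalization treats them as the same text.
print(canonically_equivalent(precomposed, decomposed))

# The canonical decomposition mapping recorded for é in the character database.
print(unicodedata.decomposition(precomposed))

# An Indic precedent: BENGALI VOWEL SIGN O (U+09CB) is canonically
# equivalent to VOWEL SIGN E (U+09C7) + VOWEL SIGN AA (U+09BE).
print(canonically_equivalent("\u09cb", "\u09c7\u09be"))
```

Software that compares raw code points without normalizing first is where 
the "badly supported" complaint bites: the two spellings look identical 
but fail to match.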

>> Am I making sense?
> Yes.
Thanks.  I need to be reassured of that from time to time!
> Richard.
~mark

