Kirai Rat Decompositions, was Re: Compatibility decomposables that are not compatibility characters

Richard Wordingham richard.wordingham at
Tue Feb 22 16:05:04 CST 2022

On Tue, 22 Feb 2022 09:00:29 -0500
"Mark E. Shoulson via Unicode" <unicode at> wrote:

> But then the proposal goes on to say "Because the glyph for some of
> the vowels (aa and e) are part of the shape of the last 3 vowels (ai,
> o, au) there should be canonical decompositions for the last 3
> vowels," which sounds to me like the atomic single "ai" vowel is to
> be given a canonical decomposition into its simpler components, i.e.,
> "ai" is basically a precomposed character, like é, which has atomic
> existence but is canonically equivalent to e + ◌́.  As I understand
> it, that would be #3 in your list above.  And I thought that was
> considered a Bad Thing these days, that we were trying to avoid, when
> possible, having too many ways to represent the "same" (canonically
> equivalent) text.  Am I wrong about that, in general?

What we want to avoid is canonically *inequivalent* ways of encoding the
same thing.  We are still encoding decomposable characters for Indic
scripts.

#3 doesn't introduce any new problems, and certainly none that don't
affect most Western European languages.  #3 is what is actually
proposed, though it's not obvious from the descriptive text.  The
visually compound vowels are given canonical equivalents in the code
chart.  The only problem is that canonical equivalence continues to be
badly supported.
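
As a rough illustration of what canonical equivalence means in practice,
here is a minimal Python sketch using the standard unicodedata module.
It is not taken from the proposal: the helper name
canonically_equivalent is made up for this example, and the Bengali
vowel sign stands in as an already-encoded Indic character with a
canonical decomposition, since the proposed characters may not be in
any given Unicode database.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b are canonically equivalent (compared in NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Latin example from the quoted text: precomposed e-acute vs. e + combining acute.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# Indic example (illustrative choice): Bengali vowel sign O and its two parts.
bengali_o = "\u09CB"              # BENGALI VOWEL SIGN O
bengali_o_parts = "\u09C7\u09BE"  # BENGALI VOWEL SIGN E + BENGALI VOWEL SIGN AA

# A plain code-point comparison treats the two spellings as different.
naive_equal = precomposed == decomposed

# Normalising both sides first makes the comparison respect equivalence.
latin_equal = canonically_equivalent(precomposed, decomposed)
indic_equal = canonically_equivalent(bengali_o, bengali_o_parts)

# NFC recombines the typed sequence into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", bengali_o_parts)
```

Software that compares or searches text without a normalisation step
like this is the sort of thing I mean by "badly supported".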

> I guess if I were to be "calling for" anything, it would be... um,
> now I'm finding your wording unclear.  I think #1 in your list, by
> which I intend that aa and e and ai and o and au and everything would
> each be given its own code-point, and that none of those code-points
> would be canonically equivalent to a sequence of the others.

The problem with that is that people would still try to type the
obvious decompositions, and they would work for at least a while.
Indeed, for this script, the (dependent) vowels could be categorised as

> #2
> sounds like encoding only the vowel-signs which don't look like
> sequences of others, and ai and o and au could only be represented as
> sequences, which seems to run counter to the proposal (not that
> decisions can't be made counter to proposals), and #3 sounds like
> encoding each vowel as its own character, as in #1, *and* the
> "compound" vowels could be represented either by their own
> codepoints or by sequences of "simple" vowels, and the two
> representations would be canonically equivalent, and that situation,
> to me, seems undesirable.

> Am I making sense?


