Kirat Rai Decompositions, was Re: Compatibility decomposables that are not compatibility characters

Richard Wordingham richard.wordingham at ntlworld.com
Fri Feb 18 16:06:40 CST 2022


On Fri, 18 Feb 2022 14:46:13 -0500
"Mark E. Shoulson via Unicode" <unicode at corp.unicode.org> wrote:

> Perhaps relevant to this thread, I was just reading in 
> https://www.unicode.org/L2/L2022/22043-kirat-rai.pdf L2/22-043,
> proposal to encode Kirat Rai Script, where it remarks regarding the
> vowels:
> 
> > These should all be encoded atomically. This is because
> > linguistically these vowels are not composed of two
> > separate characters, they are single vowels in their own right. It
> > is true that the custom encoded Kirat Rai font uses decomposed vowel
> > signs as a matter of expediency, but this decision should not
> > influence the right way to encode the script. Because the glyph for
> > some of the vowels (aa and e) are part of the shape of the last 3
> > vowels (ai, o, au) there should be canonical decompositions for the
> > last 3 vowels. With these decompositions, Do Not Use tables are not
> > necessary.
> If the vowels are to be encoded atomically, and it sounds like they 
> should be, shouldn't we *not* want to have canonical decompositions
> for them?  I thought Unicode was trying to avoid precomposed
> characters at this point.  I guess it's too late to hope for "only
> one right way to spell it" out of Unicode, but is that still
> something we try to approach?  It almost seems to me that canonical
> decompositions also stem from cases of "things that wouldn't be
> encoded if they were proposed now," and if so it would not really
> make sense to propose anything with a canonical decomposition.  Or am
> I misunderstanding the attitude towards canonical decompositions, or
> the proposal's statement?

X technology should obviously be opposed wherever possible.  We should
make it impossible to enter these vowel symbols at a single stroke
when using a simple X keyboard or even an MSKLC keyboard creator.  We
must keep professional keyboard writers in work. 

Your wording is confusing.  There are several different options:

1) Encode visually compound vowels only as single characters (the Khmer model)
2) Do not encode visually compound vowels (the Myanmar model)
3) Allow visually compound vowels as sequences or as single characters
(the south Indian model)

The proposal argues for (3), which rather assumes that canonical
equivalence will be taken seriously.  At least we don't have the
problem presented by doubled multipart south Indian vowels.
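
As an illustration of what taking canonical equivalence seriously
means under model (3), here is a minimal Python sketch using a Tamil
two-part vowel sign. It relies only on the standard unicodedata
module; the helper name canonically_equivalent is made up for this
example, and the choice of Tamil as the south Indian example is an
assumption, not something taken from the proposal.

```python
import unicodedata

# U+0BCA TAMIL VOWEL SIGN O, encoded as a single character.
precomposed = "\u0BCA"
# The same vowel spelled as a sequence:
# U+0BC6 TAMIL VOWEL SIGN E followed by U+0BBE TAMIL VOWEL SIGN AA.
sequence = "\u0BC6\u0BBE"


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two spellings differ as code point sequences...
differ_as_code_points = precomposed != sequence
# ...but NFD expands the single character into the sequence,
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)
# and NFC recombines the sequence into the single character.
nfc_of_sequence = unicodedata.normalize("NFC", sequence)

print(differ_as_code_points, canonically_equivalent(precomposed, sequence))
```

Software that compares or searches text only after normalizing it
treats both spellings as the same vowel, which is what makes a Do Not
Use table unnecessary in this model.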

Model (1) calls forth a need for stop lists, and potential confusion
when a compound vowel notation is later found to be needed.  (From
the Southern Thai point of view, there seems to be a vowel missing from
the Khmer script which it would be very tempting to just encode as
<U+17C1, U+17B7>, though in *Khmer* usage it is arguably just a glyph
variant of U+17BE KHMER VOWEL SIGN OE.) 
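
For contrast, a small Python sketch of why model (1) needs stop lists:
U+17BE has no canonical decomposition, so normalization does nothing to
unify it with the sequence <U+17C1, U+17B7> mentioned above. Only
standard unicodedata calls are used; the variable names are
illustrative.

```python
import unicodedata

khmer_oe = "\u17BE"          # U+17BE KHMER VOWEL SIGN OE
e_plus_i = "\u17C1\u17B7"    # U+17C1 KHMER VOWEL SIGN E + U+17B7 KHMER VOWEL SIGN I

# An empty string here means the character has no decomposition mapping.
decomposition = unicodedata.decomposition(khmer_oe)

# Neither normalization form maps the sequence onto the single character,
# so the two spellings stay distinct for comparison and searching.
same_under_nfc = (
    unicodedata.normalize("NFC", e_plus_i) == unicodedata.normalize("NFC", khmer_oe)
)
same_under_nfd = (
    unicodedata.normalize("NFD", e_plus_i) == unicodedata.normalize("NFD", khmer_oe)
)

print(repr(decomposition), same_under_nfc, same_under_nfd)
```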

I think you're calling for (2), which with current technology seems to
make keyboard creation unduly complicated or fragile if we want users
to be able to treat KIRAT RAI VOWEL SIGN O as a single entity.  (Do
users have such a perception?  We'll probably be told that it's not a
user-perceived character.)

Richard.


