Composition / Decomposition of Tibetan oM (0F00)

Sun Sep 30 23:22:32 CDT 2018

Hi Elie

U+0F00;TIBETAN SYLLABLE OM is a leftover of the original encoding of
Tibetan script in Unicode (later removed) which, like the encoding of most
scripts used in India, was based on ISCII. I think the argument put forward
for maintaining U+0F00 as a separate character was that it would ease
lossless two-way conversion between Tibetan and other Indic scripts like
Devanagari which have a unique character for OM.

The model now used for encoding the Tibetan script originates in ISO 10646
JTC1/SC2/WG2 document N998 (April 1994), "Proposal for Encoding Tibetan
Script in the BMP", from the UK. It was followed by N1159 and others. There
was also a *lot* of discussion on the Tibex (Tibetan Extensions) mailing
list, which should be archived somewhere on the Unicode.org server. Amongst
all that discussion you might find somebody's argument or reasoning for
U+0F00 lacking a decomposition.
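
As a quick way to see what "lacking a decomposition" means in practice, here
is a minimal sketch using Python's standard unicodedata module. It assumes
the Unicode data bundled with your Python build matches the UnicodeData.txt
entry quoted below; the helper name has_decomposition is only illustrative.
U+0F43 is included purely as a contrast, since it is a Tibetan letter that
does carry a canonical decomposition.

```python
import unicodedata

OM = "\u0f00"                       # TIBETAN SYLLABLE OM
SPELLED_OUT = "\u0f68\u0f7c\u0f7e"  # letter A + vowel sign O + rjes su nga ro
GHA = "\u0f43"                      # TIBETAN LETTER GHA, used as a contrast


def has_decomposition(ch):
    """Return True if the decomposition field for ch is non-empty."""
    return unicodedata.decomposition(ch) != ""


# U+0F00 has an empty decomposition field, so normalization leaves it alone.
print(unicodedata.name(OM), has_decomposition(OM))
print(unicodedata.normalize("NFD", OM) == OM)

# The spelled-out sequence is likewise never composed into U+0F00.
print(unicodedata.normalize("NFC", SPELLED_OUT) == SPELLED_OUT)

# Contrast: U+0F43 decomposes canonically to U+0F42 U+0FB7.
print(unicodedata.name(GHA), unicodedata.decomposition(GHA))
```

So under NFC/NFD the two spellings of OM stay distinct strings, and any
matching between them has to come from somewhere else (collation, for
instance).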

The encoding model for Tibetan came about when the UK national committee
asked Phillip Denwood at SOAS for comments on the proposals for Tibetan
circulating at the time. Phillip Denwood then passed this task over to
me. I put together a lot of comments and outlined a proposal for encoding
Tibetan with a full set of combining consonants (earlier proposals had only
combining wazur, ya-tag and ra-tag) and the suggestion that characters
should be entered in the order in which they are written and in which
Tibetan children learn to spell out loud. I also discussed this extensively
with Thubten Nyima (Alak Zenkar Rinpoche) who was then at SOAS working on
the translation of his dictionary. These comments went back to the BSI
committee and became the basis of the proposal N998.

- Chris

On Mon, 19 Mar 2018 at 15:03, Élie Roux via Indic <indic at unicode.org> wrote:

> Dear Richard,
>
> > Sacred syllable v. run of the mill syllable?
>
> Hmm, ok let's ask more direct questions, which are on two different
> aspects of the problem:
>
> 1. There are a lot of sacred syllables in Tibetan, why choose this one
> in particular? Hung (U+0F67 U+0F71 U+0F74 U+0F82) is at least as sacred
> and as widespread...
>
> 2. Why isn't U+0F00 considered a composition of U+0F68 U+0F7C U+0F7E in
> UnicodeData.txt? What I see is:
>
> 0F00;TIBETAN SYLLABLE OM;Lo;0;L;;;;;N;;;;;
>
> while I believe it should contain
>
> 0F00;TIBETAN SYLLABLE OM;Lo;0;L;0F68 0F7C 0F7E;;;;N;;;;;
>
> (same for 0F02 and 0F03).
>
> > For example, under the UCA default collation, U+0F00 and <U+0F68,
> > U+0F7C, U+0F7E> are no more different than upper and lower case in
> > English.
>
> Hmmm thanks a lot for that! This seems to be somewhat new, but indeed I
> can see
>
> 0F00  ; [.2F19.0020.0004][.2F30.0020.0004][.0000.00C4.0004] # TIBETAN SYLLABLE OM
>
> in http://www.unicode.org/Public/UCA/10.0.0/allkeys.txt
>
> So I guess I'm even more eager to have some clues on my question number
> 2: if the UCA acknowledges that the composed and decomposed characters
> have the same weight, why doesn't UnicodeData list them as
> composition/decomposition?
>
> Thank you,
> --
> Elie