From indic at unicode.org Sun Sep 30 23:22:32 2018
From: indic at unicode.org (Christopher Fynn via Indic)
Date: Mon, 1 Oct 2018 10:07:32 +0545
Subject: Composition / Decomposition of Tibetan oM (0F00)
In-Reply-To: <8dba7890-b6a6-92a1-a503-98647c63d841@telecom-bretagne.eu>
References: <71c75887-ce61-2cf6-a7cc-c750defd2713@telecom-bretagne.eu> <20180318205126.383e1f01@JRWUBU2> <8dba7890-b6a6-92a1-a503-98647c63d841@telecom-bretagne.eu>
Message-ID:

Hi Elie

U+0F00;TIBETAN SYLLABLE OM is a leftover of the original encoding of Tibetan script in Unicode (later removed) which, like the encoding of most scripts used in India, was based on ISCII. I think the argument put forward for maintaining U+0F00 as a separate character was that it would ease lossless two-way conversion between Tibetan and other Indic scripts like Devanagari which have a unique character for OM.

The model now used for encoding the Tibetan script originates in ISO 10646 JTC1/SC2/WG2 document N998 (April 1994) "Proposal for Encoding Tibetan Script in the BMP" - from the UK. Then there was N1159 etc. There was also a *lot* of discussion on the Tibex (Tibetan Extensions) mailing list, which should be archived somewhere on the Unicode.org server. Amongst all that discussion you might find somebody's argument or reasoning for U+0F00 lacking a decomposition.

The encoding model for Tibetan came about when the UK national committee asked Phillip Denwood at SOAS for comments on the proposals for Tibetan circulating at the time - Phillip Denwood then passed this task over to me. I put together a lot of comments and outlined a proposal for encoding Tibetan with a full set of combining consonants (earlier proposals had only combining wazur, ya-tag and ra-tag), together with the suggestion that characters should be entered in the order in which they are written and in which Tibetan children learn to spell out loud.
I also discussed this extensively with Thubten Nyima (Alak Zenkar Rinpoche) who was then at SOAS working on the translation of his dictionary. These comments went back to the BSI committee and became the basis of the proposal N998.

- Chris

On Mon, 19 Mar 2018 at 15:03, Élie Roux via Indic wrote:

> Dear Richard,
>
> > Sacred syllable v. run of the mill syllable?
>
> Hmm, ok let's ask more direct questions, which are on two different
> aspects of the problem:
>
> 1. There are a lot of sacred syllables in Tibetan, why choose this one
> in particular? Hung (U+0F67 U+0F71 U+0F74 U+0F82) is at least as sacred
> and as widespread...
>
> 2. Why isn't U+0F00 considered a composition of U+0F68 U+0F7C U+0F7E in
> UnicodeData.txt? What I see is:
>
> 0F00;TIBETAN SYLLABLE OM;Lo;0;L;;;;;N;;;;;
>
> while I believe it should contain
>
> 0F00;TIBETAN SYLLABLE OM;Lo;0;L;0F68 0F7C 0F7E;;;;N;;;;;
>
> (same for 0F02 and 0F03).
>
> > For example, under the UCA default collation, U+0F00 and
> > <U+0F68, U+0F7C, U+0F7E> are no more different than upper and lower case in
> > English.
>
> Hmmm thanks a lot for that! This seems to be somewhat new, but indeed I
> can see
>
> 0F00 ; [.2F19.0020.0004][.2F30.0020.0004][.0000.00C4.0004] # TIBETAN SYLLABLE OM
>
> in http://www.unicode.org/Public/UCA/10.0.0/allkeys.txt
>
> So I guess I'm even more eager to have some clues on my question number
> 2, if the UCA acknowledges that the composed and decomposed characters
> have the same weight, why doesn't UnicodeData list them as
> composition/decomposition?
>
> Thank you,
> --
> Elie
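
The situation described in question 2 of the quoted message can be checked directly. The following is only a minimal sketch, assuming Python's standard-library unicodedata module; the helper names has_decomposition and canonically_equivalent are made up for illustration and are not part of any Unicode tooling. It reads the decomposition field that UnicodeData.txt carries for U+0F00 and compares the normalized forms of the single character and the spelled-out sequence. U+0F43 is included only as a contrasting Tibetan character that does carry a decomposition.

```python
import unicodedata

OM = "\u0F00"                        # TIBETAN SYLLABLE OM
SPELLED_OUT = "\u0F68\u0F7C\u0F7E"   # LETTER A + VOWEL SIGN O + SIGN RJES SU NGA RO
GHA = "\u0F43"                       # TIBETAN LETTER GHA, for contrast


def has_decomposition(ch: str) -> bool:
    """True if the character's decomposition field in UnicodeData.txt is non-empty."""
    return unicodedata.decomposition(ch) != ""


def canonically_equivalent(a: str, b: str) -> bool:
    """True if both strings normalize to the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    for ch in (OM, GHA):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            "decomposition field:",
            repr(unicodedata.decomposition(ch)),
        )
    print("U+0F00 equivalent to U+0F68 U+0F7C U+0F7E under NFD:",
          canonically_equivalent(OM, SPELLED_OUT))
```

Because the decomposition field for U+0F00 is empty, normalization never relates the two spellings; any software that wants to treat them alike has to rely on something else, such as the collation weights quoted above.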