Composition / Decomposition of Tibetan oM (0F00)

Élie Roux via Indic indic at unicode.org
Sun Mar 18 05:23:16 CDT 2018


Dear All,

I am wondering why U+0F00 is not specified as decomposing to

U+0F68 U+0F7C U+0F7E

which is how a native reader would analyze it. Is there supposed to be a
semantic difference between the two (U+0F00 and this decomposition)?
When I see this syllable in a manuscript, how can I know whether I should
input U+0F00 or the decomposition?

In my experience, different input systems produce one or the other, so
when I'm working on a Tibetan corpus I have to normalize them before
running any analysis. It seems the normalization I perform (decomposing
U+0F00) should be part of NFD. Why isn't it?
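
For reference, the current behaviour can be checked with a short Python
sketch using the standard unicodedata module. The variable names are only
illustrative, and the result reflects the Unicode data bundled with the
Python interpreter in use:

```python
import unicodedata

OM = "\u0f00"                        # TIBETAN SYLLABLE OM
SPELLED_OUT = "\u0f68\u0f7c\u0f7e"   # the three-character spelling

# An empty string means the character has no decomposition mapping.
om_mapping = unicodedata.decomposition(OM)

# NFD leaves U+0F00 untouched, and NFC does not compose the sequence.
nfd_changes_om = unicodedata.normalize("NFD", OM) != OM
nfc_composes_sequence = unicodedata.normalize("NFC", SPELLED_OUT) == OM

print(repr(om_mapping), nfd_changes_om, nfc_composes_sequence)
```

So neither normalization form makes the two spellings compare equal.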

The same question holds for the (less common) characters

U+0F02 = U+0F60 U+0F74 U+0F82 U+0F7F
U+0F03 = U+0F60 U+0F74 U+0F82 U+0F14
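
As a minimal sketch of the kind of corpus normalization described above,
assuming the three equivalences are exactly as listed: the names
TIBETAN_DECOMPOSITIONS and decompose_tibetan are hypothetical, and this is
a private convention, not a Unicode normalization form.

```python
import unicodedata

# Private mapping restating the three equivalences from this message.
TIBETAN_DECOMPOSITIONS = str.maketrans({
    "\u0f00": "\u0f68\u0f7c\u0f7e",
    "\u0f02": "\u0f60\u0f74\u0f82\u0f7f",
    "\u0f03": "\u0f60\u0f74\u0f82\u0f14",
})

def decompose_tibetan(text: str) -> str:
    """Apply NFD, then expand the three precomposed characters."""
    return unicodedata.normalize("NFD", text).translate(TIBETAN_DECOMPOSITIONS)

# Both spellings of the syllable now compare equal.
same = decompose_tibetan("\u0f00") == decompose_tibetan("\u0f68\u0f7c\u0f7e")
```

The expansion runs after NFD so that any standard canonical decompositions
are applied first and the custom table only has to handle these three
characters.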

Thank you,
-- 
Elie

