Composition / Decomposition of Tibetan oM (0F00)
Élie Roux via Indic
indic at unicode.org
Sun Mar 18 05:23:16 CDT 2018
Dear All,
I am wondering why U+0F00 is not defined as decomposing to
U+0F68 U+0F7C U+0F7E
which is how a native reader would analyze it. Is there supposed to be a
semantic difference between the two (U+0F00 and this decomposition)?
When I see something in a manuscript, how can I know if I should input
U+0F00 or the decomposition?
In my experience, different input systems produce one or the other, so
when I work on a Tibetan corpus I have to normalize them before running
any analysis. It seems that the normalization I perform (decomposing
U+0F00) should be part of NFD. Why isn't it?
The same question holds for the (less common)
U+0F02 = U+0F60 U+0F74 U+0F82 U+0F7F
U+0F03 = U+0F60 U+0F74 U+0F82 U+0F14
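
A minimal sketch of this kind of normalization in Python, assuming the
three equivalences listed above are the ones wanted. The names
TIBETAN_EXPANSIONS and expand_tibetan_syllables are hypothetical, and the
mapping is not part of any Unicode normalization form: the standard
unicodedata module reports no decomposition for U+0F00, so NFD leaves it
unchanged.

```python
import unicodedata

# Hypothetical mapping built from the equivalences proposed in this message.
# These are NOT canonical decompositions in the Unicode Character Database.
TIBETAN_EXPANSIONS = {
    "\u0F00": "\u0F68\u0F7C\u0F7E",
    "\u0F02": "\u0F60\u0F74\u0F82\u0F7F",
    "\u0F03": "\u0F60\u0F74\u0F82\u0F14",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(TIBETAN_EXPANSIONS)


def expand_tibetan_syllables(text: str) -> str:
    """Replace U+0F00, U+0F02 and U+0F03 with their spelled-out sequences."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    om = "\u0F00"
    # NFD alone does not touch U+0F00.
    print(unicodedata.normalize("NFD", om) == om)
    # The custom step produces the three-character sequence.
    print([f"U+{ord(c):04X}" for c in expand_tibetan_syllables(om)])
```

The custom step can be applied before or after unicodedata.normalize,
depending on what the rest of the corpus pipeline expects.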
Thank you,
--
Elie