Deterministic sorting impossible for Tibetan with current state

Élie Roux elie.roux at telecom-bretagne.eu
Tue May 12 08:28:04 CDT 2015


Dear all,

I'm not sure this is the correct list for this message; please tell me
if it is not.

I'm currently working on Tibetan sorting. It mostly works, except for
this case:

མངས་

This Unicode sequence can be interpreted in two very different ways,
both valid in Tibetan:

- prefix མ, main letter ང, suffix ས
- main letter མ, suffix ང, second suffix ས

Both have their own entry in a Tibetan dictionary: one under the letter
མ, the other (with a different meaning) under the letter ང.

It is thus currently impossible to determine the place of the string
"མངས་" in a dictionary from the characters alone (Tibetans guess from
the context).
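
To make the ambiguity concrete, here is a minimal sketch in Python. It
assumes the traditional sets of five prefixes, ten suffixes and two
second suffixes, and it only handles syllables of three unstacked
letters. The function name candidate_roots is made up for this
illustration, and the sketch ignores the further rules about which
prefix may precede which main letter.

```python
# Illustrative sketch only: names and the simplified rules are assumptions.
MA, NGA, SA, TSHEG = "\u0f58", "\u0f44", "\u0f66", "\u0f0b"

# Traditional letter classes: prefixes (ga da ba ma 'a),
# suffixes (ga nga da na ba ma 'a ra la sa), second suffixes (sa da).
PREFIXES = set("\u0f42\u0f51\u0f56\u0f58\u0f60")
SUFFIXES = set("\u0f42\u0f44\u0f51\u0f53\u0f56\u0f58\u0f60\u0f62\u0f63\u0f66")
SECOND_SUFFIXES = set("\u0f66\u0f51")


def candidate_roots(syllable):
    """Return every letter that could be the main letter of the syllable."""
    letters = syllable.rstrip(TSHEG)
    if len(letters) != 3:
        raise ValueError("this sketch only handles three unstacked letters")
    first, second, third = letters
    roots = []
    # Reading 1: prefix + main letter + suffix.
    if first in PREFIXES and third in SUFFIXES:
        roots.append(second)
    # Reading 2: main letter + suffix + second suffix.
    if second in SUFFIXES and third in SECOND_SUFFIXES:
        roots.append(first)
    return roots


# Both readings survive, so the sort position is not determined.
print(candidate_roots(MA + NGA + SA + TSHEG) == [NGA, MA])
```

Since a dictionary files each syllable under its main letter, two
surviving candidates mean two possible positions for the same string.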

Are there other languages where this kind of ambiguity arises? Did they
solve the problem? If not, what I propose is a new, invisible character
meaning "the previous letter is the main letter in case of ambiguity".
This would, of course, not solve the problem entirely, as the bare
string "མངས་" would still be ambiguous, but at least users could force
one reading when they need to.
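
As a rough sketch of how a sorting routine might use such a character:
the code below is purely hypothetical, since no such character exists.
It uses the Private Use Area code point U+E000 as a stand-in, and the
names ROOT_MARK and marked_root are invented for this example.

```python
# Hypothetical sketch: U+E000 is a Private Use Area stand-in for the
# proposed invisible character; it is not an assigned Unicode character.
ROOT_MARK = "\ue000"
MA, NGA, SA, TSHEG = "\u0f58", "\u0f44", "\u0f66", "\u0f0b"


def marked_root(syllable):
    """Return the letter just before the marker, or None if there is none."""
    index = syllable.find(ROOT_MARK)
    if index <= 0:
        # No marker (or a marker with nothing before it): still ambiguous.
        return None
    return syllable[index - 1]


# The writer forces the reading "main letter MA".
print(marked_root(MA + ROOT_MARK + NGA + SA + TSHEG) == MA)
# The writer forces the reading "main letter NGA".
print(marked_root(MA + NGA + ROOT_MARK + SA + TSHEG) == NGA)
# The unmarked string gives the sorter nothing to go on.
print(marked_root(MA + NGA + SA + TSHEG) is None)
```

A sorter could consult the marker first and fall back to its usual
rules only when the marker is absent.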

What do you think?

Thank you,
-- 
Elie Roux



More information about the Indic mailing list