Deterministic sorting impossible for Tibetan with current state

Åke Persson ake.persson at mimer.se
Tue May 12 13:31:02 CDT 2015


Dear Élie,

The combination
- prefix མ, main letter ང, suffix ས
does not exist in the dictionaries referenced from
http://developer.mimer.com/charts/tibetan.htm.

Where did you find it?

Best regards,
Åke Persson

> I'm currently working on Tibetan sorting. It mostly works, except for
> this case:
>
> མངས་
>
> This Unicode sequence can be interpreted in two very different ways,
> both valid in terms of the Tibetan language:
>
> - prefix མ, main letter ང, suffix ས
> - main letter མ, suffix ང, second suffix ས
>
> Both readings have entries in a Tibetan dictionary: one under the
> letter མ, the other (with a different meaning) under the letter ང.
>
> It is thus currently impossible to determine where the string "མངས་"
> belongs in a dictionary (Tibetan readers infer the reading from context).
>
> Are there other languages where this kind of ambiguity occurs? If so,
> how was it solved? If not, I propose a new invisible character meaning
> "the previous letter is the main letter when the reading is ambiguous".
> Of course this would not solve the problem entirely, since the bare
> string "མངས་" would remain ambiguous, but at least users would be able
> to force a specific reading.
>
> What do you think?
>
> Thank you,
> -- 
> Elie Roux
>
> _______________________________________________
> Indic mailing list
> Indic at unicode.org
> http://unicode.org/mailman/listinfo/indic
> 
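
The two readings of མངས་ described in the quoted message can be reproduced with a small parser. This is a simplified sketch, not a full Tibetan syllable analyzer: it assumes syllables written only with base consonants (no stacked letters, no vowel signs), uses the standard sets of prefix, suffix and second-suffix letters, and does not check which prefix/root pairs are actually attested. The names `parses`, `primary_key` and `ROOT_MARK` are hypothetical. Since Unicode has no character with the proposed meaning, `ROOT_MARK` uses the Private Use Area code point U+E000 purely as a stand-in. The primary sort key is only the root letter's position in the alphabet; a real collation needs further keys after that.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 30 Tibetan consonants in traditional dictionary order.
ALPHABET = "ཀཁགངཅཆཇཉཏཐདནཔཕབམཙཚཛཝཞཟའཡརལཤསཧཨ"

PREFIXES = set("གདབམའ")            # letters that can precede the root
SUFFIXES = set("གངདནབམའརལས")       # letters that can follow the root
POST_SUFFIXES = set("དས")           # second suffixes

TSHEG = "\u0f0b"  # intersyllabic mark ་
# HYPOTHETICAL: no such character exists in Unicode. A Private Use Area
# code point stands in for the proposed "previous letter is the root" mark.
ROOT_MARK = "\ue000"


@dataclass(frozen=True)
class Parse:
    prefix: Optional[str]
    root: str
    suffix: Optional[str]
    post_suffix: Optional[str]


def parses(syllable: str) -> List[Parse]:
    """List every structurally possible analysis of a syllable written
    with base consonants only (simplified model: no stacks, no vowels)."""
    syllable = syllable.rstrip(TSHEG)
    forced_root = None
    if ROOT_MARK in syllable:
        # The letter just before the mark is declared to be the root.
        forced_root = syllable.index(ROOT_MARK) - 1
        syllable = syllable.replace(ROOT_MARK, "")
    letters = list(syllable)
    results = []
    for has_prefix in (False, True):
        r = 1 if has_prefix else 0  # index of the root letter
        tail = letters[r + 1:]
        if r >= len(letters) or len(tail) > 2:
            continue
        if forced_root is not None and forced_root != r:
            continue
        if has_prefix and letters[0] not in PREFIXES:
            continue
        if tail and tail[0] not in SUFFIXES:
            continue
        if len(tail) == 2 and tail[1] not in POST_SUFFIXES:
            continue
        results.append(Parse(
            prefix=letters[0] if has_prefix else None,
            root=letters[r],
            suffix=tail[0] if tail else None,
            post_suffix=tail[1] if len(tail) == 2 else None,
        ))
    return results


def primary_key(p: Parse) -> int:
    """Dictionary section: position of the root letter in the alphabet."""
    return ALPHABET.index(p.root)
```

For `"མངས་"` the function returns two analyses, one rooted at མ (primary key 15) and one rooted at ང (primary key 3), matching the two dictionary placements described above. Inserting the stand-in mark after ང, as in `"མང" + ROOT_MARK + "ས་"`, leaves only the reading with prefix མ and root ང.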