ignoring characters in collation (for Tibetan)

Élie Roux elie.roux at telecom-bretagne.eu
Mon Jun 8 05:54:26 CDT 2015


Dear all,

When sorting Tibetan, U+0F35 and U+0F37 should be completely ignored by
the collation algorithm.

An example with rules for Dzongkha in CLDR:

- on line 14 there is the rule འདན<འདབ<འདམ
- I want འད༵བ to sort with a weight equal to འདབ, since U+0F35 should
be ignored
- when sorting ད འདན འད༵བ འདམ ན འ ཡ (correct order), I get ད འདན འདམ ན འ
འད༵བ ཡ (incorrect)

So it seems འད༵བ is not treated as equal to འདབ. Is there any way to
specify this with the current spec or implementation? If I have to
duplicate every collation element to give it a 0F35/0F37 variant, the
table will just explode (it is already huge).
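
To make the expected behaviour concrete, here is a minimal Python sketch of
what I mean by "completely ignored". It is only an illustration under stated
assumptions: the names strip_ignored and sort_key are hypothetical, and plain
code point order stands in for the real Dzongkha rules, which this sketch
does not implement. It strips the two marks before comparison, which is a
preprocessing workaround and not a collation rule.

```python
# Illustration only: hypothetical helper names, and plain code point order
# is used as a stand-in for the real Dzongkha collation order.

# Map U+0F35 and U+0F37 to None so str.translate deletes them.
IGNORED = {0x0F35: None, 0x0F37: None}


def strip_ignored(text: str) -> str:
    """Remove U+0F35 and U+0F37 so they carry no weight in comparisons."""
    return text.translate(IGNORED)


def sort_key(text: str) -> str:
    # Assumption: the stripped string itself is the sort key
    # (code point order), not a real collation key.
    return strip_ignored(text)


words = ["འདམ", "འད\u0F35བ", "འདན"]
result = sorted(words, key=sort_key)
```

With this key, འད༵བ and འདབ compare as equal, so འད༵བ lands between འདན and
འདམ, which is the position I would expect a tailored collator to give it.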

Thank you very much,
-- 
Elie

