Bidi edge cases in Hangul and Indic
David Corbett via Unicode
unicode at unicode.org
Thu Feb 22 13:39:33 CST 2018
Although the Unicode Bidirectional Algorithm clearly defines how to reorder
characters in memory, I don’t understand precisely what it means to display
one character after another once they’ve been reordered; specifically, when
bidi reordering changes the number of user-perceived characters.
For example, after a right-to-left override, the Hangul string 보기 (“bogi”)
becomes 기보 (“gibo”) in visual order. However, its NFD form is reordered by
jamo instead of by syllable; that is, it looks like “igob”. I don’t think
it is the intent of the algorithm that canonically equivalent strings
display so very differently, but I can’t find any explicit guidance. What
should a UBA-conformant renderer do?
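
To make the Hangul case concrete, here is a minimal Python sketch. It assumes that reversing by code point is an adequate stand-in for what rule L2 does to a run under a right-to-left override; it is not a full UBA implementation. The helper name naive_rlo_reverse is hypothetical.

```python
import unicodedata

def naive_rlo_reverse(text: str) -> str:
    """Reverse by code point (assumed stand-in for rule L2 under an RLO)."""
    return text[::-1]

# "bogi" as two precomposed syllables (NFC) and as four conjoining jamo (NFD).
nfc = "\uBCF4\uAE30"
nfd = unicodedata.normalize("NFD", nfc)

reversed_nfc = naive_rlo_reverse(nfc)
reversed_nfd = naive_rlo_reverse(nfd)

# The NFC form reverses syllable by syllable: "gibo".
print([f"U+{ord(c):04X}" for c in reversed_nfc])
# The NFD form reverses jamo by jamo: vowel, consonant, vowel, consonant.
print([f"U+{ord(c):04X}" for c in reversed_nfd])

# The two inputs were canonically equivalent; the two outputs are not.
same = (unicodedata.normalize("NFC", reversed_nfc)
        == unicodedata.normalize("NFC", reversed_nfd))
print(same)
```

Under that assumption, the two reordered strings are no longer canonically equivalent, which is the discrepancy I am asking about.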
Another unclear case is Indic clusters. षिक् is unambiguously two clusters,
but after an RLO, and after following rule L3 to put combining marks after
their bases, it looks like one cluster: क्षि. If Devanagari were actually
written right-to-left, I would expect it to stay as two clusters: क्षि.
Does the UBA prefer one rendering over the other, or is this outside its scope?
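
The Devanagari reordering described above can be sketched as follows. This is an illustration under two assumptions, not a conformant implementation: the override makes the whole run right-to-left so that rule L2 amounts to a code point reversal, and rule L3 treats every character of general category M (including the spacing vowel sign U+093F) as a combining mark. The function name reorder_under_rlo is hypothetical.

```python
import unicodedata

def reorder_under_rlo(text: str) -> str:
    """Reverse the run, then move each group of marks after its base."""
    out = []
    pending_marks = []
    for ch in text[::-1]:  # assumed stand-in for rule L2 under an RLO
        if unicodedata.category(ch).startswith("M"):
            # After reversal, marks precede their base; hold them for now.
            pending_marks.append(ch)
        else:
            # Emit the base, then its marks restored to logical order (L3).
            out.append(ch)
            out.extend(reversed(pending_marks))
            pending_marks.clear()
    out.extend(reversed(pending_marks))  # marks that never found a base
    return "".join(out)

# SSA, VOWEL SIGN I, KA, VIRAMA: two clusters in logical order.
logical = "\u0937\u093F\u0915\u094D"
visual = reorder_under_rlo(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

With these assumptions the result is KA, VIRAMA, SSA, VOWEL SIGN I, which a shaping engine would be expected to treat as a single conjunct cluster.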