Bidi edge cases in Hangul and Indic

Thu Feb 22 13:39:33 CST 2018

Although the Unicode Bidirectional Algorithm (UBA) clearly defines how to
reorder characters in memory, I don’t understand precisely what it means to
display them one after another once they’ve been reordered, particularly
when bidi reordering changes the number of user-perceived characters.

For example, after a right-to-left override (RLO), the Hangul string 보기
(“bogi”) becomes 기보 (“gibo”) in visual order. Its NFD form, however, is
reordered by jamo rather than by syllable, so it looks like “igob”. I doubt
the algorithm intends canonically equivalent strings to display so
differently, but I can’t find any explicit guidance. What should a
UBA-conformant renderer do?
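A minimal Python sketch of the Hangul case follows. It assumes a simplified model, not a conformant UBA implementation: the text inside RLO…PDF (the two controls themselves are removed by rule X9) forms a single run at one odd embedding level, so rule L2 just reverses it code point by code point. `rlo_visual` is a hypothetical helper, not part of any library. Conjoining jamo are letters (General_Category Lo), so a mark-reordering step like L3 would not regroup them either.

```python
import unicodedata

def rlo_visual(logical: str) -> str:
    """Visual order of text wrapped in RLO ... PDF (simplified model).

    Assumption: every character ends up at the same odd level, so rule L2
    reverses the whole run code point by code point. No L3 step is applied.
    """
    return logical[::-1]

word = "\uBCF4\uAE30"  # 보기: two precomposed Hangul syllables

nfc_visual = rlo_visual(unicodedata.normalize("NFC", word))
nfd_visual = rlo_visual(unicodedata.normalize("NFD", word))

print(nfc_visual)  # 기보: syllables swapped, each syllable intact
print([f"U+{ord(c):04X}" for c in nfd_visual])
# ['U+1175', 'U+1100', 'U+1169', 'U+1107']: jamo reversed, i.e. "igob"

# The two visual results are not canonically equivalent:
print(unicodedata.normalize("NFC", nfd_visual) == nfc_visual)  # False
```

Recomposing the reversed NFD string does not recover 기보: ᄀ and ᅩ now combine into a different syllable, 고, while ᅵ and ᄇ are left as isolated jamo.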

Indic clusters are another unclear case. षिक् is unambiguously two clusters,
but after an RLO, and after applying rule L3 to move combining marks back
after their bases, it looks like a single cluster: क्षि. If Devanagari were actually
written right-to-left, I would expect it to stay as two clusters: क्‌षि.
Does the UBA prefer one rendering over the other, or is this outside its
scope?
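The Devanagari case can be sketched the same way, again as a simplified model rather than a conformant implementation. The run is reversed (L2), then a hypothetical `apply_l3` helper moves each group of combining marks back after the base that now follows it. Treating any General_Category M character as a "combining mark" is an assumption made for this sketch.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    # Assumption: Mn, Mc and Me all count as combining marks here.
    return unicodedata.category(ch).startswith("M")

def apply_l3(reversed_run: str) -> str:
    """Put combining marks back after their bases (simplified rule L3).

    After L2 each base appears as marks..., base; reversing that unit
    yields base, marks... in their original logical order.
    """
    out: list[str] = []
    pending: list[str] = []  # marks seen since the last base
    for ch in reversed_run:
        if is_mark(ch):
            pending.append(ch)
        else:
            out.append(ch)
            out.extend(reversed(pending))
            pending.clear()
    out.extend(pending)  # marks with no base after them are left as-is
    return "".join(out)

word = "\u0937\u093F\u0915\u094D"  # षिक्: SSA + VOWEL SIGN I, KA + VIRAMA
visual = apply_l3(word[::-1])     # L2 under RLO, then L3
print([f"U+{ord(c):04X}" for c in visual])
# ['U+0915', 'U+094D', 'U+0937', 'U+093F']: KA, VIRAMA, SSA, I = क्षि
```

The reordered sequence KA + VIRAMA + SSA is exactly the input that normally forms the conjunct क्ष, which is why the result reads as one cluster. Nothing in this sketch adds a character, such as ZERO WIDTH NON-JOINER (U+200C), that could keep the virama visible.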