Bidi edge cases in Hangul and Indic
Ken Whistler via Unicode
unicode at unicode.org
Thu Feb 22 17:32:45 CST 2018
On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:
> For example, after a right-to-left override, the Hangul string 보기
> (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is
> reordered by jamo instead of by syllable; that is, it looks like “igob”.
Nope. *tilt* The UBA reorders the display order in layout -- not the
underlying string.
"bogi" is the sequence <U+1107, U+1169, U+1100, U+1175> in NFD or
<U+BCF4, U+AE30> in NFC.
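
To check those code points yourself, here is a minimal sketch using
Python's standard unicodedata module. The variable names are just for
illustration.

```python
import unicodedata

# "bogi" as two precomposed Hangul syllables (NFC form).
nfc = "\uBCF4\uAE30"

# Canonical decomposition splits each syllable into conjoining jamos.
nfd = unicodedata.normalize("NFD", nfc)

# Hex code points of each form, for comparison with the lists above.
nfd_code_points = [f"{ord(c):04X}" for c in nfd]
nfc_code_points = [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", nfd)]

print(nfd_code_points)
print(nfc_code_points)
```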
Because of canonical equivalence, for display of the NFD string, the
sequence <U+1107, U+1169> needs to be mapped onto the same *glyph* as
U+BCF4, and the sequence <U+1100, U+1175> onto the same *glyph* as
U+AE30.
If you override the normal left-to-right ordering with bidi override
controls, then the layout order is reversed, but what is actually laid
out is those two glyphs. So you just reverse the order of the two
syllables for display, in either case.
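
As a rough illustration only: the sketch below is a toy model, not an
implementation of the UBA or of any real shaping engine. It assumes
that composing to NFC is a good enough stand-in for "map the jamos onto
syllable glyphs", and the function name reversed_display_units is
hypothetical. It shows that reversing at the syllable level gives the
same result for both normalization forms.

```python
import unicodedata

def reversed_display_units(text):
    """Toy model of a right-to-left override on Hangul text.

    Assumption: each precomposed syllable in the NFC form stands in for
    one glyph, so the units that get reversed are syllables, not jamos.
    """
    units = list(unicodedata.normalize("NFC", text))
    return "".join(reversed(units))

nfc = "\uBCF4\uAE30"                     # "bogi", precomposed
nfd = unicodedata.normalize("NFD", nfc)  # "bogi", conjoining jamos

# Both inputs yield the same two syllables in reversed order ("gibo").
print(reversed_display_units(nfc) == reversed_display_units(nfd))
```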
You could force display of "igob", but only if you had inserted some
character in between the conjoining jamos that was preventing their
equivalence to the syllables, anyway.
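
A small sketch of that last case, assuming U+200C ZERO WIDTH NON-JOINER
as the intervening character (my choice for illustration; any character
that interrupts the jamo sequence would do). It only shows that the
interrupted string is no longer canonically equivalent to the
syllables; what a given renderer then displays is a separate question.

```python
import unicodedata

ZWNJ = "\u200C"  # assumption: one possible intervening character

# Leading consonant + ZWNJ + vowel, twice: the jamos are no longer adjacent.
blocked = "\u1107" + ZWNJ + "\u1169" + "\u1100" + ZWNJ + "\u1175"

# NFC cannot compose across the intervening character.
recomposed = unicodedata.normalize("NFC", blocked)

# True if no precomposed Hangul syllable (U+AC00..U+D7A3) was produced.
still_jamo = all(not ("\uAC00" <= c <= "\uD7A3") for c in recomposed)

print(recomposed == blocked, still_jamo)
```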
> I don’t think it is the intent of the algorithm that canonically
> equivalent strings display so very differently, but I can’t find any
> explicit guidance. What should a UBA-conformant renderer do?
The right thing. ;-)
--Ken