Bidi edge cases in Hangul and Indic

Ken Whistler via Unicode unicode at unicode.org
Thu Feb 22 17:32:45 CST 2018



On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:
> For example, after a right-to-left override, the Hangul string 보기 
> (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is 
> reordered by jamo instead of by syllable; that is, it looks like “igob”. 

Nope. *tilt* The UBA reorders the display order in layout -- not the 
underlying string.

"bogi" is the sequence <1107, 1169, 1100, 1175> in NFD or <BCF4, AE30> 
in NFC.
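
Those code points can be checked with a minimal sketch using Python's standard unicodedata module. The variable names are only illustrative and are not part of the original message.

```python
import unicodedata

# "bogi" as two precomposed Hangul syllables (the NFC form).
word = "\uBCF4\uAE30"

# Decompose into conjoining jamos, then recompose.
nfd = unicodedata.normalize("NFD", word)
nfc = unicodedata.normalize("NFC", nfd)

# Hex code points of each form, for comparison with the sequences above.
nfd_hex = [f"{ord(ch):04X}" for ch in nfd]
nfc_hex = [f"{ord(ch):04X}" for ch in nfc]

print(nfd_hex)
print(nfc_hex)
```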

Because of canonical equivalence, for display of the NFD string, the 
sequence <1107,1169> needs to be mapped onto the same *glyph* as BCF4, 
and the sequence <1100,1175> onto the same *glyph* as AE30.

If you override the normal left-to-right ordering with bidi override 
controls, then the layout order is reversed, but what is actually laid 
out is those two glyphs. So you just reverse the order of the two 
syllables for display, in either case.
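
As a rough illustration only (this is not an implementation of the UBA or of any real renderer), the sketch below treats each NFC character as a stand-in for one glyph and reverses those units, the way an override would reverse the layout order. The helper name override_display_order and the one-glyph-per-NFC-character assumption are hypothetical simplifications.

```python
import unicodedata

def override_display_order(text):
    """Hypothetical helper: display units in reversed (overridden) order.

    Assumption: each character of the NFC form stands in for one glyph,
    which holds for this simple Hangul example but not for text in general.
    """
    units = list(unicodedata.normalize("NFC", text))
    return units[::-1]

nfc_input = "\uBCF4\uAE30"              # two precomposed syllables
nfd_input = "\u1107\u1169\u1100\u1175"  # the same text as conjoining jamos

from_nfc = override_display_order(nfc_input)
from_nfd = override_display_order(nfd_input)

# Reversing the raw code points instead would give the jamo-by-jamo
# "igob" order, which is not what gets laid out.
naive = list(nfd_input)[::-1]

print(from_nfc == from_nfd)
```

Under that assumption both inputs come out as the same two units in the same reversed order, while the naive per-code-point reversal does not.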

You could force display of "igob", but only if you had inserted some 
character in between the conjoining jamos that was preventing their 
equivalence to the syllables, anyway.
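
A small sketch of that last point, at the normalization level only. The choice of U+200C ZERO WIDTH NON-JOINER as the intervening character is an assumption made for illustration (no particular character is named above), and what a given font or renderer then does with the separated jamos is outside what this code shows.

```python
import unicodedata

ZWNJ = "\u200C"  # assumed intervening character, chosen for illustration

# Leading consonant and vowel of the first syllable, adjacent vs. separated.
adjacent = "\u1107\u1169"
separated = "\u1107" + ZWNJ + "\u1169"

composed_adjacent = unicodedata.normalize("NFC", adjacent)
composed_separated = unicodedata.normalize("NFC", separated)

# Adjacent jamos compose to the single syllable BCF4; the separated
# sequence is left unchanged, so it is not equivalent to the syllable.
print(len(composed_adjacent), len(composed_separated))
```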

> I don’t think it is the intent of the algorithm that canonically 
> equivalent strings display so very differently, but I can’t find any 
> explicit guidance. What should a UBA-conformant renderer do?

The right thing. ;-)

--Ken


