Do `Grapheme_Extend` characters only apply to `Grapheme_Base`?

Richard Wordingham richard.wordingham at ntlworld.com
Thu Apr 24 18:12:22 CDT 2014


On Thu, 24 Apr 2014 23:07:58 +0200
Mathias Bynens <mathias at qiwi.be> wrote:

> I realize reversing a string has nothing to do with text segmentation
> – but ignoring grapheme extenders leads to unexpected results (since
> after reversing the code points, the grapheme extender might extend
> the wrong character):
> https://github.com/mathiasbynens/esrever/issues/5

Actually, it has a lot to do with text segmentation: you need to work
out what are really thought of as the characters.  שָׁלוֹם is a nice
illustration of the problems.  Is reversing twice to yield the string
you started with?  Is reversing three times to give the same
result as reversing once?  What does reversing a Hangul syllable do?
Canonical equivalence should be preserved!  Should renderability be
preserved?  What does Thai เกราะ /krɔ̀ʔ/ <U+0E40, U+0E01, U+0E23,
U+0E32, U+0E30> reverse to?  /ʔɔ̀rk/ is unpronounceable in Thai, and if
it were pronounceable, it would be written อ็อรก <U+0E2D, U+0E47, U+0E2D, U+0E23,
U+0E01>.  Thai เพลา <U+0E40, U+0E1E, U+0E25, U+0E32> is the spelling
of two unrelated words, pronounced /pʰlaw/ and /pʰeː laː/ respectively.
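Several of these questions can be made concrete with a small Python sketch that reverses a string by (approximate) grapheme clusters rather than by code points. It rests on stated assumptions. The clustering rule is a deliberate simplification of UAX #29: it only attaches combining marks (any General_Category M*) to the preceding character and keeps conjoining Hangul jamo sequences together (rules GB6 to GB8). It ignores controls, Prepend characters, emoji and regional indicators. The function names `clusters`, `reverse_naive` and `reverse_clusters` are hypothetical.

```python
import unicodedata


def _hangul_type(ch: str) -> str:
    """Approximate Hangul_Syllable_Type: L, V, T, LV, LVT, or '' for none."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # Every 28th precomposed syllable (from U+AC00) has no final consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return ""


def _joins(prev: str, ch: str) -> bool:
    """Simplified rule: does ch stay in the same cluster as prev?"""
    if unicodedata.category(ch).startswith("M"):
        return True  # combining mark: attach to whatever precedes it
    p, c = _hangul_type(prev), _hangul_type(ch)
    if p == "L" and c in ("L", "V", "LV", "LVT"):
        return True  # like UAX #29 GB6
    if p in ("LV", "V") and c in ("V", "T"):
        return True  # like GB7
    if p in ("LVT", "T") and c == "T":
        return True  # like GB8
    return False


def clusters(s: str) -> list[str]:
    """Split s into approximate grapheme clusters (NOT full UAX #29)."""
    out: list[str] = []
    for ch in s:
        if out and _joins(out[-1][-1], ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out


def reverse_naive(s: str) -> str:
    """Reverse code point by code point."""
    return s[::-1]


def reverse_clusters(s: str) -> str:
    """Reverse the order of clusters, keeping each cluster intact."""
    return "".join(reversed(clusters(s)))


if __name__ == "__main__":
    shalom = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"  # שָׁלוֹם
    # Naive reversal leaves the holam (U+05B9) on the final mem.
    print(clusters(reverse_naive(shalom))[0] == "\u05dd\u05b9")  # True
    print(reverse_clusters(reverse_clusters(shalom)) == shalom)  # True
    stray = "\u0301a"  # starts with a lone combining acute accent
    print(reverse_clusters(reverse_clusters(stray)) == stray)  # False
    hangul = "\ud55c\uad6d"  # 한국
    nfd = unicodedata.normalize("NFD", hangul)
    print(unicodedata.normalize("NFC", reverse_clusters(nfd))
          == reverse_clusters(hangul))  # True
    print(unicodedata.normalize("NFC", reverse_naive(nfd))
          == reverse_naive(hangul))  # False
```

Under these assumptions the sketch answers some of the questions and not others. The Hebrew points stay on their letters, and reversing twice restores שָׁלוֹם. However, a string that begins with a stray combining mark does not survive a double reversal, although reversing three times then gives the same result as reversing once. For Hangul, cluster reversal respects canonical equivalence (reversing the NFD form and recomposing gives the same result as reversing the precomposed syllables), while naive reversal does not. Thai เกราะ simply comes out as its letters in reverse order, which is not a Thai word, so segmentation alone does not settle what "reverse" should mean.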

Richard. 




More information about the Unicode mailing list