"A Programmer's Introduction to Unicode"

Richard Wordingham richard.wordingham at ntlworld.com
Sun Mar 12 15:10:22 CDT 2017


On Sun, 12 Mar 2017 20:02:28 +0100
"Janusz S. Bien" <jsbien at mimuw.edu.pl> wrote:

> If the basic notion has to be referred to in a cumbersome way as
> "extended grapheme cluster" then it is easier to talk about "Unicode  
> characters" despite the fact that they have a rather loose relation
> to real-life/user-perceived characters.

The notion that extended grapheme clusters correspond to
user-perceived characters is also rather dodgy.  Whereas it may work
for French, it is getting very dubious by the time one adds Hebrew
cantillation marks or Vedic accentuation.  The Thais revolted when
their preposed vowels were joined with the following consonant in the
same extended grapheme cluster, and Unicode had to revoke that union.
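
To make the gap between code points and clusters concrete, here is a
rough Python sketch.  It is not the UAX #29 segmentation algorithm: the
helper approx_clusters is a hypothetical name, and it assumes only that
a combining mark (general category M*) attaches to the code point before
it.  It ignores Hangul jamo, ZWJ sequences, regional indicators and the
other cases the real rules handle.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplifying assumption (not UAX #29): a cluster is one non-mark
    code point plus any immediately following marks (Mn, Mc, Me).
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach to the cluster already in progress.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


# French: "e" + U+0301 COMBINING ACUTE ACCENT
french = "e\u0301"
# Hebrew: TAV + QAMATS (vowel point) + ETNAHTA (cantillation accent)
hebrew = "\u05EA\u05B8\u0591"
# Thai: preposed vowel SARA E + consonant KO KAI
thai = "\u0E40\u0E01"

for label, s in [("French", french), ("Hebrew", hebrew), ("Thai", thai)]:
    print(label, "code points:", len(s),
          "approx. clusters:", len(approx_clusters(s)))
```

Under these assumptions the French and Hebrew strings each come out as a
single cluster (from two and three code points respectively), while the
Thai vowel and consonant stay separate because both are letters rather
than marks.  Whether a reader perceives the Hebrew consonant with its
point and accent as one "character" is exactly the doubtful part.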

Richard.

