Richard Wordingham via Unicode unicode at
Sat Dec 9 08:28:31 CST 2017

Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
implies that it might be considered desirable to have a word boundary
in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
should be <006C, U+0310 COMBINING CANDRABINDU> in accordance with the
principle of script separation.  Why are such breaks desirable?
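
For reference, the two codings of el candrabindu can be compared with a
minimal Python sketch. It assumes only the standard `unicodedata` module,
which exposes character names and General_Category but not the Script
property, so it shows only that both marks are nonspacing (Mn). The helper
name `describe` is made up for illustration.

```python
import unicodedata

# The two codings of el candrabindu discussed above.
devanagari_form = "l\u0901"  # <006C, U+0901 DEVANAGARI SIGN CANDRABINDU>
generic_form = "l\u0310"     # <006C, U+0310 COMBINING CANDRABINDU>

def describe(ch):
    """Return (code point, name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for coding in (devanagari_form, generic_form):
    # Each coding is a base letter followed by one nonspacing mark.
    print([describe(ch) for ch in coding])
```

Both marks report General_Category Mn; they differ in script assignment,
which this module cannot show.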

I can understand an argument that these should be tolerated, as an
application could have been designed on the basis that script
boundaries imply word boundaries (not true for Japanese) and that word
boundaries imply grapheme cluster boundaries (not true for Sanskrit,
where they don't even imply character boundaries).  There are some who
claim that the Laotian consonant placeholder is the letter 'x' rather
than the multiplication sign, U+00D7, which does have
Indic_Syllabic_Category=Consonant_Placeholder. (I trust no-one is
suggesting that there should be a grapheme cluster boundary between
U+00D7, with Script=Common, and a non-spacing Lao vowel any more than
there would be with a Lao consonant.)
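
The expectation that a non-spacing mark stays attached to its base
whatever the base's script can be sketched roughly as follows. This is
not the UAX#29 algorithm: it approximates the Extend class by
General_Category Mn/Me (an assumption; the real Grapheme_Cluster_Break
property covers more), and `simple_clusters` is a hypothetical helper.

```python
import unicodedata

def simple_clusters(text):
    """Split text into clusters, never breaking before a nonspacing or
    enclosing mark (a crude stand-in for 'do not break before Extend')."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch   # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

# U+00D7 MULTIPLICATION SIGN or a Lao consonant, each followed by
# U+0EB4 LAO VOWEL SIGN I; and 'l' followed by U+0901.
for sample in ("\u00d7\u0eb4", "\u0e81\u0eb4", "l\u0901"):
    print(len(simple_clusters(sample)))
```

Under this sketch each sample forms a single cluster, because the script
of the base character is never consulted.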

