Re: Aquaφοβία

Mark Davis ☕️ via Unicode unicode at
Sat Dec 9 09:31:06 CST 2017

Some people have been confused by the previous wording, and thought that it
wouldn't be legitimate to break on script boundaries. So we wanted to make
it clear that that was possible, since:

   1. Many rendering implementations break text into script runs before
   further processing, and
   2. There are certainly cases where users' expectations are better met
   with breaks on script boundaries*

We thus wanted to make it clear to people that it *is* a legitimate
customization to break on script boundaries.

* Clearly such an approach can't be hard-nosed: an implementation would
need at the very least to handle Common and Inherited specially: not impose
a boundary *because of script* where the SCX value is one of those, either
before or after a break point.
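
As a rough illustration of the footnote above, here is a minimal Python
sketch of a script-boundary customization that never breaks next to Common
or Inherited. It is not text from UAX #29. The names toy_scx, script_break
and split_on_script are hypothetical, and toy_scx is a stand-in that only
approximates Script_Extensions for a few code point ranges; a real
implementation would take the values from the Unicode Character Database.

```python
# Sketch only: toy_scx is a hypothetical stand-in for a real
# Script_Extensions (SCX) lookup and covers just a few ranges.
NEUTRAL = frozenset({"Common", "Inherited"})


def toy_scx(ch: str) -> frozenset:
    """Very rough SCX approximation (assumption, not real UCD data)."""
    cp = ord(ch)
    if 0x0300 <= cp <= 0x036F:          # combining diacritical marks
        return frozenset({"Inherited"})
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:   # ASCII letters
        return frozenset({"Latin"})
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return frozenset({"Greek"})
    return frozenset({"Common"})        # everything else, for this demo


def script_break(before: str, after: str) -> bool:
    """True if a boundary may be imposed *because of script* here."""
    a, b = toy_scx(before), toy_scx(after)
    # Common or Inherited on either side: script alone imposes no boundary.
    if a & NEUTRAL or b & NEUTRAL:
        return False
    # Otherwise break only when the two sides share no script at all.
    return a.isdisjoint(b)


def split_on_script(text: str) -> list:
    """Split text into runs at the boundaries script_break allows."""
    runs, start = [], 0
    for i in range(1, len(text)):
        if script_break(text[i - 1], text[i]):
            runs.append(text[start:i])
            start = i
    if text:
        runs.append(text[start:])
    return runs
```

Under these assumptions the word in the subject line splits into two runs,
'aqua' and 'φοβία', while <U+006C, U+0310> stays in one piece because the
combining mark is Inherited. The sketch only looks at the two characters
adjacent to each position, so a Common character between two scripts
suppresses the script-based break there and leaves that position to the
other segmentation rules.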

Any suggestions for clarifying language are appreciated.


Mark

On Sat, Dec 9, 2017 at 3:28 PM, Richard Wordingham via Unicode <
unicode at> wrote:

> Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> implies that it might be considered desirable to have a word boundary
> in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
> U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
> should be <006C, U+0310 COMBINING CANDRABINDU> in accordance with the
> principle of script separation.  Why are such breaks desirable?
> I can understand an argument that these should be tolerated, as an
> application could have been designed on the basis that script
> boundaries imply word boundaries (not true for Japanese) and that word
> boundaries imply grapheme cluster boundaries (not true for Sanskrit,
> where they don't even imply character boundaries.)  There are some who
> claim that the Laotian consonant place holder is the letter 'x' rather
> than the multiplication sign, U+00D7, which does have
> Indic_syllabic_category=Consonant_Placeholder. (I trust no-one is
> suggesting that there should be a grapheme cluster boundary between
> U+00D7 with script=common and a non-spacing Lao vowel any more than
> there would be with a Lao consonant.)
> Richard.