Philippe Verdy via Unicode
unicode at unicode.org
Sat Dec 9 09:08:22 CST 2017
2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode <
unicode at unicode.org>:
> Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> implies that it might be considered desirable to have a word boundary
> in 'aquaφοβία' or a grapheme cluster break in a coding such as <006C,
> U+0901 DEVANAGARI SIGN CANDRABINDU> for el candrabindu (l̐), which
> should be <006C, U+0310 COMBINING CANDRABINDU> in accordance with the
> principle of script separation. Why are such breaks desirable?
I don't understand why one would encode a DEVANAGARI SIGN in the middle of
a Greek word, nor why doing so should imply a word boundary in Greek!?
> There are some who
> claim that the Laotian consonant place holder is the letter 'x' rather
> than the multiplication sign, U+00D7, which does have
> Indic_syllabic_category=Consonant_Placeholder. (I trust no-one is
> suggesting that there should be grapheme cluster boundary between
> U+00D7 with script=common and a non-spacing Lao vowel any more than
> there would be with a Lao consonant.)
Here again the multiplication sign has nothing to do with an Indic
consonant. Maybe it has been used like this in some texts, but this looks
more like a tweak. If one needs a consonant placeholder, propose to encode an
"empty" letter (like in Hangul or in Arabic), possibly with variant forms
(e.g. changing between a circle, dotted circle, cross, or horizontal joiner
on the hanging baseline for Devanagari and similar scripts).
The usual base letter placeholder for combining diacritics is a
whitespace (preferably NBSP, not SPACE) or the dotted circle symbol, not
a mathematical symbol that is also used within math formulas with
variable names using common letters or even words.
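To make the comparison concrete, here is a minimal Python sketch that prints
the name and General_Category of the three candidate carriers mentioned above.
It uses only the standard unicodedata module, which does not expose
Indic_Syllabic_Category, so that property is not shown; the helper name
describe is just for illustration.

```python
import unicodedata

# Candidate carriers for an isolated combining mark, as discussed above:
# NBSP, the dotted circle, and the multiplication sign.
CANDIDATES = ["\u00A0", "\u25CC", "\u00D7"]

def describe(ch):
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in CANDIDATES:
    print(*describe(ch))
```

NBSP is a space separator (Zs), the dotted circle is an "other symbol" (So),
and the multiplication sign is a math symbol (Sm).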
The multiplication sign used in UAX #29 was chosen because it normally
does not occur within words, and it is used only for defining the breaking
rules (to indicate that NO break is allowed here, i.e. the opposite of what
you describe). It is notational only and is clearly not meant to combine
with what follows: if you encode the multiplication sign and then an Indic
diacritic, we expect to see the separate multiplication sign (with break
opportunities on both sides), then a dotted circle glyph, used for defective
grapheme clusters, to hold the diacritic.
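As an illustration of the rule notation only, here is a rough Python sketch of
a single "× Extend" style rule (no break before a combining mark, break
everywhere else). It is an assumption-laden simplification: it approximates
Grapheme_Cluster_Break=Extend by General_Category Mn/Me, ignores every other
rule and any script condition, and says nothing about how a renderer will
display the result. The function names are hypothetical.

```python
import unicodedata

def is_extend_like(ch):
    # Assumption: approximate Grapheme_Cluster_Break=Extend with the
    # general categories Mn and Me. The real values come from the UCD
    # file GraphemeBreakProperty.txt and differ in detail.
    return unicodedata.category(ch) in ("Mn", "Me")

def simple_clusters(text):
    """Split text using only a simplified 'x Extend' rule."""
    clusters = []
    for ch in text:
        if clusters and is_extend_like(ch):
            clusters[-1] += ch    # no break before an Extend-like mark
        else:
            clusters.append(ch)   # break before everything else
    return clusters

# 'l' + U+0310, 'l' + U+0901, and U+00D7 + U+0EB4 (LAO VOWEL SIGN I)
for sample in ("l\u0310", "l\u0901", "\u00D7\u0EB4"):
    print([f"U+{ord(c):04X}" for c in sample], len(simple_clusters(sample)))
```

Because this simplified rule looks only at the mark and not at the script of
the preceding base, each sample comes out as one cluster. Whether a
script-sensitive break should be added on top of this is exactly the question
raised in the quoted text.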
So for me Indic_syllabic_category=Consonant_Placeholder is wrong: for such
use of the cross, an Indic (or generic) consonant placeholder should rather
be encoded and used; that property could then be added to it and removed from
the multiplication sign.