Richard Wordingham via Unicode unicode at
Sat Dec 9 10:22:47 CST 2017

On Sat, 9 Dec 2017 16:08:22 +0100
Philippe Verdy <verdy_p at> wrote:

> 2017-12-09 15:28 GMT+01:00 Richard Wordingham via Unicode <
> unicode at>:  
> > Draft 1 of UAX#29 'Unicode Text Segmentation' for Unicode 11.0.0
> > implies that it might be considered desirable to have a word
> > boundary in 'aquaφοβία' or a grapheme cluster break in a coding
> > such as <006C, U+0901 DEVANAGARI SIGN CANDRABINDU> for el
> > candrabindu (l̐), which should be <006C, U+0310 COMBINING
> > CANDRABINDU> in accordance with the principle of script
> > separation.  Why are such breaks desirable?
> >  
> I don't understand why one would encode a DEVANAGARI SIGN in the
> middle of a Greek word to mean it implies a word boundary in Greek !?!

The two examples given are "aquaφοβία" and "Aि".  The first switches
from Latin to Greek and the second is a Latin letter with a Devanagari
mark. However, there is a pre-Unicode tradition of using el with
candrabindu when writing Sanskrit in the Roman alphabet, which is
why there is U+0310.
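A rough way to see what the draft's implied breaks amount to is to compare a hypothetical "break wherever the script changes" heuristic with the general categories of the marks involved. Python's unicodedata does not expose the Script property, so the sketch below approximates it from the first word of each character name, and it treats names beginning with COMBINING (such as U+0310) as inheriting the preceding script. Both approximations are assumptions, and split_on_script_change is a hypothetical helper, not anything defined in UAX #29.

```python
from __future__ import annotations

import unicodedata


def approx_script(ch: str, prev: str | None) -> str:
    # Crude stand-in for the Script property, which Python's unicodedata
    # does not provide: use the first word of the character name
    # ("LATIN", "GREEK", "DEVANAGARI", ...).
    first = unicodedata.name(ch, "UNKNOWN").split()[0]
    # Generic marks such as U+0310 COMBINING CANDRABINDU have
    # Script=Inherited; approximate that by reusing the previous script.
    if first == "COMBINING" and prev is not None:
        return prev
    return first


def split_on_script_change(text: str) -> list[str]:
    # Hypothetical heuristic (not a UAX #29 rule): start a new segment
    # whenever the approximate script changes.
    segments: list[str] = []
    prev: str | None = None
    for ch in text:
        script = approx_script(ch, prev)
        if segments and script == prev:
            segments[-1] += ch
        else:
            segments.append(ch)
        prev = script
    return segments


def is_combining_mark(ch: str) -> bool:
    # Mn/Mc/Me approximate the marks that GB9 (x Extend) and
    # GB9a (x SpacingMark) attach to the preceding character.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


samples = ["aqua\u03c6\u03bf\u03b2\u03af\u03b1", "A\u093f", "l\u0901", "l\u0310"]
for s in samples:
    marks = [f"U+{ord(c):04X} {unicodedata.category(c)}" for c in s if is_combining_mark(c)]
    print(split_on_script_change(s), marks)
```

Under that heuristic "aquaφοβία", "Aि" and <006C, U+0901> all split, while <006C, U+0310> does not. All three marks are combining marks (Mn or Mc), which the current grapheme cluster rules GB9 and GB9a attach to whatever precedes them.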

> > There are some who
> > claim that the Laotian consonant placeholder is the letter 'x'
> > rather than the multiplication sign, U+00D7, which does have
> > Indic_Syllabic_Category=Consonant_Placeholder. (I trust no-one is
> > suggesting that there should be a grapheme cluster boundary between
> > U+00D7 with Script=Common and a non-spacing Lao vowel any more than
> > there would be with a Lao consonant.)
> >  
> Here again the multiplication sign has nothing to do with an Indic
> consonant. Maybe it has been used like this in some texts, but this
> looks more like a tweak.

Whatever its origin, it seems well established in Laos, and I've seen
it used for the Tai Tham script as well as for the Lao script. Try
searching for images of Lao vowels in French. Googling in English found
plenty of examples, and the teaching book shown at supports
the case nicely.  I've also seen it used for Khmer, but not to the
extent that I can argue that it is well-established in Cambodia.

The Khmer example was produced using a typewriter and apparently a
felt-tipped pen, so unsurprisingly the vowel bearer was clearly a
typewritten letter 'x'.

> If one needs a consonant holder, propose
> encoding an "empty" letter (as in Hangul or in Arabic), possibly with
> variant forms (e.g. changing between a circle, dotted circle, cross,
> or horizontal joiner on the hanging baseline for Devanagari and
> similar scripts).

Propose a disunification if you like. The competing tradition is to
use LAO LETTER KO, and a Lao-English dictionary from Thailand uses a
grey LAO LETTER O, following the Thai tradition of using the Thai letter
for /ʔ/, which serves as the 'empty' letter for Pali and Sanskrit.
Remember that a proposal for an invisible letter for Indic was rejected.
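The claim that there should be no grapheme cluster boundary between U+00D7 and a non-spacing Lao vowel can be checked with a rough sketch. The simple_graphemes helper below is hypothetical: it models only rules GB9 and GB9a, approximating Extend and SpacingMark by general category, and it pairs each placeholder base mentioned in this thread with LAO VOWEL SIGN I (U+0EB4).

```python
from __future__ import annotations

import unicodedata


def simple_graphemes(text: str) -> list[str]:
    # Very rough approximation of UAX #29 extended grapheme clusters:
    # only GB9 (x Extend) and GB9a (x SpacingMark) are modelled, with
    # Extend/SpacingMark approximated by General_Category Mn/Me/Mc.
    # CR/LF, controls, Hangul, emoji and Prepend rules are ignored.
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me", "Mc"):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # anything else starts a new cluster
    return clusters


LAO_VOWEL_SIGN_I = "\u0eb4"  # nonspacing (Mn), written above the base

candidate_bases = {
    "MULTIPLICATION SIGN": "\u00d7",
    "LATIN SMALL LETTER X": "x",
    "LAO LETTER KO": "\u0e81",
    "LAO LETTER O": "\u0ead",
    "DOTTED CIRCLE": "\u25cc",
    "NO-BREAK SPACE": "\u00a0",
}

for label, base in candidate_bases.items():
    clusters = simple_graphemes(base + LAO_VOWEL_SIGN_I)
    print(f"{label:22} gc={unicodedata.category(base)} clusters={len(clusters)}")
```

Apart from the controls and line breaks covered by GB4, GB9 does not depend on the preceding character, so every candidate (a math symbol, a Latin letter, a Lao letter, the dotted circle or NBSP) yields a single cluster; at least under these rules, the choice among them does not affect grapheme segmentation.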

> The usual base letter placeholder for combining diacritics is
> a whitespace (preferably NBSP, not SPACE) or the dotted circle
> symbol, not a mathematical symbol, which is also used within math
> formulas with variable names using common letters or even words.

> The multiplication sign used in the UTS standard was chosen because it
> normally does not occur within words,...

and has nothing to do with the usage of U+00D7 as a consonant
