[EXTERNAL] Is Devanagari ल्लाँ ambiguous?

Richard Wordingham richard.wordingham at ntlworld.com
Tue May 5 19:19:27 CDT 2020

On Tue, 5 May 2020 22:57:14 +0000
Andrew Glass <Andrew.Glass at microsoft.com> wrote:

> Here is an excerpt from Whitney's Sanskrit Grammar page 69:
> [Image: excerpt from Whitney, p. 69]
> Whitney, William Dwight. 1889. A Sanskrit grammar, including both the
> classical language, and the older dialects, of Veda and Brahmana.
> Bibliothek indogermanischer Grammatiken, Band II. Leipzig: Breitkopf
> and Härtel.

And Macdonell's 1886 revision of Max Mueller's 'A Sanskrit Grammar for
Beginners' gives on p. 21 an example of y̐yā with the candrabindu on the
far left. (The conjunct uses a half form.)  Unfortunately, neither book
actually answers the question of whether the placement of candrabindu is
significant, though both support my feeling that it is.

> The version with explicit virama is nice because it shows how the
> ambiguity can be avoided and gives us a clue to the better encoding.
> So I would encode these as follows:
> As a single cluster with candrabindu applied to the first l:
> 0932 094D 0901 0932 094B
> ल्ँलो
> This cluster is supported in Nirmala UI: [image]

According to
https://docs.microsoft.com/en-us/typography/script-development/devanagari ,
that's two syllables, and that's how HarfBuzz is currently rendering
it.  It seems I'll have to raise a bug report against HarfBuzz - unless
it's changed fairly recently.

If I treat the candrabindu as a consonant modifier (i.e. as a type of
nukta), which is what the grammarians say it is, and encode it before
the virama, I get a dotted circle out of HarfBuzz.
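
To make the two orderings concrete, here is a minimal Python sketch
using only the standard unicodedata module.  The "before virama"
sequence (0932 0901 094D 0932 094B) is my reading of the description
above, not a sequence quoted in the thread, and the helper names are
purely illustrative.  It only looks at canonical combining classes and
normalisation; it says nothing about how any shaper renders either
string.

```python
import unicodedata

LA = "\u0932"           # DEVANAGARI LETTER LA
VIRAMA = "\u094D"       # DEVANAGARI SIGN VIRAMA
CANDRABINDU = "\u0901"  # DEVANAGARI SIGN CANDRABINDU
VOWEL_O = "\u094B"      # DEVANAGARI VOWEL SIGN O

# Candrabindu after the virama, as in the quoted proposal.
after_virama = LA + VIRAMA + CANDRABINDU + LA + VOWEL_O
# Candrabindu before the virama (assumed sequence, see lead-in).
before_virama = LA + CANDRABINDU + VIRAMA + LA + VOWEL_O


def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_equivalent(a, b):
    """True if the two strings are identical after NFD normalisation."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(combining_classes(after_virama))
    print(combining_classes(before_virama))
    print(canonically_equivalent(after_virama, before_virama))
```

Because candrabindu has combining class 0, normalisation never swaps
it with the virama, so the two orders stay distinct strings and a
renderer is free to treat them differently.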

> With explicit virama and candrabindu applied to the first l:
> 0932 094D 0901 0020 0932 094B
> ल्ँ लो

> Which leaves the vowel marked form as you have given it:
> 0932 094D 0932 093E 0901
> ल्लाँ

And this last one is the only encoding allowed by TUS 13 Section 12.1.
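
For anyone wanting to reproduce the three candidate encodings quoted
above, the following Python sketch builds each string from its hex
code points and lists the character names.  The labels and helper
names are my own, chosen only for illustration; the code point
sequences are the ones given in the quoted message.

```python
import unicodedata

# Labels are illustrative; the sequences are those quoted in the thread.
SEQUENCES = {
    "single cluster": "0932 094D 0901 0932 094B",
    "explicit virama": "0932 094D 0901 0020 0932 094B",
    "vowel marked": "0932 094D 0932 093E 0901",
}


def from_hex(seq):
    """Build a string from space-separated hexadecimal code points."""
    return "".join(chr(int(cp, 16)) for cp in seq.split())


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label)
        for entry in describe(from_hex(seq)):
            print("   ", entry)
```

Pasting the output of from_hex() into a test page is a quick way to
compare how different fonts and shapers handle each sequence.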

My Unicode feedback and HarfBuzz bug report should make reference to the
thread 'Sanskrit nasalised L' including
https://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0144.html .
Thank you for your help.

One thing I have established is that there are rendering systems that
support candrabindu within the consonant stack - but not where I
expected it!  (This relates to the issue of where U+0D81 SINHALA
SIGN CANDRABINDU appears in the encoding of a word, which I've raised

