Fallback for Sinhala Consonant Clusters

Harshula via Unicode unicode at unicode.org
Sun Oct 14 09:55:24 CDT 2018

Hi Richard,

1) From a pronunciation perspective, your first and third examples will
be similar. Your second example will be pronounced very differently. I
did some quick testing on Linux and reproduced the behaviour that you

2) Going back more than a decade, the state tables used by some
layout/shaping engines used the same 'virama' rules for North Indian
scripts and Sinhala. This resulted in undesirable *implicit* conjuncts
being created for Sinhala consonant clusters. That then resulted in
undesirable positioning of dependent vowels. e.g.

3) However, what you have observed is an issue with *explicit* conjunct
creation. After the segmentation is completed, the layout/shaping engine
needs to first check if there is a corresponding lookup for the explicit
conjunct. If there is none, it needs to remove the ZWJ and redo the
segmentation and lookup(s). Perhaps that is not happening in HarfBuzz.
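
A minimal Python sketch of the fallback described in point 3, under stated
assumptions: the set `supported` stands in for the font's explicit-conjunct
lookups, and the names `drop_unsupported_zwj` and `shape_units` are
hypothetical. The segmentation is deliberately crude (a consonant plus its
trailing signs) and is not meant to describe what HarfBuzz or any other
engine actually does.

```python
import re

ZWJ = "\u200d"
AL_LAKUNA = "\u0dca"
# Simplification: the whole consonant range, including a few unassigned code points.
CONSONANT = "[\u0d9a-\u0dc6]"
# Dependent signs that may follow a consonant (candrabindu/anusvara/visarga,
# al-lakuna, vowel signs).
SIGN = "[\u0d81-\u0d83\u0dca\u0dcf-\u0ddf\u0df2\u0df3]"

# consonant + al-lakuna + ZWJ, when another consonant follows
EXPLICIT = re.compile(f"({CONSONANT}){AL_LAKUNA}{ZWJ}(?={CONSONANT})")
# a consonant, any explicitly joined consonants, then trailing signs
CLUSTER = re.compile(f"{CONSONANT}(?:{AL_LAKUNA}{ZWJ}{CONSONANT})*{SIGN}*")


def drop_unsupported_zwj(text, supported):
    """Remove the ZWJ from explicit conjuncts the (assumed) font cannot form."""
    def repl(m):
        # The conjunct request is the match plus the following consonant.
        request = m.string[m.start():m.end() + 1]
        if request in supported:
            return m.group(0)              # keep the ZWJ, the lookup exists
        return m.group(1) + AL_LAKUNA      # fall back to consonant + al-lakuna
    return EXPLICIT.sub(repl, text)


def shape_units(text, supported):
    """Redo the segmentation after removing any ZWJ that cannot be honoured."""
    return CLUSTER.findall(drop_unsupported_zwj(text, supported))


pali_ndhe = "\u0db1\u0dca\u200d\u0db0\u0dd9"
# No conjunct in the font: two units, so the kombuva stays with U+0DB0.
print(shape_units(pali_ndhe, set()))
# Conjunct available: the whole syllable remains a single unit.
print(shape_units(pali_ndhe, {"\u0db1\u0dca\u200d\u0db0"}))
```

With an empty `supported` set the syllable splits into <U+0DB1, U+0DCA> and
<U+0DB0, U+0DD9>, which corresponds to the fallback rendering Richard says he
would prefer.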

4) I've been out of the loop for many years, so I have CC'd Ruvan &
Harsha who may already be aware of what you have observed.


On 14/10/18 11:02 am, Richard Wordingham via Unicode wrote:
> Are there fallback rules for Sinhala consonant clusters?  There are
> fallback rules for Devanagari, but I'm not sure if they read across.
> The problem I am seeing is that the Pali syllable 'ndhe' න්‍ධෙ <U+0DB1
> NAYANNA, U+0DCA AL-LAKUNA, U+200D ZWJ, U+0DB0 MAHAPRAANA DAYANNA, U+0DD9
> KOMBUVA> is being rendered identically to a hypothetical Sinhalese
> 'nēdha' නේධ <U+0DB1, U+0DDA DIGA KOMBUVA, U+0DB0>,  which in NFD is
> <U+0DB1, U+0DD9, U+0DCA, U+0DB0>, when I use a font that lacks the
> conjunct.  (Most fonts lack the conjunct.)  The Devanagari rules and my
> preference would lead to a fallback rendering as න්ධෙ  (Sinhalese
> 'ndhe'), which is encoded as <U+0DB1 NAYANNA, U+0DCA AL-LAKUNA, U+0DB0
> MAHAPRAANA DAYANNA, U+0DD9 KOMBUVA>.  Is the rendering I am getting
> technically wrong, or is it merely undesirable?
>
> The ambiguity arises in part because, like the Brahmi script, the
> Sinhala script uses its virama character as a vowel length indicator.
>
> Missing touching consonants are being rendered almost as though there
> were no ZWJ, but the combination of consonant and al-lakuna is being
> rendered badly.
>
> Richard.
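
For reference, the code point sequences in the quoted message can be checked
with Python's standard `unicodedata` module. This is only an illustration of
the encoding ambiguity; the variable names are mine, and it says nothing
about how any particular font or shaper behaves.

```python
import unicodedata

pali_ndhe = "\u0db1\u0dca\u200d\u0db0\u0dd9"   # with ZWJ, requests the conjunct
sinhala_nedha = "\u0db1\u0dda\u0db0"           # uses U+0DDA DIGA KOMBUVA
fallback_ndhe = "\u0db1\u0dca\u0db0\u0dd9"     # same as pali_ndhe without ZWJ


def code_points(s):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


# U+0DDA decomposes canonically to kombuva + al-lakuna.
nfd_nedha = unicodedata.normalize("NFD", sinhala_nedha)
print(code_points(nfd_nedha))

# Without the ZWJ, 'ndhe' and NFD 'nedha' use the same four code points
# and differ only in their order.
print(sorted(fallback_ndhe) == sorted(nfd_nedha))
print(fallback_ndhe == nfd_nedha)
```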