Counting Devanagari Aksharas

Richard Wordingham via Unicode unicode at
Thu Apr 20 02:49:49 CDT 2017

I was offered the following reply:

> To my knowledge except in Tamil script vowel less consonants in
> written form aren't considered as separate "akshara"s in native
> terminology.

Word-finally, they do seem to be treated as such.  More precisely, a
word-final cluster of one or more consonants marked as having no vowel
is treated as a separate akshara (Sanskrit has a few word-final clusters).

> However for text shaping purposes they will surely have
> to be considered as separate orthographic syllables in Unicode
> terminology since in word end position they can sometimes carry svara
> markers.

The complication arises word-internally.  My understanding is that, in
non-Indic languages, consonants that are phonetically syllable-final in
non-Indic words tend not to be written in the same akshara as the start
of the following syllable.  However, that tendency is more evident in
scripts other than Devanagari, which developed in the context of Indic
languages.

Renderers' syllable-recognition algorithms will naturally treat
word-final devowelled sequences as separate units, rather than
associating them with the previous implicit or explicit vowel.
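A simplified segmenter makes this concrete.  It is only a sketch, loosely
modelled on the consonant-syllable pattern used in OpenType Indic shaping
(a consonant, then any number of virama + consonant pairs, then an
optional vowel sign or virama), with reduced character ranges and no
ZWJ/ZWNJ or reph handling; the names are illustrative.  Note that वाक्
splits as वा + क्, with the word-final devowelled consonant as its own
unit, whereas in नमस्ते the phonetically syllable-final स is grouped with
the following त.

```python
import re

# Simplified Devanagari character classes (a sketch, not the full repertoire).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]\u093C?"  # consonant + optional nukta
VIRAMA = "\u094D"
MATRA = "[\u093E-\u094C]"                          # dependent vowel signs
INDEP_VOWEL = "[\u0904-\u0914\u0960\u0961]"
MARKS = "[\u0900-\u0903\u0951-\u0954]"             # candrabindu, anusvara, visarga, svara marks

# Consonant syllable: C (virama C)* followed by an optional virama or matra.
# An independent vowel is a syllable on its own.  Either may carry trailing
# marks.  Any other character becomes a unit by itself.
SYLLABLE = re.compile(
    "(?:" + CONSONANT + "(?:" + VIRAMA + CONSONANT + ")*"
    + "(?:" + VIRAMA + "|" + MATRA + ")?"
    + "|" + INDEP_VOWEL + ")" + MARKS + "*"
    + "|.",
    re.S,
)

def orthographic_syllables(text: str) -> list[str]:
    """Split text into simplified orthographic syllables."""
    return SYLLABLE.findall(text)

for word in ["सत्य", "नमस्ते", "वाक्", "ऊर्क्"]:
    print(word, orthographic_syllables(word))
# सत्य ['स', 'त्य']
# नमस्ते ['न', 'म', 'स्ते']
# वाक् ['वा', 'क्']
# ऊर्क् ['ऊ', 'र्क्']
```

Counting aksharas under this model is then just the length of the list,
e.g. three for नमस्ते, even though the phonetic syllabification is na.mas.te.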

Burmese is a good example of what can happen with a non-Indic language;
in native words, phonetic syllable boundaries tend to coincide with
orthographic syllable boundaries.

Text-shaping engines such as Microsoft's Uniscribe are more complicated.
For scripts with a virama, they seem to assume that the virama may act
as a combining operator, and wait for data from the font before deciding
how many clusters to form.
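A toy model of that idea (not Uniscribe's actual algorithm) is sketched
below.  The font is represented by a hypothetical set, `font_conjuncts`,
of consonant + virama + consonant sequences it can display as a conjunct
or half form; the same orthographic syllable then yields one or two
clusters depending on that set.  Nukta handling and longer conjunct
lookups are deliberately ignored.

```python
VIRAMA = "\u094D"

def is_consonant(ch: str) -> bool:
    # Basic Devanagari consonants plus the precomposed nukta forms.
    return len(ch) == 1 and ("\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F")

def visual_clusters(syllable: str, font_conjuncts: set[str]) -> list[str]:
    """Toy model: split one orthographic syllable into display clusters.

    A consonant + virama + consonant junction stays in one cluster only if
    the hypothetical font lists that sequence in font_conjuncts; otherwise
    the virama is displayed and closes the current cluster.
    """
    clusters = [syllable[0]]  # assumes the syllable starts with a consonant
    prev = syllable[0]
    i = 1
    while i < len(syllable):
        ch = syllable[i]
        nxt = syllable[i + 1] if i + 1 < len(syllable) else ""
        if ch == VIRAMA and is_consonant(nxt):
            if prev + VIRAMA + nxt in font_conjuncts:
                clusters[-1] += VIRAMA + nxt  # font ligates: same cluster
            else:
                clusters[-1] += VIRAMA        # visible virama ends the cluster
                clusters.append(nxt)          # next consonant starts a new one
            prev = nxt
            i += 2
        else:
            clusters[-1] += ch                # matras, final virama, marks stay put
            i += 1
    return clusters

# The same text under two hypothetical fonts:
print(visual_clusters("स्ते", {"स्त"}))  # ['स्ते']
print(visual_clusters("स्ते", set()))    # ['स्', 'ते']
```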

One test is where spaces are inserted in a word when it is stretched
out.  Of course, that test can only be applied where human decisions
are involved; otherwise we are just looking at what dominant renderers
actually do, rather than at what they ought to do.

