Counting Devanagari Aksharas
Richard Wordingham via Unicode
unicode at unicode.org
Thu Apr 20 02:49:49 CDT 2017
I was offered the following reply:
> To my knowledge except in Tamil script vowel less consonants in
> written form aren't considered as separate "akshara"s in native
Word-finally they do seem to be treated as such. To be more
precise, a final cluster of one or more consonants marked as having no
vowel is treated as such; Sanskrit has a few word-final clusters.
> However for text shaping purposes they will surely have
> to be considered as separate orthographic syllables in Unicode
> terminology since in word end position they can sometimes carry svara
The complication comes word-internally. My understanding is that
phonetically syllable-final consonants in non-Indic words in
non-Indic languages have a tendency not to be included in an akshara
along with the start of the next syllable. However, that tendency is
more evident in scripts other than Devanagari; Devanagari has developed
in the context of Indic languages.
Renderers' syllable-recognition algorithms will naturally treat
word-final devowelled sequences as separate units, rather than
associate them with the previous implicit or explicit vowel.
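
As a rough illustration of that behaviour, here is a minimal Python
sketch of a syllable recogniser that counts a word-final devowelled
sequence as a unit of its own. The character classes, the pattern and
the names segment and count_aksharas are my own simplifying
assumptions, not any renderer's actual algorithm; ZWJ/ZWNJ, digits and
most Vedic signs are ignored.

```python
import re

# Simplified Devanagari character classes (assumptions for illustration).
CONS = "[\u0915-\u0939\u0958-\u095F]"    # consonants KA..HA, QA..YYA
NUKTA = "\u093C"
VIRAMA = "\u094D"
MATRA = "[\u093E-\u094C\u0962\u0963]"    # dependent vowel signs
MOD = "[\u0900-\u0903\u0951\u0952]"      # candrabindu, anusvara, visarga, svaras
VOWEL = "[\u0904-\u0914\u0960\u0961]"    # independent vowels

# (C N? H)* C N? (H | M*) Mod*   or   V Mod*
# A trailing virama closes the unit, so a devowelled final consonant
# (or final cluster) is matched separately from the preceding vowel.
AKSHARA = re.compile(
    f"(?:{CONS}{NUKTA}?{VIRAMA})*{CONS}{NUKTA}?(?:{VIRAMA}|{MATRA}*){MOD}*"
    f"|{VOWEL}{MOD}*"
)

def segment(word):
    """Return the list of units matched by the simplified pattern."""
    return AKSHARA.findall(word)

def count_aksharas(word):
    """Count the units found by segment()."""
    return len(segment(word))

# VA, AA, KA, VIRAMA: the final devowelled KA is counted on its own.
print(count_aksharas("\u0935\u093E\u0915\u094D"))          # 2
# BHA, KA, VIRAMA, TA, I: word-internal KA joins the following TA.
print(count_aksharas("\u092D\u0915\u094D\u0924\u093F"))    # 2
```

Because the virama alternative also allows trailing modifiers, a
word-final devowelled consonant carrying a svara stays in one unit.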
Burmese is a good example of what can happen with a non-Indic language;
in native words, phonetic syllabic boundaries tend to be orthographic
Text-shaping engines like Microsoft's Uniscribe are more complicated.
For scripts with a virama, they seem to assume that the virama may be
a combining operator, and wait for data from the font to decide how
many clusters to form.
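
To make the idea of waiting for font data concrete, here is a small
Python sketch in which the virama joins two consonants only when the
font says it can form that conjunct. This is not Uniscribe's actual
algorithm; the function clusters and the font_conjuncts set (pairs of
consonants the font can conjoin) are hypothetical, and only
consonants, virama and following marks are handled.

```python
VIRAMA = "\u094D"

def is_consonant(ch):
    """True for the basic Devanagari consonants KA..HA (simplification)."""
    return "\u0915" <= ch <= "\u0939"

def clusters(word, font_conjuncts):
    """Split word into clusters.

    A consonant starts a new cluster unless it follows consonant + virama
    and the (hypothetical) font lists that consonant pair as a conjunct.
    """
    out = []
    current = ""
    for ch in word:
        if is_consonant(ch) and current:
            joined = (
                len(current) >= 2
                and current[-1] == VIRAMA
                and is_consonant(current[-2])
                and (current[-2], ch) in font_conjuncts
            )
            if not joined:
                out.append(current)
                current = ""
        current += ch
    if current:
        out.append(current)
    return out

word = "\u092D\u0915\u094D\u0924\u093F"   # BHA, KA, VIRAMA, TA, I

# Font with a KA+TA conjunct: the virama acts as a combining operator.
print(len(clusters(word, {("\u0915", "\u0924")})))   # 2
# Font without it: KA + visible virama becomes a cluster of its own.
print(len(clusters(word, set())))                    # 3
```

The same text thus yields two or three clusters depending on the font,
which is why counts taken from such an engine depend on the font.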
One test is the insertion of white spaces in a word when it is stretched
out. Of course, that test can only be applied where human decisions
are involved - otherwise we are just looking at what dominant
renderers are actually doing, rather than looking at what they ought
to be doing.