Long standing problem with Vedic tone markers and post-base visarga/anusvara

Shriramana Sharma via Unicode unicode at unicode.org
Sat Dec 21 00:27:53 CST 2019

https://github.com/harfbuzz/harfbuzz/issues/2017 should provide the
context for this.

Ever since the early days of Devanagari Unicode, scholars like me
dealing with Vedic Sanskrit orthography have been experiencing this
problem, but chalked it up to early days and consequently insufficient
support for Vedic sequences. Even now, Vedic support on the font side
is quite limited, and we also find limitations on the software side.
So I hope it's time to fix them one by one.

The issue I would like to discuss now is as follows:


In Vedic, syllables that carry tone markers – which are mostly
above-base or below-base – often have to take a visarga, which is
always post-base. In this case, the sequence intuitive to native
scholars like me is: syllable, then tone marker, then visarga.


This is because the tone marker indicates the tone of the syllable (or
its vowel) and the visarga is a separate aspirated sound *after* the
syllable to which the tone marker doesn't apply.

In fact, the only reason the visarga sign is analysed as a combining
mark rather than a separate letter is that it is not used in isolation
without a preceding syllable. Otherwise, i.e. linguistically, it
doesn't modify the preceding syllable in any way.

Anyhow, the point is that the tone marker should come before the
visarga because it semantically applies to the preceding syllable and
not the visarga.
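
To make the two candidate orders concrete, here is a minimal sketch
using Python's standard unicodedata module. The syllable (YA with
udatta and visarga) and the names tone_first, visarga_first and
describe are chosen only for illustration. It also checks that the two
orders stay distinct after NFC normalization, so normalization alone
cannot reconcile them.

```python
import unicodedata

BASE = "\u092F"     # DEVANAGARI LETTER YA (illustrative base, my choice)
UDATTA = "\u0951"   # DEVANAGARI STRESS SIGN UDATTA
VISARGA = "\u0903"  # DEVANAGARI SIGN VISARGA

# Order argued for in this mail: tone marker directly after the syllable.
tone_first = BASE + UDATTA + VISARGA
# Order that shaping engines are reported to expect.
visarga_first = BASE + VISARGA + UDATTA


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The visarga has canonical combining class 0, so canonical reordering
# never moves a tone marker across it: the two strings remain different.
distinct_after_nfc = (
    unicodedata.normalize("NFC", tone_first)
    != unicodedata.normalize("NFC", visarga_first)
)

for label, seq in (("tone first", tone_first), ("visarga first", visarga_first)):
    print(label, describe(seq))
print("distinct after NFC:", distinct_after_nfc)
```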

This is all the more so since in some Vedic contexts (Sama Gana) the
visarga is far separated from the syllable by other syllables like
digits (themselves carrying combining marks) or spacing anusvara, as
seen in examples from my Grantha proposal L2/09-372 p 40.

So the visarga is semantically quite dissociated from the preceding
syllable, unlike the tone marker, which is intimately associated with
it.


The same argument is also applicable to the anusvara as it also
represents a nasal sound separate from the preceding syllable. (The
candrabindu OTOH nasalises the preceding syllable itself.)

The above Grantha proposal page also shows an example where an
anusvara is orthographically separated from the preceding syllable by
three characters: a tone marker + avagraha + digit. L2/15-178 shows
that in equivalent contexts of Devanagari the digit 0 is used as a
substitute since the Devanagari anusvara is non-spacing.

All this points to the dissociation of the anusvara from the syllable
– just like the visarga – compared to tone markers. So to be
consistent, even in the case of Devanagari (or any such script) where
the anusvara is non-spacing, the sequence when a tone marker is also
involved puts the tone marker first, as mentioned before: syllable,
then tone marker, then anusvara.



However, even the simplest Vedic sequence (not involving Sama Vedic or
multiple tone marker combinations) like दे॒वेभ्य॑ः throws up a dotted
circle, and one is expected (see developer feedback in that bug
report) to input the visarga before tone markers, hoping the software
is intelligent enough to skip over the visarga (or anusvara) and place
the tone marker over the preceding syllable correctly. Why it is
necessary to put the visarga first in input only to have to skip over
it in shaping is beyond me.
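
For readers who want to see what the requested reordering amounts to,
the following is a rough sketch of a text-level workaround, not
anything taken from HarfBuzz or Uniscribe. The helper name
to_visarga_first and its regular expression are hypothetical, and it
assumes only the two tone markers U+0951 and U+0952 (the Vedic
Extensions block is deliberately not covered).

```python
import re

# The example word from this mail, spelled with escapes so that the
# stored order is unambiguous: ... YA, UDATTA, VISARGA at the end.
WORD = "\u0926\u0947\u0952\u0935\u0947\u092D\u094D\u092F\u0951\u0903"

# Assumption: tone markers are limited to U+0951/U+0952 and the
# syllable-final sign is U+0902 (anusvara) or U+0903 (visarga).
TONE_THEN_FINAL = re.compile("([\u0951\u0952]+)([\u0902\u0903])")


def to_visarga_first(text):
    """Hypothetical workaround: move anusvara/visarga before tone markers."""
    return TONE_THEN_FINAL.sub(r"\2\1", text)


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


print("as typed:  ", code_points(WORD)[-3:])
print("reordered: ", code_points(to_visarga_first(WORD))[-3:])
```

Such a rewrite changes the stored text, which is exactly the cost being
objected to here: the data no longer reflects the order in which the
sounds are analysed.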

So it makes sense neither from a linguistic nor from a technological
perspective to push the tone markers to the end of the syllable. Even
the developers acknowledge that non-spacing marks are normally (i.e.
outside Indic) input before spacing ones.
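
The spacing versus non-spacing distinction is recorded in the Unicode
Character Database and can be inspected directly. The sketch below
assumes nothing beyond Python's standard unicodedata module; the names
MARKS and mark_properties are mine. General category Mn means
non-spacing mark and Mc means spacing combining mark.

```python
import unicodedata

# The four marks discussed in this mail.
MARKS = {
    "anusvara": "\u0902",
    "visarga": "\u0903",
    "udatta": "\u0951",
    "anudatta": "\u0952",
}


def mark_properties(ch):
    """Return (general category, canonical combining class) for ch."""
    return unicodedata.category(ch), unicodedata.combining(ch)


for label, ch in MARKS.items():
    category, ccc = mark_properties(ch)
    print(f"{label:9} U+{ord(ch):04X} category={category} ccc={ccc}")
```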

However, they say “we can't support that in this particular case
because this is how Microsoft does it and we have to follow suit to
ensure people get the same shaping for the same input”,
notwithstanding the fact that the expectation to put the
visarga/anusvara first is nonsensical as explained above.

So everyone is looking to Microsoft Uniscribe (or whatever its
successor is) to fix things first before they can follow. I figured
that if this is discussed and decided here, everyone can fix it at the
same time.

Shriramana Sharma ஶ்ரீரமணஶர்மா श्रीरमणशर्मा
