One encoding per shape (was Re: Long standing problem with Vedic tone markers and post-base visarga/anusvara)

Richard Wordingham via Unicode unicode at unicode.org
Wed Jan 1 19:04:27 CST 2020


On Wed, 1 Jan 2020 23:09:49 +0000
James Kass via Unicode <unicode at unicode.org> wrote:

> On 2020-01-01 8:11 PM, James Kass wrote:
> > It’s too bad that ISCII didn’t accommodate the needs of Vedic
> > Sanskrit, but here we are.
> 
> Sorry, that might be wrong to say.  It's possible that it's Unicode's 
> adaptation of ISCII that hinders Vedic Sanskrit.

Have you found a definition of the ISCII handling of Vedic characters?

The problem lies in Unicode's failure to standardise the encoding of
Devanagari text.  Were it not that TUS (The Unicode Standard)
consistently fails to standardise the encoding of text in any script,
one might wonder whether the original idea was to duck the issue by
resorting to canonical equivalence.
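
Python's standard `unicodedata` module shows how far canonical equivalence reaches for the marks discussed in this thread. In this minimal sketch (the helper name `canonically_equivalent` is illustrative only), two strings count as canonically equivalent exactly when their NFD forms are identical. Nukta and anudatta both have non-zero canonical combining classes (7 and 220), so either order normalises to the same string. Visarga is a spacing mark with combining class 0, so it blocks reordering, and the two orders of visarga and anudatta remain distinct encodings.

```python
import unicodedata

GA, NUKTA, ANUDATTA, VISARGA = "\u0917", "\u093C", "\u0952", "\u0903"


def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Nukta (ccc 7) and anudatta (ccc 220): canonical reordering fixes their order.
print(canonically_equivalent(GA + ANUDATTA + NUKTA, GA + NUKTA + ANUDATTA))  # True

# Visarga (ccc 0) blocks reordering, so these are different encodings.
print(canonically_equivalent(GA + VISARGA + ANUDATTA, GA + ANUDATTA + VISARGA))  # False

# Show each mark's name and canonical combining class.
for ch in (NUKTA, ANUDATTA, VISARGA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
```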

I've been looking at Microsoft's specification of Devanagari character
order.  According to
https://docs.microsoft.com/en-us/typography/script-development/devanagari,
a consonant syllable ends with

[N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]

where
N is nukta
A is anudatta (U+0952)
H is halant/virama
M is matra
SM is syllable modifier signs
VD is vedic

"Syllable modifier signs" and "vedic" are not defined.  It appears that
SM includes U+0903 DEVANAGARI SIGN VISARGA.
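
One way to check which sequences the grammar admits is a minimal Python sketch that translates its final part into a regular expression. It assumes the usual reading of Microsoft's notation ([X] optional, {X} zero or more, <X|Y> alternatives), treats [(VD)] as a single optional VD, and keeps only the final consonant of the syllable, ignoring any leading conjunct part. Because SM and VD are undefined, the ranges used for C, M, SM and VD and the helper name `permitted` are hypothetical choices for illustration, not Microsoft's definitions. Input is converted to NFD first, so a precomposed nukta letter such as U+095A is decomposed into GA plus nukta.

```python
import re
import unicodedata

# N, A and H follow the list above; C, M, SM and VD are ASSUMED ranges,
# since the Microsoft page does not define them.
C = "[\u0915-\u0939]"   # assumed: consonants KA..HA
N = "\u093C"            # nukta
A = "\u0952"            # anudatta
H = "\u094D"            # halant/virama
ZW = "[\u200C\u200D]"   # ZWNJ | ZWJ
M = "[\u093E-\u094C]"   # assumed: matras AA..AU
SM = "[\u0900-\u0903]"  # assumed: inverted candrabindu..visarga
VD = "[\u0951-\u0954]"  # assumed: Devanagari stress/tone signs

# C + [N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
TAIL = f"{N}?{A}?(?:{H}{ZW}?|{M}*{N}?{H}?)?{SM}?{VD}?"
SYLLABLE = re.compile(C + TAIL)


def permitted(text: str) -> bool:
    """Hypothetical helper: does the NFD form of text match one
    consonant followed by a tail accepted under the assumed classes?"""
    return SYLLABLE.fullmatch(unicodedata.normalize("NFD", text)) is not None


print(permitted("\u0917\u0952\u0903"))  # GA, ANUDATTA, VISARGA -> True
print(permitted("\u095A\u0952\u0903"))  # GHHA (GA+NUKTA), ANUDATTA, VISARGA -> True
print(permitted("\u0917\u0903\u093F"))  # GA, VISARGA, VOWEL SIGN I -> False
```

Under these assumptions both GA + anudatta + visarga and its nukta variant are accepted, as is visarga followed by a tone mark in the VD slot, while a matra after the visarga is rejected.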

I note that HarfBuzz renders even ग॒ः <U+0917 GA, U+0952 ANUDATTA,
U+0903 VISARGA> with a dotted circle, i.e. as an invalid sequence,
although the grammar above appears to permit it.  Now, this might not
be an entirely fair test; I suspect anudatta is assigned this position
because the Sindhi implosives were originally encoded as consonant plus
nukta and anudatta.  However, rendering still fails with HarfBuzz when
nukta is inserted (ग़॒ः).

Richard.
