Long standing problem with Vedic tone markers and post-base visarga/anusvara

James Kass via Unicode unicode at unicode.org
Thu Jan 2 01:52:55 CST 2020


On 2020-01-02 1:04 AM, Richard Wordingham wrote in a thread deriving 
from this one,

 > Have you found a definition of the ISCII handling of Vedic characters?

No.  It would be helpful.  ISCII apparently wasn't really used much.  It 
would also be helpful to know the encoding order in any legacy ISCII 
data using the Vedic characters with respect to VISARGA/ANUSVARA.  
Although such legacy data seems unlikely, I'd expect VISARGA/ANUSVARA to 
be entered/stored post-syllable.

 > I've been looking at Microsoft's specification of Devanagari character
 > order.  In
 > https://docs.microsoft.com/en-us/typography/script-development/devanagari,
 > the consonant syllable ends
 >
 > [N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
 >
 > where
 > N is nukta
 > A is anudatta (U+0952)
 > H is halant/virama
 > M is matra
 > SM is syllable modifier signs
 > VD is vedic
 >
 > "Syllable modifier signs" and "vedic" are not defined.  It appears that
 > SM includes U+0903 DEVANAGARI SIGN VISARGA.
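
For anyone wanting to experiment with that ordering, here is a rough sketch of the quoted syllable tail as a Python regular expression.  It assumes "[ ]" means optional, "{ }" means zero or more, and "< | >" means alternation.  Since the spec leaves SM and VD undefined, the memberships below (SM as U+0901..U+0903, VD as U+0951 only) are guesses made purely for illustration, as are the matra range and the name fits_quoted_order.

```python
import re

# N, A and H follow the quoted legend; everything marked "assumed" is a guess.
N = "\u093C"             # nukta
A = "\u0952"             # anudatta
H = "\u094D"             # halant/virama
ZW = "\u200C\u200D"      # ZWNJ, ZWJ
M = "\u093E-\u094C"      # matras (assumed range)
SM = "\u0901-\u0903"     # assumed: candrabindu, anusvara, visarga
VD = "\u0951"            # assumed: udatta only
CONS = "\u0915-\u0939"   # base consonants KA..HA

# [N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
TAIL = f"[{N}]?[{A}]?(?:{H}[{ZW}]?|[{M}]*[{N}]?{H}?)?[{SM}]?[{VD}]?"
SYLLABLE = re.compile(f"[{CONS}]{TAIL}")


def fits_quoted_order(text: str) -> bool:
    """True if text is one consonant followed by a tail in the quoted order."""
    return SYLLABLE.fullmatch(text) is not None


print(fits_quoted_order("\u0917\u0952\u0903"))  # GA, ANUDATTA, VISARGA
print(fits_quoted_order("\u0917\u0903\u0952"))  # GA, VISARGA, ANUDATTA
print(fits_quoted_order("\u0917\u0951\u0903"))  # GA, UDATTA, VISARGA
```

Under those assumptions only the first sequence fits the pattern.  A tone mark in VD followed by VISARGA does not, which is the ordering conflict under discussion.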

What action should Microsoft take to satisfy the needs of the user 
community?
1.  No action, maintain status quo.
2.  Swap SM and VD in the spec's ordering.
3.  Create a new category PS (post-syllable) and move VISARGA/ANUSVARA there.
4.  ?

What kind of impact would there be on existing data if Microsoft revised 
the ordering?

Or should Unicode encode a new character like ZERO-WIDTH INVISIBLE 
DOTTED CIRCLE so that users can suppress unwanted and unexpected dotted 
circles by adding superfluous characters to the text stream?

 > I note that even ग॒ः <U+0917 GA, U+0952 ANUDATTA, U+0903 VISARGA> is
 > given a dotted circle by HarfBuzz.

Same on Win 7.  And <U+0917 GA, U+0903 VISARGA, U+0952 ANUDATTA> (गः॒) 
breaks the mark positioning as expected.
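
As a side check, the two orders are distinct at the encoding level and normalization will not turn one into the other.  The sketch below uses only Python's standard unicodedata module.  The variable names and the helper unchanged_by_normalization are mine, and the script says nothing about how any renderer positions the marks.

```python
import unicodedata

GA, ANUDATTA, VISARGA = "\u0917", "\u0952", "\u0903"

tone_first = GA + ANUDATTA + VISARGA      # the sequence HarfBuzz flags
visarga_first = GA + VISARGA + ANUDATTA   # the sequence that breaks positioning

# Print general category and canonical combining class for each character.
for ch in (GA, ANUDATTA, VISARGA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          unicodedata.category(ch), unicodedata.combining(ch))


def unchanged_by_normalization(text: str) -> bool:
    """True if neither NFC nor NFD alters the sequence."""
    return all(unicodedata.normalize(form, text) == text
               for form in ("NFC", "NFD"))


print(unchanged_by_normalization(tone_first))
print(unchanged_by_normalization(visarga_first))
```

VISARGA has canonical combining class 0, so canonical reordering never moves ANUDATTA across it.  Whichever order gets typed is the order that gets stored.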


