Long standing problem with Vedic tone markers and post-base visarga/anusvara
James Kass via Unicode
unicode at unicode.org
Thu Jan 2 01:52:55 CST 2020
On 2020-01-02 1:04 AM, Richard Wordingham wrote in a thread deriving
from this one,
> Have you found a definition of the ISCII handling of Vedic characters?
No. It would be helpful. ISCII apparently wasn't really used much. It
would also be helpful to know the encoding order in any legacy ISCII
data using the Vedic characters with respect to VISARGA/ANUSVARA.
Although such legacy data seems unlikely, I'd expect VISARGA/ANUSVARA to
be entered/stored post-syllable.
> I've been looking at Microsoft's specification of Devanagari character
> order. In
>
> https://docs.microsoft.com/en-us/typography/script-development/devanagari,
> the consonant syllable ends
>
> [N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
>
> where
> N is nukta
> A is anudatta (U+0952)
> H is halant/virama
> M is matra
> SM is syllable modifier signs
> VD is vedic
>
> "Syllable modifier signs" and "vedic" are not defined. It appears that
> SM includes U+0903 DEVANAGARI SIGN VISARGA.
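As a rough illustration, the quoted syllable ending can be read as a regular expression. This is only a sketch of the grammar as transcribed above, not of what any shaper actually does. The memberships of M, SM and VD below are assumptions, since the quoted text leaves SM and VD undefined: M is taken to be U+093E..U+094C, SM to be U+0901..U+0903, and VD to be U+0951 plus the Vedic Extensions block U+1CD0..U+1CFF. The name tail_matches is hypothetical.

```python
import re

# Categories named in the quoted pattern.
N = "\u093C"     # nukta
A = "\u0952"     # anudatta
H = "\u094D"     # halant/virama
ZWNJ = "\u200C"
ZWJ = "\u200D"

# ASSUMPTIONS: the quoted text does not define these sets.
M = "[\u093E-\u094C]"         # matras (assumed range)
SM = "[\u0901-\u0903]"        # syllable modifiers (assumed; includes VISARGA)
VD = "[\u0951\u1CD0-\u1CFF]"  # vedic signs (assumed)

# [N]+[A]+[<H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
# read as: [] optional, {} zero or more, <a|b> alternatives.
TAIL = re.compile(
    f"{N}?{A}?(?:{H}[{ZWNJ}{ZWJ}]?|{M}*{N}?{H}?)?{SM}?{VD}?"
)

def tail_matches(tail: str) -> bool:
    """Return True if the marks after the base consonant fit the pattern."""
    return TAIL.fullmatch(tail) is not None
```

Under these assumptions the pattern accepts a syllable modifier followed by a Vedic sign, for example tail_matches("\u0903\u0951"), and rejects the reverse order, tail_matches("\u0951\u0903"), which is the ordering at issue in this thread.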
What action should Microsoft take to satisfy the needs of the user
community?
1. No action, maintain status quo.
2. Swap SM and VD in the spec's ordering.
3. Make new category PS (post-syllable) and move VISARGA/ANUSVARA there.
4. ?
What kind of impact would there be on existing data if Microsoft revised
the ordering?
Or should Unicode encode a new character like ZERO-WIDTH INVISIBLE
DOTTED CIRCLE so that users can suppress unwanted and unexpected dotted
circles by adding superfluous characters to the text stream?
> I note that even ग॒ः <U+0917 GA, U+0952 ANUDATTA, U+0903 VISARGA> is
> given a dotted circle by HarfBuzz.
Same on Win 7. And <U+0917 GA, U+0903 VISARGA, U+0952 ANUDATTA> (गः॒)
breaks the mark positioning as expected.
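For anyone wanting to compare the two orders in existing data, a minimal Python sketch using only the standard unicodedata module is given below. It lists the general category and canonical combining class of each code point in both sequences. VISARGA is a spacing mark with combining class 0, so normalization does not reorder it relative to ANUDATTA, and the two spellings stay distinct in stored text. The helper name describe is hypothetical.

```python
import unicodedata

GA, ANUDATTA, VISARGA = "\u0917", "\u0952", "\u0903"

tone_first = GA + ANUDATTA + VISARGA     # order given a dotted circle
visarga_first = GA + VISARGA + ANUDATTA  # order that breaks mark positioning

def describe(text):
    """List (code point, general category, combining class) per character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.combining(c))
        for c in text
    ]

if __name__ == "__main__":
    for label, seq in (("tone first", tone_first),
                       ("visarga first", visarga_first)):
        print(label, describe(seq))
        # Neither form changes under normalization.
        print("  NFC unchanged:", unicodedata.normalize("NFC", seq) == seq)
        print("  NFD unchanged:", unicodedata.normalize("NFD", seq) == seq)
```

Because normalization leaves both orders alone, any revision of the expected ordering would have to be handled by the shaper or by explicit data conversion, not by normalizing the text.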