Long standing problem with Vedic tone markers and post-base visarga/anusvara

Richard Wordingham via Unicode unicode at unicode.org
Thu Jan 2 14:20:34 CST 2020


On Thu, 2 Jan 2020 07:52:55 +0000
James Kass via Unicode <unicode at unicode.org> wrote:

>  > I've been looking at Microsoft's specification of Devanagari
>  > character order.  In
>  > https://docs.microsoft.com/en-us/typography/script-development/devanagari,
>  > the consonant syllable ends
>  >
>  > [N]+[A] + [< H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)]
>  >
>  > where
>  > N is nukta
>  > A is anudatta (U+0952)
>  > H is halant/virama
>  > M is matra
>  > SM is syllable modifier signs
>  > VD is vedic
>  >
>  > "Syllable modifier signs" and "vedic" are not defined.  It appears
>  > that SM includes U+0903 DEVANAGARI SIGN VISARGA.  
> 
> What action should Microsoft take to satisfy the needs of the user 
> community?
> 1.  No action, maintain status quo.
> 2.  Swap SM and VD in the specs ordering.
> 3.  Make new category PS (post-syllable) and move VISARGA/ANUSVARA
> there.
> 4.  ?
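
As a rough illustration of what option 2 would change, the quoted
syllable ending can be written as a regular expression.  This is only a
sketch: the memberships of M, SM and VD below are assumptions (the
specification does not define SM or VD), and the names tail_pattern and
tail_is_valid are made up for the example.

```python
import re

N = "\u093C"               # nukta
A = "\u0952"               # anudatta
H = "\u094D"               # halant/virama
JOINERS = "\u200C\u200D"   # ZWNJ, ZWJ
# Assumption: a simplified matra range, not the full list in the spec.
M = "\u093E-\u094C"
# Assumption: SM is taken to be candrabindu, anusvara and visarga.
SM = "\u0901-\u0903"
# Assumption: an illustrative subset of "vedic" marks
# (U+0951 udatta, U+1CDA double svarita).
VD = "\u0951\u1CDA"


def tail_pattern(swap_sm_vd=False):
    """Regex for [N]+[A]+[<H+[<ZWNJ|ZWJ>] | {M}+[N]+[H]>]+[SM]+[(VD)].

    With swap_sm_vd=True the final two slots are exchanged (option 2).
    """
    sm = f"[{SM}]?"
    vd = f"[{VD}]?"
    end = vd + sm if swap_sm_vd else sm + vd
    body = f"(?:{H}[{JOINERS}]?|[{M}]*{N}?{H}?)?"
    return re.compile(f"{N}?{A}?{body}{end}")


def tail_is_valid(tail, swap_sm_vd=False):
    """True if the marks after the base consonant fit the pattern."""
    return tail_pattern(swap_sm_vd).fullmatch(tail) is not None


tone_then_visarga = "\u0951\u0903"
visarga_then_tone = "\u0903\u0951"
print(tail_is_valid(tone_then_visarga))                   # status quo
print(tail_is_valid(tone_then_visarga, swap_sm_vd=True))  # option 2
print(tail_is_valid(visarga_then_tone))                   # status quo
```

Under these assumptions a tone mark followed by visarga does not fit the
ending as quoted, but does fit once SM and VD are swapped; the reverse
order behaves the other way round.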

There's a project, whose basis I can't find, to convert at least Indian
Indic rendering to use the USE (Universal Shaping Engine).  Now,
according to the specification of the USE, visarga, anusvara and
cantillation marks are all classified as vowel modifiers, and are
ordered relative to one another by position, in the Indian Indic order:
left, top, bottom, right.  So, the problem should already be solved for
Grantha, and, if the plans come to fruition, will also be solved for a
font whose Devanagari script tag is 'dev3'.  However, I may have
overlooked a set of overrides to the USE categorisations.

> What kind of impact would there be on existing data if Microsoft
> revised the ordering?

A good question that *I* can't answer.

> Or should Unicode encode a new character like ZERO-WIDTH INVISIBLE 
> DOTTED CIRCLE so that users can suppress unwanted and unexpected
> dotted circles by adding superfluous characters to the text stream?

It would be useful to be able to suppress inappropriate dotted circles
without disrespecting the character identity of U+25CC.  (Doable
in HarfBuzz, but not in OpenType.)  There's actually been a suggestion
that dotted circles should be applied after global substitutions have
been applied, so as to prevent the overcoming of renderer faults.

On Sat, 21 Dec 2019 11:57:53 +0530
Shriramana Sharma via Unicode <unicode at unicode.org> wrote:

> This is all the more so since in some Vedic contexts (Sama Gana) the
> visarga is far separated from the syllable by other syllables like
> digits (themselves carrying combining marks) or spacing anusvara, as
> seen in examples from my Grantha proposal L2/09-372 p 40.

I presume you are referring to the middle picture.  I'm having
difficulty reading it.  Could you please tell us its transcription and
encoding?

A minimal change would be to extend the range of base characters to
include digits - I'm surprised matras don't frequently get added to
them.

Richard.


