Solution for Extended Tamil

James Kass jameskass at code2001.com
Mon Jan 22 12:23:03 CST 2024



On 2024-01-22 11:01 AM, Shriramana Sharma via Unicode wrote:
> Please see the original attestations. I have noted that they always 
> put the digit immediately after the consonant.
>
> There is not much meaning IMO in quoting online attestations or search 
> results because when it doesn't display properly and throws a dotted 
> circle, they will adjust it so that it doesn't display such junk. 
> Speaking as one of the authors of a de facto Unicode-based 
> transliteration scheme from Devanagari to Tamil which seems to be 
> widely used (but we can't get assured statistics).

Quoting from 
https://en.wiktionary.org/wiki/Module:sa-convert/testcases/Tamil :

"in most forms of Extended Tamil (including the Gita book mentioned 
previously running to almost 420,000 copies) the diacritics are placed 
between the consonant and any vowel signs placed to the right".

Maybe not always, for example : "நாபி⁴ஜாநாதி" -- would the superscript 
digit be expected to break the ligature here?
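
To make the order in that example explicit, here is a minimal Python sketch that lists its code points. The escaped string is my own transcription of the quoted word and should be treated as an assumption; `describe` is a hypothetical helper name. Under that assumption, the superscript digit comes after TAMIL VOWEL SIGN I rather than between the consonant and the vowel sign.

```python
import unicodedata

# Assumed transcription of the quoted word, written with escapes so the
# encoding order is unambiguous.
word = "\u0ba8\u0bbe\u0baa\u0bbf\u2074\u0b9c\u0bbe\u0ba8\u0bbe\u0ba4\u0bbf"


def describe(text):
    """Return (code point, character name) pairs in encoding order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in describe(word):
    print(code_point, name)
```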

As we know, when typing Tamil on a mechanical typewriter, for example, 
U+0BC6 TAMIL VOWEL SIGN E was always typed before the consonant.  But in 
the standardized computer encoding for Tamil, U+0BC6 is always entered 
after the consonant.  In both cases, the display properly shows the 
vowel sign on the left of the consonant.
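
As a small illustration of that logical order, the Python sketch below stores a syllable as consonant followed by vowel sign. The choice of TAMIL LETTER KA is arbitrary and only for illustration. The vowel sign is a spacing combining mark (Mc), and it is the rendering system, not the stored order, that puts it on the left.

```python
import unicodedata

# Logical (encoded) order: consonant first, then the vowel sign.
syllable = "\u0b95\u0bc6"  # KA (arbitrary example) + VOWEL SIGN E

names = [unicodedata.name(ch) for ch in syllable]
categories = [unicodedata.category(ch) for ch in syllable]

print(names)
print(categories)
```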

The original question here was about a standardized encoding order for 
Extended Tamil, and the user community has apparently already chosen a 
/de facto/ standardization.  And the results are legible.

Placing the superscript digits next to the consonants instead of at the 
end of the syllable appears to be a display issue.  But superscript 
digits have the general category "Number, Other" (No) and the canonical 
combining class "Not_Reordered" (0); so the rendering system 
won't automatically treat the digits as marks. Encoding clones of the 
superscript digits to be treated as marks might not be practical.  And, 
after all, the character identity of those superscript digits is that 
they are superscript digits.
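
Those properties can be checked with Python's standard unicodedata module. This is only a sketch: it assumes the digits in question are U+00B2, U+00B3 and U+2074, and the sample sequence (PA, superscript four, vowel sign AA) is made up for illustration. `properties` is a hypothetical helper name.

```python
import unicodedata

# Assumed set of superscript digits used in Extended Tamil.
SUPERSCRIPTS = "\u00b2\u00b3\u2074"


def properties(ch):
    """Return (name, general category, canonical combining class)."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))


for ch in SUPERSCRIPTS:
    print(properties(ch))

# Illustrative sequence: consonant, superscript digit, vowel sign.
sample = "\u0baa\u2074\u0bbe"

# Canonical normalization leaves the encoded order untouched.
unchanged = unicodedata.normalize("NFC", sample) == sample
print(unchanged)
```

Since the digits are not marks, nothing at the character level attaches them to the preceding consonant; any such attachment would have to come from the font.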

Has any effort been made to use OpenType to get the desired display?  
For example, classifying the superscript digits as "marks" in the GDEF 
(glyph definition) table and then using GPOS (glyph positioning) for the 
desired placement?  Or has the user community accepted the plain-text 
legibility of the /de facto/ standard encoding order and reconciled with 
the fact that not all published books can be exactly rendered in plain 
text?



