Solution for Extended Tamil

James Kass jameskass at code2001.com
Sun Jan 21 09:26:45 CST 2024



On 2024-01-21 1:52 PM, Richard Wordingham via Unicode wrote:
> The Unicode Consortium makes some forays into standardising the
> encoding of text beyond the mere encoding of characters.  Is there yet a
> standard encoding for the first blue word on page 3 of
> https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf  (Document
> L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
> SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
> U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
> circle, and is or closely relates to the Sanskrit word 'bh̄āvam'.  I
> would not be surprised at context-sensitive rules for whether the
> sequence should be ended with U+200C ZERO WIDTH NON-JOINER.
>
> One possible solution would be for U+00B2, U+00B3 and U+2074 to be
> treated as nuktas, but that invalidates or creates a confusable for the
> current solution for sequences without a right matra, which is to use
> the order <consonant, vowel, superscript digit>.

Perhaps the simplest solution to this display issue would be to persuade
the user community to place the superscript digit after the syllable it
modifies and spell the word as பா⁴வம்.  In other words, extend the
current solution for sequences without a right matra, the order
<consonant, vowel, superscript digit>, to all sequences.  That would
eliminate the pesky dotted circle.
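
For anyone wanting to try the reordering on existing text, here is a rough
Python sketch of the idea, not a vetted tool.  The function name is made up
for illustration, the set of superscript digits is just the three mentioned
in the quoted message, and a "vowel sign" is assumed to be any character
whose Unicode name starts with "TAMIL VOWEL SIGN".

```python
import unicodedata

# Superscript digits named in the quoted message: U+00B2, U+00B3, U+2074.
# (Assumption: no other marks are used for this purpose.)
SUPERSCRIPT_DIGITS = {"\u00B2", "\u00B3", "\u2074"}


def is_tamil_vowel_sign(ch):
    """True if the character's Unicode name marks it as a Tamil vowel sign."""
    return unicodedata.name(ch, "").startswith("TAMIL VOWEL SIGN")


def move_digit_after_vowel_signs(text):
    """Rewrite <digit, vowel sign(s)> as <vowel sign(s), digit>.

    Hypothetical helper: text already in the order
    <consonant, vowel sign, superscript digit> passes through unchanged.
    """
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch in SUPERSCRIPT_DIGITS:
            # Find the run of vowel signs that directly follows the digit.
            j = i + 1
            while j < n and is_tamil_vowel_sign(text[j]):
                j += 1
            out.append(text[i + 1:j])  # vowel signs first (may be empty)
            out.append(ch)             # then the superscript digit
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# <PA, SUPERSCRIPT FOUR, AA, VA, MA, VIRAMA> from the quoted message
before = "\u0BAA\u2074\u0BBE\u0BB5\u0BAE\u0BCD"
after = move_digit_after_vowel_signs(before)
print(" ".join(f"U+{ord(c):04X}" for c in after))
```

The sketch only looks at the digit and what follows it; it does not check
that a Tamil consonant comes before the digit, so it would need tightening
before being run over mixed text.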

Failing that, the alternatives are either a specialty font with a
zero-width, zero-contour glyph mapped to the dotted circle character, or
a cumbersome reworking of existing font display engines to accommodate
this unusual construction.




