Solution for Extended Tamil

Tue Jan 23 22:50:58 CST 2024

On Sun, 21 Jan 2024 13:52:18 +0000
Richard Wordingham via Unicode <unicode at corp.unicode.org> wrote:

> The Unicode Consortium makes some forays into standardising the
> encoding of text beyond the mere encoding of characters.  Is there
> yet a standard encoding for the first blue word on page 3 of
> https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf (Document
> L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
> SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
> U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
> circle, and is or closely relates to the Sanskrit word 'bh̄āvam'.  I
> would not be surprised at context-sensitive rules for whether the
> sequence should be ended with U+200C ZERO WIDTH NON-JOINER.
> 
> One possible solution would be for U+00B2, U+00B3 and U+2074 to be
> treated as nuktas, but that invalidates or creates a confusable for
> the current solution for sequences without a right matra, which is to
> use the order <consonant, vowel, superscript digit>.

There doesn't appear to be any Unicode progress beyond L2/10-440,
in which the South Asian subcommittee opined:

"Indic rendering engines, for example, will need to know that the
superscript numbers should be treated as diacritics (that is, in the
nukta class)."

There was a request for comments from those with implementations.  One
response, predating the report, made it to the document register:
L2/10-435, from R. Radhakrishnan and Muthu Nedumaran, which exhibited
elegant rendering using AAT.  (I fear that that's no more conclusive
than finding a Graphite font that can render the sequences.)

Can we honestly claim that subcommittee report as a finding of fact by
the UTC?  If we can, that would declare that the correct placement of
the superscript digit is immediately after the consonant.
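
For anyone wanting to check their own data, here is a minimal Python
sketch of the two candidate orderings of the word in question.  The
names NUKTA_LIKE, AFTER_VOWEL, describe and digit_follows_consonant are
mine, purely for illustration, and testing for general category Lo is
only a rough stand-in for "Tamil consonant" (independent vowels are
also Lo).  It says nothing about how any renderer treats the sequences.

```python
import unicodedata

# Superscript digits used in Extended Tamil, as listed in this thread.
SUPERSCRIPT_DIGITS = {"\u00B2", "\u00B3", "\u2074"}

# Digit immediately after the consonant (the "nukta class" reading).
NUKTA_LIKE = "\u0BAA\u2074\u0BBE\u0BB5\u0BAE\u0BCD"
# Digit after the right-side vowel sign.
AFTER_VOWEL = "\u0BAA\u0BBE\u2074\u0BB5\u0BAE\u0BCD"


def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


def digit_follows_consonant(text):
    """True if every superscript digit directly follows a letter (gc=Lo).

    Assumption: gc=Lo approximates 'consonant' well enough for a sketch.
    """
    for i, c in enumerate(text):
        if c in SUPERSCRIPT_DIGITS:
            if i == 0 or unicodedata.category(text[i - 1]) != "Lo":
                return False
    return True


if __name__ == "__main__":
    for label, word in (("nukta-like", NUKTA_LIKE), ("after vowel", AFTER_VOWEL)):
        print(label, digit_follows_consonant(word))
        for line in describe(word):
            print("   ", line)
    # The two spellings are not canonically equivalent, so normalisation
    # will not reconcile them.
    same = (unicodedata.normalize("NFD", NUKTA_LIKE)
            == unicodedata.normalize("NFD", AFTER_VOWEL))
    print("canonically equivalent:", same)
```

The point of the last check is that the two spellings remain distinct
under canonical normalisation, so whichever order is declared correct,
the other would remain a separate spelling of the same word.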

Richard.

> 
> Another possible solution is to define a special visual rearrangement
> for the sequences <consonant, (U+0BBE|U+0BCA|U+0BCB|U+0BCC|U+0BD7),
> superscript digit> and their canonical equivalents.
> 
> Is it perhaps the case that the word I mentioned can only be encoded
> using the PUA?
> 
> Richard.
>