Solution for Extended Tamil

Sun Jan 21 07:52:18 CST 2024

The Unicode Consortium makes some forays into standardising the
encoding of text beyond the mere encoding of characters.  Is there yet a
standard encoding for the first blue word on page 3 of
https://www.unicode.org/L2/L2010/10379--extended-tamil.pdf (Document
L2/10-379)?  The word resembles ப⁴ாவம் <U+0BAA TAMIL LETTER PA, U+2074
SUPERSCRIPT FOUR, U+0BBE TAMIL VOWEL SIGN AA, U+0BB5 TAMIL LETTER VA,
U+0BAE TAMIL LETTER MA, U+0BCD TAMIL SIGN VIRAMA>, but without a dotted
circle, and is or closely relates to the Sanskrit word 'bhāvam'.  I
would not be surprised at context-sensitive rules for whether the
sequence should be ended with U+200C ZERO WIDTH NON-JOINER.
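
For anyone wanting to check the sequence I quoted, here is a minimal
Python sketch that lists its code points with their character names.
The helper name describe is my own invention for illustration; the only
library call relied on is unicodedata.name.

```python
import unicodedata

# The sequence quoted above, in the order given in the message.
word = "\u0BAA\u2074\u0BBE\u0BB5\u0BAE\u0BCD"

def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe(word):
    print(line)
```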

One possible solution would be for U+00B2, U+00B3 and U+2074 to be
treated as nuktas, but that would invalidate, or create a confusable for, the
current solution for sequences without a right matra, which is to use
the order <consonant, vowel, superscript digit>.

Another possible solution is to define a special visual rearrangement
for the sequences <consonant, (U+0BBE|U+0BCA|U+0BCB|U+0BCC|U+0BD7),
superscript digit> and their canonical equivalents.
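
To make the rearrangement idea concrete, here is a rough Python sketch
of it.  It is only an illustration under my own assumptions, not how any
renderer actually behaves: the name visual_order is hypothetical, the
consonant range U+0B95..U+0BB9 is a simplification, and the output is
merely the left-to-right glyph order I would expect.  Working on the NFD
form covers the canonical equivalents, since U+0BCA, U+0BCB and U+0BCC
decompose into a left part followed by U+0BBE or U+0BD7.

```python
import re
import unicodedata

# Assumption: Tamil consonants lie in U+0B95..U+0BB9 (the range has gaps).
CONSONANT = "[\u0B95-\u0BB9]"
LEFT_PART = "[\u0BC6\u0BC7]"         # left halves of the two-part vowel signs
RIGHT_PART = "[\u0BBE\u0BD7]"        # VOWEL SIGN AA, or AU LENGTH MARK
SUPERSCRIPT = "[\u00B2\u00B3\u2074]"

PATTERN = re.compile(
    f"({CONSONANT})({LEFT_PART}?)({RIGHT_PART})({SUPERSCRIPT})"
)

def visual_order(text):
    """Return the NFD form of text with each matching cluster put into
    the hypothesised glyph order: left part, consonant, superscript
    digit, right part."""
    decomposed = unicodedata.normalize("NFD", text)
    return PATTERN.sub(
        lambda m: m.group(2) + m.group(1) + m.group(4) + m.group(3),
        decomposed,
    )
```

With this sketch, <U+0BAA, U+0BBE, U+2074> comes out as <U+0BAA, U+2074,
U+0BBE>, while a sequence with no right matra, such as <U+0BAA, U+0BBF,
U+00B2>, is left in its existing order.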

Is it perhaps the case that the word I mentioned can only be encoded
using the PUA?

Richard.