Why is TAMIL SIGN VIRAMA not Alphabetic?

SundaraRaman R via Indic indic at unicode.org
Sun Apr 29 08:57:40 CDT 2018


Hi,

In languages like Ruby or Java
(https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
functions that check whether a character is alphabetic do so by looking
at the 'Alphabetic' property (true if the character is in one of the L
categories, or in Nl, or has the 'Other_Alphabetic' property). When
parsing Tamil text, this works out well for independent vowels and
consonants (which are in Lo), and for most dependent signs (which are in
Mc or Mn but have the 'Other_Alphabetic' property). The very common
VIRAMA, however, is neither in Lo nor marked 'Other_Alphabetic', so any
string containing it is judged to be non-alphabetic.
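
To make the behaviour concrete, here is a minimal sketch, assuming Java 8
or later (for String.codePoints()). The class name TamilAlphabeticCheck
and the helper isAllAlphabetic are made up for illustration; only
Character.isAlphabetic(int) is the real API under discussion.

```java
class TamilAlphabeticCheck {
    // Hypothetical helper: true only if every code point in the string
    // has the Unicode 'Alphabetic' property.
    public static boolean isAllAlphabetic(String s) {
        return s.codePoints().allMatch(Character::isAlphabetic);
    }

    public static void main(String[] args) {
        int ka = 0x0B95;          // TAMIL LETTER KA (Lo)
        int vowelSignAa = 0x0BBE; // TAMIL VOWEL SIGN AA (Mc, Other_Alphabetic)
        int virama = 0x0BCD;      // TAMIL SIGN VIRAMA (Mn, no Other_Alphabetic)

        System.out.println(Character.isAlphabetic(ka));          // true
        System.out.println(Character.isAlphabetic(vowelSignAa)); // true
        System.out.println(Character.isAlphabetic(virama));      // false

        // KA + VOWEL SIGN AA passes; KA + VIRAMA fails as a whole.
        System.out.println(isAllAlphabetic("\u0B95\u0BBE"));     // true
        System.out.println(isAllAlphabetic("\u0B95\u0BCD"));     // false
    }
}
```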

This doesn't make sense to me, since the Virama “◌்” is as much of an
alphabetic character as any of the "Dependent Vowel" characters which
have been given the 'Other_Alphabetic' property. Is there a rationale
behind this difference, or is it an oversight to be corrected?

Thanks,
Sundar


