Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?

Martin J. Dürst via Unicode unicode at unicode.org
Mon May 28 23:23:13 CDT 2018


Hello Sundar,

On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
> Hi,
> 
> In languages like Ruby or Java
> (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
> functions to check if a character is alphabetic do that by looking for
> the 'Alphabetic' property (defined as true if it's in one of the L
> categories, or Nl, or has 'Other_Alphabetic' property). When parsing
> Tamil text, this works out well for independent vowels and consonants
> (which are in Lo), and for most dependent signs (which are in Mc or Mn
> but have the 'Other_Alphabetic' property), but the very common pulli (VIRAMA)
> is neither in Lo nor has 'Other_Alphabetic', so any string containing
> it is judged to be non-alphabetic.
> 
> This doesn't make sense to me since the Virama “◌்” is as much of an
> alphabetic character as any of the "Dependent Vowel" characters which
> have been given the 'Other_Alphabetic' property. Is there a rationale
> behind this difference, or is it an oversight to be corrected?

I suggest submitting an error report via 
https://www.unicode.org/reporting.html. I haven't studied the issue in 
detail (sorry, just no time this week), but it sounds reasonable to give 
the VIRAMA the 'Other_Alphabetic' property.

I'd recommend mentioning examples other than Tamil in your report 
(assuming they exist).

BTW, which method are you using in Ruby? If there's a problem in 
Ruby (which I doubt; it's just using Unicode data), then please 
file a bug report at https://bugs.ruby-lang.org/projects/ruby-trunk, and I 
should be able to follow up on that.
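
For reference, here is a minimal sketch of one way such a check could be
written in Ruby, assuming the \p{Alphabetic} regexp property is what's
being used (that is only a guess on my part; the helper names alphabetic?
and all_alphabetic? are made up for illustration). It prints the result
for a Tamil letter, a dependent vowel sign, and the pulli, so the
difference described above can be checked directly:

```ruby
# Sample code points taken from the discussion above.
SAMPLES = {
  "TAMIL LETTER KA"     => "\u0B95", # general category Lo
  "TAMIL VOWEL SIGN AA" => "\u0BBE", # general category Mc, Other_Alphabetic
  "TAMIL SIGN VIRAMA"   => "\u0BCD"  # general category Mn (the pulli)
}.freeze

# Hypothetical helper: true if the single character has the
# Unicode 'Alphabetic' property according to Ruby's regexp engine.
def alphabetic?(char)
  char.match?(/\A\p{Alphabetic}\z/)
end

# Hypothetical helper: true only if every character in the string
# has the 'Alphabetic' property.
def all_alphabetic?(str)
  str.match?(/\A\p{Alphabetic}+\z/)
end

SAMPLES.each do |name, ch|
  puts format("U+%04X %-20s alphabetic=%s", ch.ord, name, alphabetic?(ch))
end

# A Tamil word that contains the pulli, as reported in the question.
word = "\u0B85\u0BAE\u0BCD\u0BAE\u0BBE" # அம்மா
puts format("%s all_alphabetic=%s", word, all_alphabetic?(word))
```

The output depends on the Unicode data bundled with the Ruby version in
use, so it would help to include the Ruby version in any report.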

Regards,   Martin.

