Why is TAMIL SIGN VIRAMA (pulli) not Alphabetic?
    Asmus Freytag via Unicode
    unicode at unicode.org
    Mon May 28 23:44:11 CDT 2018

One of the general principles is that combining marks inherit the 
property of their base character.
Normally, "inherited" should be the only property value for combining marks.
There have been some deviations from this over the years, for various 
reasons, and there are some properties (such as general category) where 
it is necessary to recognize the character as combining, but the general 
principle still holds.
Therefore, if you are trying to see whether a string is alphabetic, 
combining marks should be "transparent" to such an algorithm.
A./
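
A minimal Java sketch of the "transparent" approach described above, for illustration only. It assumes that "combining mark" means general category Mn, Mc, or Me, and that a mark with no preceding base makes the string non-alphabetic. The class and method names are hypothetical; only Character.isAlphabetic(int) and Character.getType(int) are real JDK calls.

```java
public class TransparentAlphabetic {

    // True for general categories Mn, Mc, and Me (assumed definition of
    // "combining mark" for this sketch).
    static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Per-code-point check: every code point must itself be Alphabetic.
    static boolean isAlphabeticNaive(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(Character::isAlphabetic);
    }

    // "Transparent" check: combining marks are skipped, so only base
    // characters decide the result. A mark with no base before it fails.
    static boolean isAlphabeticTransparent(String s) {
        boolean sawBase = false;
        for (int cp : s.codePoints().toArray()) {
            if (isCombiningMark(cp)) {
                if (!sawBase) {
                    return false; // nothing to inherit from
                }
                continue; // inherits from the (already accepted) base
            }
            if (!Character.isAlphabetic(cp)) {
                return false;
            }
            sawBase = true;
        }
        return sawBase;
    }

    public static void main(String[] args) {
        // TA, MA, VOWEL SIGN I, LLLA, VIRAMA (pulli)
        String tamil = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";
        System.out.println("naive:       " + isAlphabeticNaive(tamil));
        System.out.println("transparent: " + isAlphabeticTransparent(tamil));
    }
}
```

With Unicode data as described in this thread, the naive check rejects the word because of U+0BCD, while the transparent check accepts it since every base character is in Lo.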
On 5/28/2018 9:23 PM, Martin J. Dürst via Unicode wrote:
> Hello Sundar,
>
> On 2018/05/28 04:27, SundaraRaman R via Unicode wrote:
>> Hi,
>>
>> In languages like Ruby or Java
>> (https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isAlphabetic(int)),
>> functions to check if a character is alphabetic do that by looking for
>> the 'Alphabetic' property (defined true if it's in one of the L
>> categories, or Nl, or has the 'Other_Alphabetic' property). When parsing
>> Tamil text, this works out well for independent vowels and consonants
>> (which are in Lo), and for most dependent signs (which are in Mc or Mn
>> but have the 'Other_Alphabetic' property), but the very common pulli
>> (VIRAMA) is neither in Lo nor has 'Other_Alphabetic', and so any
>> string containing it is judged to be non-alphabetic.
>>
>> This doesn't make sense to me, since the Virama “◌்” is as much of an
>> alphabetic character as any of the "Dependent Vowel" characters which
>> have been given the 'Other_Alphabetic' property. Is there a rationale
>> behind this difference, or is it an oversight to be corrected?
>
> I suggest submitting an error report via 
> https://www.unicode.org/reporting.html. I haven't studied the issue in 
> detail (sorry, just no time this week), but it sounds reasonable to 
> give the VIRAMA the 'Other_Alphabetic' property.
>
> I'd recommend mentioning examples other than Tamil in your report
> (assuming they exist).
>
> BTW, what's the method you are using in Ruby? If there's a problem in
> Ruby (which I doubt; it's just using Unicode data), then please
> file a bug report at https://bugs.ruby-lang.org/projects/ruby-trunk; I
> should be able to follow up on that.
>
> Regards,   Martin.
>
    
    