Re: Unicode is universal, so how come that universality doesn’t apply to digits?

Karl Williamson public at khwilliamson.com
Thu Mar 18 21:12:59 CDT 2021


On 12/16/20 2:32 PM, Bill Poser via Unicode wrote:
> It seems to me that, in spite of the superficial similarity of the way 
> numbers are written in many languages, this is NOT, in general, a matter 
> of encoding conversion or even transliteration but rather one of 
> translation and therefore not part of Unicode for the same reason that 
> Unicode does not handle the translation of text from, say, Japanese to 
> English.
> 
> There is, actually, a library, which I have written, that handles 
> conversions between Unicode strings and integers for most systems of 
> writing numbers. (I have yet to update it to handle some of the more 
> recently encoded systems.) It is a C library which also has a TCL binding:
> 
> http://billposer.org/Software/libuninum.html
> 
> It handles a number of systems that require algorithms rather different 
> from that of atoi/strtol.
> 
> Bill
> 

Another option: recent versions of Perl come with the function num() in 
the Unicode::UCD module.  If its input is a string consisting of a single 
character, and that character has a defined numeric value, it returns 
that value, converted to floating point if necessary.  It returns undef 
for a character without a numeric value.
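
For anyone who wants to see the single-character case outside Perl, here 
is a rough Python analogue built on the standard unicodedata module.  It 
is only a sketch of the behavior described above, not how Unicode::UCD 
is implemented, and char_num is a name made up for this example.

```python
import unicodedata

def char_num(s):
    """Numeric value of a one-character string, or None if it has none.

    Hypothetical helper: it mimics the single-character case described
    above and is not part of Perl or of Unicode::UCD.
    """
    if len(s) != 1:
        return None  # longer strings are a separate case
    value = unicodedata.numeric(s, None)  # a float, or None if undefined
    if value is None:
        return None
    # Keep whole numbers as integers; use floating point only for fractions.
    return int(value) if value == int(value) else value

print(char_num("\u0664"))  # ARABIC-INDIC DIGIT FOUR
print(char_num("\u00bd"))  # VULGAR FRACTION ONE HALF
print(char_num("A"))       # a letter with no numeric value
```

The None results stand in for Perl's undef.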

If called with a string consisting entirely of characters with General 
Category Nd, all from the same run of 10 consecutive code points, it 
returns the value they represent, assuming left-to-right positional 
notation, so that the right-most digit is in the ones position, the next 
is in the tens, etc.  It returns undef for any other string longer than 
one character.
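
The multi-character rule can be sketched the same way in Python.  Again 
this is an illustration under my own assumptions, not the Unicode::UCD 
code: digits_num is a made-up name, and "same run of 10" is checked by 
comparing the code point each digit implies for its zero.

```python
import unicodedata

def digits_num(s):
    """Value of a string of Nd digits drawn from one run of ten, else None.

    Hypothetical helper illustrating the rule described above.
    """
    if not s:
        return None
    total = 0
    zero = None  # code point of digit zero in the first digit's run
    for ch in s:
        if unicodedata.category(ch) != "Nd":
            return None  # not a decimal digit
        d = unicodedata.decimal(ch)
        # Nd digits occupy contiguous runs 0..9, so this locates the zero.
        z = ord(ch) - d
        if zero is None:
            zero = z
        elif z != zero:
            return None  # digits from different runs are rejected
        total = total * 10 + d  # last character ends up in the ones place
    return total

print(digits_num("\u0664\u0662"))  # ARABIC-INDIC DIGITS FOUR, TWO
print(digits_num("4\u0662"))       # ASCII digit mixed with Arabic-Indic
```

Rejecting mixed runs is the point of the check: a string that mixes 
digits from two scripts is more likely a spoof or an error than a number.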

