Re: Unicode is universal, so how come that universality doesn’t apply to digits?
Bill Poser
billposer2 at gmail.com
Wed Dec 16 15:32:48 CST 2020
It seems to me that, in spite of the superficial similarity of the way
numbers are written in many languages, this is NOT, in general, a matter of
encoding conversion or even transliteration but rather one of translation
and therefore not part of Unicode for the same reason that Unicode does not
handle the translation of text from, say, Japanese to English.
There is, actually, a library, which I have written, that handles
conversions between Unicode strings and integers for most systems of
writing numbers. (I have yet to update it to handle some of the more
recently encoded systems.) It is a C library which also has a Tcl binding:
http://billposer.org/Software/libuninum.html
It handles a number of systems that require algorithms rather different
from that of atoi/strtol.
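
To illustrate why such systems cannot be handled by an atoi/strtol-style
loop, here is a minimal Python sketch for Chinese numerals below 10,000.
It is not taken from libuninum and does not reflect its API; the function
name chinese_to_int and both tables are hypothetical, chosen only for this
example. It assumes well-formed input and ignores larger units such as 万.

```python
# Hypothetical tables for this sketch only; not part of libuninum.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
MULTIPLIERS = {"十": 10, "百": 100, "千": 1000}


def chinese_to_int(text):
    """Convert a well-formed Chinese numeral below 10,000 to an int.

    Unlike atoi/strtol, the value is not built by repeated
    'value * 10 + digit': each digit is scaled by the multiplier
    character that follows it, and the scaled pieces are summed.
    """
    total = 0
    pending = 0  # digit seen but not yet scaled by a multiplier
    for ch in text:
        if ch in DIGITS:
            pending = DIGITS[ch]
        elif ch in MULTIPLIERS:
            # A bare multiplier (as in 十五 = 15) counts as 1 x multiplier.
            total += (pending or 1) * MULTIPLIERS[ch]
            pending = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + pending  # a trailing digit is the units place
```

With these assumptions, chinese_to_int("三百二十一") gives 321 and
chinese_to_int("十五") gives 15.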
Bill
On Wed, Dec 16, 2020 at 12:04 PM Richard Wordingham via Unicode <
unicode at unicode.org> wrote:
> On Wed, 16 Dec 2020 18:34:55 +0100
> Frédéric Grosshans via Unicode <unicode at unicode.org> wrote:
>
> > It’s quite easy to make a library which parses UnicodeData.txt
> > (version 13.0 here) and extracts the digit ranges of the various
> > scripts and converts the various strings into numbers for the 50
> > scripts listed in table 22-3 of the standard plus the western digits
> > (Unicode 13.0 pdf here). It should be reasonably futureproof, in the
> > sense that parsing future Unicode data files should add scripts as they
> > are encoded. However, do not forget to check the exceptions in the
> > text around this table and in the relevant script pages: in Unicode
> > 13.0, it concerns Arabic, which has two sets of digits, Myanmar (3
> > sets), and Tai Tham (2 sets).
>
> Or just scan UnicodeData.txt for decimal digits with the value 0.
>
> Richard.
>
>
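
The approach suggested in the quoted text (find every decimal digit with
the value 0) can be sketched in Python as follows. This is an
approximation under stated assumptions: instead of parsing
UnicodeData.txt directly, it uses Python's standard unicodedata module,
whose data comes from the Unicode Character Database in whatever version
the interpreter ships with. The names decimal_zeros and parse_decimal are
hypothetical, and the mixed-set check assumes each digit set occupies ten
consecutive code points starting at its zero.

```python
import sys
import unicodedata


def decimal_zeros():
    """Return the code points of all decimal digits (category Nd) with value 0."""
    zeros = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch) == 0:
            zeros.append(cp)
    return zeros


def parse_decimal(text):
    """Parse a string of decimal digits drawn from a single digit set."""
    if not text:
        raise ValueError("empty string")
    value = 0
    zero = None  # code point of the zero of the digit set in use
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        # Assumption: digits 0-9 of a set are contiguous, so the zero
        # of this character's set is ord(ch) - digit.
        set_zero = ord(ch) - digit
        if zero is None:
            zero = set_zero
        elif set_zero != zero:
            raise ValueError("digits from more than one set")
        value = value * 10 + digit
    return value
```

Each entry returned by decimal_zeros() marks the start of one digit set,
so scripts with several sets, such as Arabic, simply show up as several
entries.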