Unicode is universal, so how come that universality doesn’t apply to digits?

Richard Wordingham richard.wordingham at ntlworld.com
Wed Dec 16 13:23:09 CST 2020

On Wed, 16 Dec 2020 16:02:00 +0000 (GMT)
William_J_G Overington via Unicode <unicode at unicode.org> wrote:

> Hi
> Well, is the way to make progress that Unicode Inc. could make
> available a pseudo-code algorithm that can be converted to various
> programming languages that is such that the way that a digit is
> derived from the text characters is an algorithm with a structure of
> the form
> if (digit_character >= 'A') AND (digit_character <= 'B') then 
> digit_number := digit_character - 'C'
> elsif (digit_character >= 'D') AND (digit_character <= 'E') then 
> digit_number := digit_character - 'F'
> elsif ...

It looks to me as though some versions of wcstol() already accept a
sequence of non-ASCII decimal digits.  C11 allows such behaviour.  The
simple algorithm sketched here won't work for 8-bit char encodings -
ISCII Indian digits and TIS-620 Thai digits overlap but do not coincide.
Thus for strtol(), you would need to include the locale.
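
To make the point about 8-bit encodings concrete, here is a rough Python
sketch.  The range table, the function names and the byte offsets are my
own illustration, not anything published by Unicode: I am assuming that
TIS-620 places its digits at 0xF0-0xF9 and ISCII at 0xF1-0xFA.  With
code points one range table is enough; with bytes the same value means a
different digit depending on the encoding, so the encoding (in C terms,
the locale) has to be an input.

```python
# Illustrative subset of decimal-digit ranges (first, last) by code point.
# A real table would cover every digit set; this one is an assumption.
DIGIT_RANGES = [
    (0x0030, 0x0039),  # ASCII digits
    (0x0966, 0x096F),  # Devanagari digits
    (0x0E50, 0x0E59),  # Thai digits
]


def digit_value(ch):
    """Return 0..9 for a digit in the table above, else None."""
    cp = ord(ch)
    for first, last in DIGIT_RANGES:
        if first <= cp <= last:
            return cp - first
    return None


# Assumed position of digit zero in two 8-bit encodings.
BYTE_DIGIT_ZERO = {
    "tis-620": 0xF0,  # Thai digits assumed at 0xF0..0xF9
    "iscii": 0xF1,    # Indian digits assumed at 0xF1..0xFA
}


def byte_digit_value(byte, charset):
    """Return 0..9 for a digit byte in the named encoding, else None."""
    zero = BYTE_DIGIT_ZERO[charset]
    if zero <= byte <= zero + 9:
        return byte - zero
    return None
```

Under those assumptions the byte 0xF5 is the digit 5 in TIS-620 but the
digit 4 in ISCII, which is why a range test with no encoding argument
cannot be right for char strings.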

As Frédéric Grosshans has noticed, there is also the issue of
digit-sequence spoofing, in addition to the harm that look-alikes of the
letter 'O' can cause.  Not every call of strtol() parsing a digit string
actually checks that the offered string is in the form of a number.
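
One possible guard against both problems, again only as a Python sketch
with a hypothetical function name: reject anything that is not a decimal
digit, and reject strings that mix digits from more than one digit set.
It relies on unicodedata.decimal() from the standard library and on each
set of decimal digits occupying ten consecutive code points, so that
subtracting the digit value from the code point identifies the set.

```python
import unicodedata


def parse_single_script_decimal(text):
    """Parse an unsigned decimal number whose digits all come from one set.

    Raises ValueError for empty input, for any non-digit character, and
    for digits drawn from more than one digit set.
    """
    if not text:
        raise ValueError("empty string")
    zeros = set()  # code point of digit zero for each digit set seen
    value = 0
    for ch in text:
        d = unicodedata.decimal(ch, None)
        if d is None:
            raise ValueError("not a decimal digit: %r" % ch)
        zeros.add(ord(ch) - d)
        value = value * 10 + d
    if len(zeros) != 1:
        raise ValueError("digits from more than one digit set")
    return value
```

A letter 'O' standing in for a zero fails the first check, and a string
that mixes ASCII and Devanagari digits fails the second.  Signs,
whitespace and separators are deliberately left out of this sketch.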

