Is there a difference between converting a string of ASCII digits to an integer versus a string of non-ASCII digits to an integer?
richard.wordingham at ntlworld.com
Thu Dec 24 09:50:29 CST 2020
On Wed, 23 Dec 2020 16:59:59 -0700
Doug Ewell via Unicode <unicode at unicode.org> wrote:
> Richard Wordingham wrote:
> >> I suggest you double-check about the RTL digits (N'Ko & Adlam);
> >> please take a look at the relevant Unicode book chapters.
> > It looks as though the N'ko section documents the significance by
> > accident! I thought a policy was going to be documented (2012 or
> > slightly later) that decimal digits are stored most significant
> > digit first, but that doesn't seem to have happened.
> It happened for N’Ko anyway:
> “N’Ko uses decimal digits specific to the script. These digits have
> strong right-to-left directionality. Numbers are stored in text in
> logical order with most significant digit first; when displayed,
> numerals are then laid out in right-to-left order, with the most
> significant digit at the rightmost side, as illustrated for the
> numeral 144 in Figure 19-3. This situation differs from how numerals
> are handled in Hebrew and Arabic, where numerals are laid out in
> left-to-right order, even though the overall text direction is right
> to left.”
As you later noted, the third sentence expresses not a policy, but a
rule for N'ko 'decimal digits'.
The last sentence is simply appalling:
1. Hebrew numerals are written with the most significant element on the
right. For Unicode, what is significant is that as the elements
are letters, they follow the normal presentation rule for sequences of
2. I would expect the components of Arabic letter numerals to follow
the same rules as when the elements are being used as letters. I can
find examples of both biggest first and smallest first.
3. The 'decimal digits' for Arabic 'five and twenty' are laid out in the
order sounded, i.e. the digit 5 is on the right and the digit 2 is on
the left. As with N'ko, the most significant digit is stored first.
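
To tie this back to the subject line: if the digits are stored most significant first, the conversion loop is the same for ASCII, Arabic-Indic and N'ko digits, whatever the display order. The following is only a minimal sketch in Python. The helper name digits_to_int and the sample strings are my own illustrations, and it assumes every character has the Unicode Decimal property, which unicodedata.decimal() checks.

```python
import unicodedata


def digits_to_int(text: str) -> int:
    """Convert a string of Unicode decimal digits, stored most
    significant digit first, to an integer.

    Hypothetical helper for illustration. An empty string yields 0;
    a character without a decimal value raises ValueError.
    """
    value = 0
    for ch in text:
        # unicodedata.decimal() returns the digit's value (0-9) for any
        # character with the Decimal property, regardless of script or
        # bidirectional class.
        value = value * 10 + unicodedata.decimal(ch)
    return value


# Sample strings in logical (stored) order, most significant digit first.
ascii_144 = "144"
nko_144 = "\u07C1\u07C4\u07C4"   # NKO DIGIT ONE, FOUR, FOUR
arabic_25 = "\u0662\u0665"       # ARABIC-INDIC DIGIT TWO, FIVE

print(digits_to_int(ascii_144))  # 144
print(digits_to_int(nko_144))    # 144
print(digits_to_int(arabic_25))  # 25
```

Python's built-in int() accepts the same strings, so int(nko_144) also gives 144. The explicit loop is shown only to make the storage-order assumption visible.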