Unicode is universal, so how come that universality doesn’t apply to digits?

Doug Ewell doug at ewellic.org
Sun Dec 20 15:40:14 CST 2020

Zach Lym wrote:
> I don't think it's fair to dismiss this as "not a unicode problem."
> As the OP pointed out, support for non-latin variable names is largely
> due to Unicode's identity standard and extensive implementation
> advice.
I don't recall Roger saying anything about non-Latin variable names. He
wrote:
> Why, for example, can’t a Bengali-speaking person create XML such as
> this:
> <সংখ্যা_ছাত্র>৪୨</সংখ্যা_ছাত্র>
> or write a program assignment statement like this:
>             সংখ্যা_ছাত্র = ৪୨
This doesn't claim that the Bengali variable name
সংখ্যা_ছাত্র is unsupported, but rather that the
mixed Bengali/Oriya constant ৪୨ is. In fact, a few lines earlier Roger
wrote:
> a Bengali-speaking person can write this:
>              সংখ্যা_ছাত্র = 42
so variable names aren't the issue.
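
To make the mixed-script point concrete, here is a minimal Python sketch (mine, not from Roger's message) that reports which decimal digit sets occur in a string. The helper names digit_zero and digit_sets are hypothetical, and the sketch assumes that each set of Unicode decimal digits occupies a contiguous run of ten code points, so that subtracting a digit's value from its code point lands on that set's zero.

```python
import unicodedata

def digit_zero(ch):
    """Return the code point of the zero of ch's digit set, or None if ch
    is not a decimal digit. Assumes each digit set is a contiguous 0-9 run."""
    value = unicodedata.decimal(ch, None)
    if value is None:
        return None
    return ord(ch) - value

def digit_sets(text):
    """Return the sorted names of the zero digits of every digit set in text."""
    zeros = {digit_zero(ch) for ch in text}
    zeros.discard(None)
    return sorted(unicodedata.name(chr(z)) for z in zeros)

# U+09EA BENGALI DIGIT FOUR followed by U+0B68 ORIYA DIGIT TWO,
# the constant from the quoted example.
print(digit_sets("\u09ea\u0b68"))  # ['BENGALI DIGIT ZERO', 'ORIYA DIGIT ZERO']
print(digit_sets("42"))            # ['DIGIT ZERO']
```

An implementation that wanted to reject numbers like this one could refuse any literal for which more than one digit set is reported. Whether it should is the separate question noted further down.
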
> The section on numbering (5.5) is only a page long and essentially
> recommends handling decimal based numbering systems.  There isn't
> nearly as much care given to this topic.
Bengali and Oriya are decimal-based. (Whether they should be used
together in a single number is another matter.) The first paragraph of
Section 5.5 specifically discusses interpreting Devanagari digits as one
would interpret Basic Latin digits. I don't know what needs to be added.
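
As a rough illustration of that kind of interpretation, the following Python sketch reads any run of Unicode decimal digits positionally, the same way one would read Basic Latin digits. The function name decimal_value is hypothetical; unicodedata.decimal and the built-in int() are standard Python, and this is only one way an implementation might do it, not something Section 5.5 prescribes.

```python
import unicodedata

def decimal_value(text):
    """Interpret a string of Unicode decimal digits as a base-10 integer.
    unicodedata.decimal raises ValueError for any non-digit character."""
    total = 0
    for ch in text:
        total = total * 10 + unicodedata.decimal(ch)
    return total

print(decimal_value("\u096a\u0968"))  # Devanagari digits four, two -> 42
print(decimal_value("42"))            # Basic Latin digits -> 42

# Python's built-in int() also accepts decimal digits from any script,
# including the mixed Bengali/Oriya constant from the quoted example.
print(int("\u09ea\u0b68"))            # 42
```
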
> There is a standard annex on mathematics, but that is in PDF form and
> is largely concerned with parsing and display of mathematical
> formulas.
UTR #25 (a Technical Report, not a Standard Annex) does focus on Basic
Latin digits, at one point (2.2) claiming that Basic Latin digits are
essentially the only digits used in math, but it's true that the UTR is
about math notation and that isn't really in scope here. The fact that
the UTR is a PDF document doesn't seem pertinent.
> However, as is the answer to most questions, it is a matter of time
> and money. If someone is willing to spend the time expanding 5.5 or
> writing a new annex, I am sure the Unicode committee would be happy to
> review it.  Would you be interested in doing that legwork?
Again, I don't see what is lacking in Section 5.5, especially
considering its Devanagari example. The legwork that needs to be done is
to make implementations more internationalized and more Unicode-aware.
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
