Re: Unicode is universal, so how come that universality doesn’t apply to digits?

Zach Lym indolering at gmail.com
Mon Dec 21 19:00:12 CST 2020


> I don't recall Roger saying anything about non-Latin variable names.

We agree that non-Latin variable names are not the issue; I just
worded my response clumsily ¯\_(ツ)_/¯

So ... why isn't the treatment of number parsing as good as that of
variable names?  Well, to cite Conway's Law: "Any organization that
designs a system (defined broadly) will produce a design whose
structure is a copy of the organization's communication structure."

The identifier standard annex (UAX 31) is ~30 pages of polished
hand-holding for a language implementor: it provides examples, gets
into parsing, gives advice on customization, and explains tricky
issues such as handling zero-width joiners.

I assume UAX 31 has received a disproportionate level of attention
because of the work that went into hammering out DNS and URL
standards, but maybe I only think that because I have a background
in DNS.

>
> > The section on numbering (5.5) is only a page long and essentially
> > recommends handling decimal based numbering systems.  There isn't
> > nearly as much care given to this topic.
>
> Bengali and Oriya are decimal-based. (Whether they should be used
> together in a single number is another matter.) The first paragraph of
> Section 5.5 specifically discusses interpreting Devanagari digits as one
> would interpret Basic Latin digits. I don't know what needs to be added
> here.

As Frédéric points out in his reply, section 22.3 has a lengthier
treatment (which I totally missed).  At a minimum, 5.5 should
cross-reference 22.3.
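
For what it's worth, here is a rough Python sketch of the kind of
interpretation being discussed: treating Devanagari (or any other
decimal) digits the way one would treat Basic Latin digits.  The
function name parse_decimal is made up for illustration, and refusing
to mix scripts within one number is my own assumption, not something
I am claiming 5.5 requires.  It leans on unicodedata.decimal() from
the standard library, plus the fact that each script's decimal digits
occupy ten consecutive code points.

```python
import unicodedata


def parse_decimal(text):
    """Parse a string of Unicode decimal digits into an int.

    Hypothetical helper: accepts digits from any one script and
    rejects numbers that mix scripts (an assumed policy).
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    zero_code_points = set()
    for ch in text:
        # decimal() returns 0-9 for decimal digits, else the default.
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        # Each script's digits are contiguous, so subtracting the digit
        # value gives the code point of that script's zero.
        zero_code_points.add(ord(ch) - digit)
        value = value * 10 + digit
    if len(zero_code_points) > 1:
        raise ValueError("digits from more than one script")
    return value


# Devanagari one-two-three, written with escapes for portability.
devanagari = "\u0967\u0968\u0969"
print(parse_decimal(devanagari))
print(parse_decimal("123"))
```

Python's built-in int() already accepts a string like the Devanagari
one above, but it does not object to mixed scripts, which is why the
sketch tracks each digit's zero code point.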

> > There is a standard annex on mathematics, but that is in PDF form and
> > is largely concerned with parsing and display of mathematical
> > formulas.
>
> UTR #25 (a Technical Report, not a Standard Annex) does focus on Basic
> Latin digits, at one point (2.2) claiming that Basic Latin digits are
> essentially the only digits used in math, but it's true that the UTR is
> about math notation and that isn't really in scope here.

I think it's relevant to answering Roger's question.  How much
demand is there for native numeric literals when most control-flow
logic is going to be in English?

> The fact that the UTR is a PDF document doesn't seem pertinent.

PDFs do not rank well on Google, you can't deep-link to specific
sections, and they are generally a PITA to work with.  The Unicode
Consortium publishes PDFs *not* because it is a good idea, but
because it's inconvenient to change a 30-year-old publishing workflow.

> > However, as is the answer to most questions, it is a matter of time
> > and money. If someone is willing to spend the time expanding 5.5
> > writing a new annex, I am sure the Unicode committee would be happy to
> > review it.  Would you be interested in doing that legwork?
>
> Again, I don't see what is lacking in Section 5.5, especially
> considering its Devanagari example. The legwork that needs to be done is
> to make implementations more internationalized and more Unicode-aware.

Yes: it's ultimately on implementers and Unicode != i18n.

And: couldn't we do a better job of pointing people to resources on
how to handle i18n in a more comprehensive fashion?

But also: Unicode is hella confusing, even to world-class programmers.
Shouldn't we try to recruit suckers like Roger and me into making it
better?

ツ
-Zach Lym


