<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">In looking at this question,
identifiers are useful to consider, if for a different reason than
the one given below for variable names.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">For domain names, for example, many
script communities support both ASCII digits and native ones, while some
support only ASCII digits, even if a native system exists. (See
<a class="moz-txt-link-freetext" href="https://icann.org/idn">https://icann.org/idn</a> and look for Second Level Reference LGRs.)<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">That seems to reflect an idea that
there are usage domains (note: "domain" is used here in a different
sense from "domain name") where native digits would seem to be out of
place. In terms of programming languages, it's not a leap to treat
numerical expressions as a form of mathematical notation - which
then would inherit the bias towards ASCII digits.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Database fields (or their serialization
into XML) are a different matter. You would expect users to be
able to enter strings based on the definition of the field. If the
field type is "text", "string" etc. you would expect to be able to
enter any number string, up to and including mixed-script
numerics.<br>
<br>
If the field type is "domain name" then certain strings might not
be valid in certain zones, just as mixed-script labels are
discouraged.<br>
<br>
Now, if the field type is "decimal number" it would depend on the
consumer whether the field is intended for "mathematical" use or
should support any number system.<br>
</div>
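<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">To make that last case concrete, here is a
minimal Python sketch of a "decimal number" field where the consumer
picks the policy. The function name read_decimal_field and the policy
names "ascii" and "any" are made up for illustration. The sketch relies
only on the built-ins str.isdecimal(), str.isascii() and int(), the
last of which accepts decimal digits from any script.</div>
<pre>
```python
def read_decimal_field(raw, policy="any"):
    """Validate and convert the text of a "decimal number" field.

    policy="ascii": accept ASCII digits only ("mathematical" use).
    policy="any":   accept decimal digits from any script.
    Both names are hypothetical, chosen for this sketch.
    """
    if policy not in ("ascii", "any"):
        raise ValueError(f"unknown policy: {policy!r}")
    # isdecimal() is true only for a non-empty run of decimal digits.
    if not raw.isdecimal():
        raise ValueError(f"not a string of decimal digits: {raw!r}")
    if policy == "ascii" and not raw.isascii():
        raise ValueError(f"non-ASCII digits not allowed here: {raw!r}")
    # int() converts decimal digits of any script, one character at a time.
    return int(raw)


bengali_42 = "\u09ea\u09e8"   # BENGALI DIGIT FOUR, BENGALI DIGIT TWO
mixed_42 = "\u09ea\u0b68"     # BENGALI DIGIT FOUR, ORIYA DIGIT TWO

print(read_decimal_field("42", policy="ascii"))
print(read_decimal_field(bengali_42))
print(read_decimal_field(mixed_42))
```
</pre>
<div class="moz-cite-prefix">Under the "any" policy even the mixed
Bengali/Oriya string converts to 42. Whether a given consumer should
accept it is exactly the policy question above, not something the
conversion step decides.</div>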
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Unicode makes the latter possible by
guaranteeing that all decimal number systems (that are so marked)
can be parsed using the same "subtract zero" method for each
digit. That's where Unicode's responsibility ends (from the point of view
of the Unicode Standard). The CLDR locale data repository (also
maintained by the Unicode Consortium) may have additional data for
number parsing and formatting.</div>
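<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">As a rough illustration of the "subtract
zero" method, here is a short Python sketch. The ZEROS table is a
deliberately partial, hand-written list of zero code points for four
systems, and the names digit_value, parse_decimal and allow_mixed are
assumptions made for this example, not anything defined by the
Standard.</div>
<pre>
```python
# Zero code points for a few decimal systems (partial table, for illustration).
ZEROS = {
    "ascii": 0x0030,       # DIGIT ZERO
    "devanagari": 0x0966,  # DEVANAGARI DIGIT ZERO
    "bengali": 0x09E6,     # BENGALI DIGIT ZERO
    "oriya": 0x0B66,       # ORIYA DIGIT ZERO
}


def digit_value(ch):
    """Return (system, value) for one digit by subtracting that system's zero."""
    for system, zero in ZEROS.items():
        offset = ord(ch) - zero
        if offset in range(10):  # digits 0..9 are encoded contiguously
            return system, offset
    raise ValueError(f"not a digit in a known system: {ch!r}")


def parse_decimal(text, allow_mixed=False):
    """Parse a digit string; reject mixed systems unless allow_mixed is set."""
    if not text:
        raise ValueError("empty string")
    systems = set()
    value = 0
    for ch in text:
        system, digit = digit_value(ch)
        systems.add(system)
        value = value * 10 + digit
    if len(systems) != 1 and not allow_mixed:
        raise ValueError(f"digits from more than one system: {sorted(systems)}")
    return value


print(parse_decimal("\u09ea\u09e8"))                    # Bengali 4, Bengali 2
print(parse_decimal("\u09ea\u0b68", allow_mixed=True))  # Bengali 4, Oriya 2
```
</pre>
<div class="moz-cite-prefix">Both calls yield 42. The same subtraction
works for every system in the table, and rejecting or accepting the
mixed Bengali/Oriya case is a separate decision layered on top of
it.</div>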
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">Those data may not support parsing or
formatting arbitrary mixed-script digit combinations. That is also
OK, because the data is geared towards getting the ordinary use of
numbers correct for as many locales and languages as possible, not
towards dealing with fanciful stuff that doesn't have a real-life user
community using it in daily life.</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">If you like such playful stuff, you are
welcome to it, but you are on your own. And that is as it should be.<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">A./<br>
</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 12/20/2020 1:40 PM, Doug Ewell via
Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:20201220144014.665a7a7059d7ee80bb4d670165c8327d.10caafc26a.wbe@email15.godaddy.com">
<pre class="moz-quote-pre" wrap="">Zach Lym wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">I don't think it's fair to dismiss this as "not a unicode problem."
As the OP pointed out, support for non-latin variable names is largely
due to Unicode's identity standard and extensive implementation
advice.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I don't recall Roger saying anything about non-Latin variable names. He
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Why, for example, can’t a Bengali-speaking person create XML such as
this:
&lt;সংখ্যা_ছাত্র&gt;৪୨&lt;/সংখ্যা_ছাত্র&gt;
or write a program assignment statement like this:
সংখ্যা_ছাত্র = ৪୨
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
This doesn't claim that the Bengali variable name
সংখ্যা_ছাত্র is not supported, but rather the
mixed Bengali/Oriya constant ৪୨. In fact, a few lines earlier Roger
wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">a Bengali-speaking person can write this:
সংখ্যা_ছাত্র = 42
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
so variable names aren't the issue.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">The section on numbering (5.5) is only a page long and essentially
recommends handling decimal based numbering systems. There isn't
nearly as much care given to this topic.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Bengali and Oriya are decimal-based. (Whether they should be used
together in a single number is another matter.) The first paragraph of
Section 5.5 specifically discusses interpreting Devanagari digits as one
would interpret Basic Latin digits. I don't know what needs to be added
here.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">There is a standard annex on mathematics, but that is in PDF form and
is largely concerned with parsing and display of mathematical
formulas.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
UTR #25 (a Technical Report, not a Standard Annex) does focus on Basic
Latin digits, at one point (2.2) claiming that Basic Latin digits are
essentially the only digits used in math, but it's true that the UTR is
about math notation and that isn't really in scope here. The fact that
the UTR is a PDF document doesn't seem pertinent.
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">However, as is the answer to most questions, it is a matter of time
and money. If someone is willing to spend the time expanding 5.5
writing a new annex, I am sure the Unicode committee would be happy to
review it. Would you be interested in doing that legwork?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Again, I don't see what is lacking in Section 5.5, especially
considering its Devanagari example. The legwork that needs to be done is
to make implementations more internationalized and more Unicode-aware.
--
Doug Ewell, CC, ALB | Thornton, CO, US | ewellic.org
</pre>
</blockquote>
<p><br>
</p>
</body>
</html>