<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 16/12/2020 at 18:34, Frédéric
Grosshans wrote:<br>
</div>
<blockquote type="cite"
cite="mid:a356bb22-4cc4-e769-57b6-d6a33a947625@gmail.com">
<div class="moz-cite-prefix">On 16/12/2020 at 14:47, Roger L
Costello via Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:SA0PR09MB6907DE4090F17A22E382CF4CC8C50@SA0PR09MB6907.namprd09.prod.outlook.com">
Setting aside the Bengali/Oriya
problem I stressed above, your criticism should be addressed
elsewhere, since the Unicode Standard is specifically
organized to make this possible and easy, down to variants of
this “hack”.</blockquote>
</blockquote>
<p>Just a small addition: a Python function that reads base-10
numbers written with any Unicode decimal digits (and fails if given
anything else, including a mixture of digits from several scripts)
is as simple as the following:</p>
<pre>Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
&gt;&gt;&gt; def unicodeint(s):
...     sdigitszero="0٠۰߀०০੦૦୦௦౦೦൦෦๐໐༠၀႐០᠐᥆᧐᪀᪐᭐᮰᱀᱐꘠꣐꤀꧐꧰꩐꯰０𐒠𐴰𑁦𑃰𑄶𑇐𑋰𑑐𑓐𑙐𑛀𑜰𑣠𑥐𑱐𑵐𑶠𖩠𖭐𝟎𝟘𝟢𝟬𝟶𞅀𞋰𞥐🯰"
...     #/!\ contains RTL characters, notably 𞥐 U+1E950 ADLAM DIGIT ZERO towards the end
...     #Extracted from UnicodeData.txt for Unicode 13.0.0, sorted by code point
...     ofirst=ord(s[0])
...     if ofirst &gt; ord(sdigitszero[-1])+9 : raise ValueError #1st char not a digit
...     for zx in sdigitszero:
...         z=ord(zx)
...         if ofirst&lt;z : raise ValueError #1st char not a digit
...         if z&lt;=ofirst&lt;=z+9 : break #z is the zero
...     z-=ord('0') #offset between this script's digits and ASCII digits
...     return int(''.join(chr(ord(c)-z) for c in s))
...
&gt;&gt;&gt; unicodeint('৪২')
42
</pre>
<p>
Of course, it’s a quick hack, which should probably be optimized
and should also take special cases into account, notably ᧚ U+19DA
NEW TAI LUE THAM DIGIT ONE, Braille numbers, etc.</p>
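<p>To make the U+19DA special case concrete, here is a minimal sketch
using Python’s standard <font face="monospace">unicodedata</font>
module. The helper name <font face="monospace">describe</font> is
hypothetical and only serves the illustration; the values reported
depend on the Unicode version bundled with the interpreter.</p>
<pre>
```python
import unicodedata

def describe(ch):
    """Return (general category, decimal value or None, digit value or None)."""
    return (
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),  # None when there is no decimal value
        unicodedata.digit(ch, None),    # None when there is no digit value
    )

print(describe("\u19da"))  # NEW TAI LUE THAM DIGIT ONE
print(describe("\u19d1"))  # NEW TAI LUE DIGIT ONE
```
</pre>
<p>U+19DA should be reported with a digit value but without category
<font face="monospace">Nd</font> or a decimal value, so a lookup
based only on the <font face="monospace">Nd</font> zeros does not
cover it.</p>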
<p>But the point is that Unicode does make parsing the base-10
numerals of many scripts easy: the small code snippet above works
for almost 50 scripts. Note that the hardcoded string
<font face="monospace">sdigitszero</font> is simply the list of
Unicode characters of category <font face="monospace">Nd</font>
with <font face="monospace">Decimal_Digit_Value==0</font>, and
nothing else.</p>
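<p>As a sketch of how the hardcoded string could be avoided, the
zeros can be collected from Python’s standard
<font face="monospace">unicodedata</font> module instead of being
pasted in. The names <font face="monospace">decimal_zeros</font>,
<font face="monospace">ZEROS</font> and
<font face="monospace">strict_unicodeint</font> are hypothetical,
and the result depends on the Unicode version bundled with the
interpreter rather than on Unicode 13.0.0. Like the snippet above,
it assumes that the ten digits of each script are contiguous and
start at the zero. Unlike the snippet above, it checks every
character against the zero found for the first one.</p>
<pre>
```python
import sys
import unicodedata

def decimal_zeros():
    """Collect every character of category Nd whose decimal value is 0."""
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Nd"
        and unicodedata.decimal(chr(cp), None) == 0
    )

# Computed once at import time: scanning all code points is slow.
ZEROS = decimal_zeros()

def strict_unicodeint(s):
    """Parse s as a base-10 integer whose digits all share one zero."""
    if not s:
        raise ValueError("empty string")
    first = ord(s[0])
    for zero in ZEROS:
        z = ord(zero)
        if first in range(z, z + 10):
            break  # z is the zero of the first character's digit set
    else:
        raise ValueError("first character is not a decimal digit")
    value = 0
    for c in s:
        d = ord(c) - z
        if d not in range(10):
            raise ValueError("non-digit, or digits from several scripts")
        value = 10 * value + d
    return value

print(strict_unicodeint("\u09ea\u09e8"))  # BENGALI DIGIT FOUR, BENGALI DIGIT TWO
```
</pre>
<p>Checking each character against the same zero is a design choice:
it rejects mixtures of digit sets explicitly instead of relying on
<font face="monospace">int()</font> to reject the shifted
string.</p>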
<p><font face="monospace"><br>
</font></p>
<p><font face="monospace"> Frédéric<br>
</font></p>
</body>
</html>