Different Bidirectional Character Types
eliz at gnu.org
Sat Jul 2 05:13:53 CDT 2022
> Date: Sat, 2 Jul 2022 10:54:46 +0100
> From: Richard Wordingham via Unicode <unicode at corp.unicode.org>
> On Sat, 2 Jul 2022 11:01:00 +0200
> Hans Åberg via Unicode <unicode at corp.unicode.org> wrote:
> > > On 1 Jul 2022, at 14:15, Andreas Prilop via Unicode
> > > <unicode at corp.unicode.org> wrote:
> > >
> > > Reference:
> > > https://unicode.org/reports/tr9/#Bidirectional_Character_Types
> > >
> > > Why do Hebrew letters and Arabic letters have different
> > > bidirectional character types?
> > I cannot parse this, but in Hebrew, Arabic, and Persian, text is
> > written RTL, but numbers LTR. For example, trying A123 in a
> > translator supporting those scripts, I get: א123 أ ١٢٣
> > ا ۱۲۳
> For numbers, using natural language, you don't mean LTR, but 'with the
> most significant digit on the left'. It is a convention that, when
> encoding 'four and twenty' using digits, the most significant digit is
> stored first. N'ko decimal numbers have the most significant digit on
> the right, with the result that N'ko digits have bidi class
> Right_To_Left, as do N'ko letters.
> As to parsing the question, at the literal level Hebrew letters have
> bidi class Right_To_Left (R) while Arabic letters have bidi class
> Arabic_Letter (AL); Moroccan decimal digits (e.g. U+0030) have bidi
> class European_Number (EN), Egyptian decimal digits have bidi class
> Arabic_Number (AN), Urdu decimal digits have bidi class European_Number
> (EN) and Hindi decimal digits (e.g. U+0966) have bidi class
> Left_To_Right (L). When one throws dollar signs, which have bidi
> class European_Terminator (ET), into the mix, these differences matter to
> the bidi algorithm.
I think a simpler answer is that Arabic letters (bidi class AL) in
some cases make European Numbers (EN) behave like Arabic Numbers (AN);
see rule W2 of UAX#9. Arabic Numbers, in turn, affect how other "weak"
characters are reordered; see W6.
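
To make the W2 effect concrete, here is a rough Python sketch. It is not
a UAX#9 implementation: it treats the whole string as a single run,
skips explicit embeddings, isolates and rule W1, and assumes the
start-of-sequence type is L. The names apply_w2 and STRONG are made up
for this illustration; only unicodedata.bidirectional() comes from the
standard library.

```python
import unicodedata

# Strong bidi classes that rule W2 looks back for.
STRONG = {"L", "R", "AL"}

def apply_w2(text, sos="L"):
    """Return the bidi class of each character after a simplified W2.

    W2: search backward from each European Number (EN) to the nearest
    strong type (or the start of the sequence); if that type is AL,
    the EN is treated as an Arabic Number (AN).
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    last_strong = sos  # assumption: sequence starts in an L context
    for i, cls in enumerate(classes):
        if cls in STRONG:
            last_strong = cls
        elif cls == "EN" and last_strong == "AL":
            classes[i] = "AN"
    return classes

# Hebrew alef (class R) followed by ASCII digits: digits stay EN.
print(apply_w2("\u05D0 123"))  # ['R', 'WS', 'EN', 'EN', 'EN']

# Arabic alef with hamza (class AL) followed by ASCII digits: EN -> AN.
print(apply_w2("\u0623 123"))  # ['AL', 'WS', 'AN', 'AN', 'AN']
```

The same ASCII digits end up with different classes depending only on
whether the preceding strong character is R or AL, which is why the two
letter classes have to be kept apart.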
IOW, these distinctions are needed to produce the expected display
order in each case.
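
For anyone who wants to check the classes mentioned in this thread, a
small Python sketch follows. It relies on whatever version of the
Unicode Character Database ships with the local Python, so treat the
output as a sanity check rather than a normative reference. The helper
name bidi_class_table and the choice of sample characters are mine.

```python
import unicodedata

def bidi_class_table(chars):
    """Return (code point, character name, bidi class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in chars
    ]

# One sample per class discussed in the thread.
SAMPLES = (
    "\u05D0"  # Hebrew letter
    "\u0627"  # Arabic letter
    "\u0031"  # ASCII ("European") digit
    "\u0661"  # Arabic-Indic digit
    "\u06F1"  # Extended Arabic-Indic digit (Persian, Urdu)
    "\u0967"  # Devanagari digit
    "\u07C1"  # N'Ko digit
    "\u0024"  # dollar sign
)

for code_point, name, bidi_class in bidi_class_table(SAMPLES):
    print(f"{code_point}  {bidi_class:<3} {name}")
```

With a reasonably recent Python this should report R for the Hebrew
letter and the N'Ko digit, AL for the Arabic letter, EN for the ASCII
and Extended Arabic-Indic digits, AN for the Arabic-Indic digit, L for
the Devanagari digit, and ET for the dollar sign.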