Different Bidirectional Character Types

Richard Wordingham richard.wordingham at ntlworld.com
Sat Jul 2 04:54:46 CDT 2022

On Sat, 2 Jul 2022 11:01:00 +0200
Hans Åberg via Unicode <unicode at corp.unicode.org> wrote:

> > On 1 Jul 2022, at 14:15, Andreas Prilop via Unicode
> > <unicode at corp.unicode.org> wrote:
> > 
> > Reference:
> > https://unicode.org/reports/tr9/#Bidirectional_Character_Types
> > 
> > Why do Hebrew letters and Arabic letters have different
> > bidirectional character types?  
> I cannot parse this, but in Hebrew, Arabic, and Persian, text is
> written RTL, but numbers LTR. For example, trying A123 in a
> translator supporting those scripts, I get: א123 أ ١٢٣
> ا ۱۲۳

For numbers, using natural language, you don't mean LTR, but 'with the
most significant digit on the left'.  It is a convention that, when
encoding 'four and twenty' using digits, the most significant digit is
stored first.  N'ko decimal numbers have the most significant digit on
the right, with the result that N'ko digits have bidi class
Right_To_Left, as do N'ko letters.

As to parsing the question, at the literal level Hebrew letters have
bidi class Right_To_Left (R) while Arabic letters have bidi class
Arabic_Letter (AL); Moroccan decimal digits (e.g. U+0030) have bidi
class European_Number (EN), Egyptian decimal digits have bidi class
Arabic_Number (AN), Urdu decimal digits have bidi class European_Number
(EN) and Hindi decimal digits (e.g. U+0966) have bidi class
Left_To_Right (L).  When one throws dollar signs, which have bidi
class European_Terminator (ET), into the mix, these differences matter
to the bidi algorithm.
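
These classes can be checked directly.  What follows is only a minimal
sketch, assuming Python's standard unicodedata module, which reports
the short aliases (R, AL, EN, AN, L, ET) rather than the long names.
The sample characters and the helper name bidi_report are chosen
purely for illustration; the actual values depend on the Unicode
version bundled with the interpreter.

```python
import unicodedata

# Illustrative sample characters (an assumption of this sketch, one
# per class discussed in the message above).
SAMPLES = [
    "\u05D0",  # Hebrew letter alef
    "\u0623",  # Arabic letter alef with hamza above
    "\u0030",  # ASCII digit zero
    "\u0661",  # Arabic-Indic digit one
    "\u06F1",  # Extended Arabic-Indic digit one
    "\u0966",  # Devanagari digit zero
    "\u07C1",  # N'ko digit one
    "\u07CA",  # N'ko letter a
    "\u0024",  # dollar sign
]


def bidi_report(chars):
    """Return (code point, character name, bidi class alias) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
        for ch in chars
    ]


if __name__ == "__main__":
    for code_point, name, bidi in bidi_report(SAMPLES):
        print(f"{code_point}  {bidi:<3} {name}")
```

Running it lists each sample with its Bidi_Class alias, which makes
the letter and digit differences described above easy to compare side
by side.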

