IdnaTest.txt and RFC 5893

Mark Davis ☕️ mark at
Thu Jan 5 09:55:47 CST 2017

Alastair, thanks for finding this and bringing it up. I think you're right
that the problem is that the test generation code doesn't properly apply
the bidi criteria to *all* the labels when *any* of the labels is RTL, but
instead probably works on a label-by-label basis. Thankfully, according to
your note, ICU does handle it correctly. (The test file generation doesn't
use the ICU code.)
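To make the distinction concrete, here is a minimal Python sketch of the domain-wide reading: if any label is RTL (contains a character of Bidi class R, AL, or AN, which makes the whole name a Bidi domain name), the six-condition Bidi Rule of RFC 5893 is applied to every label, not only the RTL ones. The function names (`bidi_rule_violation`, `check_domain`, `check_rtl_labels_only`) are hypothetical and come from neither ICU nor the IdnaTest.txt generator. The sketch assumes the input is already a mapped Unicode domain name, skips UTS #46 mapping, normalization, and Punycode, and takes Bidi classes from Python's `unicodedata`, whose Unicode version may differ from the one current in 2017. `check_rtl_labels_only` is only an assumed model of the suspected label-by-label behavior.

```python
import unicodedata

# Bidi classes that make a label an RTL label; a domain name with at least
# one such label is a "Bidi domain name" (RFC 5893, section 1.4).
RTL_CLASSES = {"R", "AL", "AN"}

# Condition 2: the only classes allowed in an RTL label.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
# Condition 5: the only classes allowed in an LTR label.
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def is_rtl_label(label):
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def bidi_rule_violation(label):
    """Return the number (1-6) of the first Bidi Rule condition that the
    label violates, or None if it satisfies all six."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not classes:
        return None  # assumption: empty labels are rejected elsewhere
    # Condition 1: the first character must be L, R, or AL.
    if classes[0] not in ("L", "R", "AL"):
        return 1
    # Conditions 3 and 6 look at the last character before trailing NSMs.
    # classes[0] is not NSM, so end never drops below 1.
    end = len(classes)
    while classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1]
    if classes[0] in ("R", "AL"):  # RTL label
        if any(c not in RTL_ALLOWED for c in classes):
            return 2
        if last not in ("R", "AL", "EN", "AN"):
            return 3
        if "EN" in classes and "AN" in classes:
            return 4
    else:  # LTR label
        if any(c not in LTR_ALLOWED for c in classes):
            return 5
        if last not in ("L", "EN"):
            return 6
    return None


def check_domain(domain):
    """Domain-wide reading: if any label is RTL, test *every* label."""
    labels = domain.split(".")
    if not any(is_rtl_label(label) for label in labels):
        return []  # not a Bidi domain name, so the Bidi Rule does not apply
    failures = []
    for label in labels:  # includes LTR and pure-ASCII labels
        cond = bidi_rule_violation(label)
        if cond is not None:
            failures.append((label, cond))
    return failures


def check_rtl_labels_only(domain):
    """Hypothetical model of the suspected bug: only RTL labels are tested."""
    failures = []
    for label in domain.split("."):
        if is_rtl_label(label):
            cond = bidi_rule_violation(label)
            if cond is not None:
                failures.append((label, cond))
    return failures


if __name__ == "__main__":
    for name in ("0\u00e0.\u05d0", "\u00e0\u02c7.\u05d0"):
        print(name, check_domain(name), check_rtl_labels_only(name))
```

Under these assumptions, the domain-wide check reports condition 1 for "0à" (U+0030 DIGIT ZERO has Bidi class EN, so the label does not start with L, R, or AL) and condition 6 for "àˇ" (U+02C7 CARON has class ON, so the LTR label does not end in L or EN). The RTL-only variant accepts both names, which matches the two IdnaTest.txt lines quoted below.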

Could you please report this via so
that we make sure that it is tracked and brought up to the UTC?



On Thu, Jan 5, 2017 at 10:46 AM, Alastair Houghton <
alastair at> wrote:

> On 4 Jan 2017, at 23:40, Markus Scherer < at> wrote:
> >
> > On Wed, Jan 4, 2017 at 2:28 AM, Alastair Houghton <
> alastair at> wrote:
> > RFC 5893 seems pretty clear to me, and the problem really is that the
> test vectors (which come from seem (to me) to be incorrect.
> >
> > says "The following rule,
> consisting of six conditions, applies to labels in Bidi domain names."
> >
> > That's what the ICU code does -- applying the rule to each label -- and
> I assume that's the basis for the test data.
> Absolutely.  But the crucial part is “in Bidi domain names”.  That is, it
> applies to *all* labels that are part of a Bidi domain name, not just RTL
> labels.  It did not say “applies to RTL labels in Bidi domain names” and in
> fact even explicitly states that (in the first bullet point at the end of
> section 2):
>   ...Note that even LTR labels and pure ASCII labels have to be tested.
> Not to mention the fact that parts 5 and 6 of the rule apply specifically
> to LTR labels.
> So it’s quite clear that given the domain name “0à.א”, both “א” *and* “0à”
> need to be checked using the Bidi Rule.  Unless someone can explain why
> “0à” does not fail the test, surely we all agree that line 74 is incorrect:
> > B;    0à.\u05D0;      ;       xn--0-sfa.xn--4db       #       0à.א
> and similarly with line 93:
> > B;    àˇ.\u05D0;      ;       xn--0ca88g.xn--4db      #       àˇ.א
> > ICU does not currently check for multi-label bidi combinations.
> I was a bit puzzled by this, because the code clearly does (both in the
> C++ and Java versions) and yet the online demo doesn’t appear to object to
> the above test cases.  So I wrote a quick test program against the C++
> version of ICU 58.2 and fed it both test cases, and, sure enough, ICU
> agrees that there is a BiDi error in both of the above cases.
> Kind regards,
> Alastair.

More information about the Unicode mailing list