IdnaTest.txt and RFC 5893

Mark Davis ☕️ mark at macchiato.com
Thu Jan 5 09:55:47 CST 2017


Alastair, thanks for finding this and bringing it up. I think you're right:
the problem is that the test generation code doesn't apply the bidi
criteria to *all* the labels when *any* of the labels is RTL, and instead
probably checks each label in isolation. Thankfully, according to your
note, ICU does handle it correctly. (The test file generation doesn't use
the ICU code.)
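
To make the intended behavior concrete, here is a rough Python sketch of
the check as I read RFC 5893 section 2. It is neither the test generation
code nor ICU; the function names are made up for illustration, and it
assumes the input is already in Unicode (U-label) form and that the bidi
classes from Python's unicodedata module are good enough for the purpose.

```python
import unicodedata

# Bidi classes named in RFC 5893, section 2.
RTL_TRIGGERS = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label):
    """Return the bidi class of each character in the label."""
    return [unicodedata.bidirectional(ch) for ch in label]


def label_passes_bidi_rule(label):
    """Apply the six conditions of the Bidi Rule to a single label."""
    classes = bidi_classes(label)
    if not classes:
        return False
    # Conditions 3 and 6 look at the last character before trailing NSMs.
    end = len(classes)
    while end > 0 and classes[end - 1] == "NSM":
        end -= 1
    last = classes[end - 1] if end else None
    first = classes[0]
    if first in ("R", "AL"):  # condition 1: this is an RTL label
        return (
            all(c in RTL_ALLOWED for c in classes)  # condition 2
            and last in ("R", "AL", "EN", "AN")  # condition 3
            and not ("EN" in classes and "AN" in classes)  # condition 4
        )
    if first == "L":  # condition 1: this is an LTR label
        return (
            all(c in LTR_ALLOWED for c in classes)  # condition 5
            and last in ("L", "EN")  # condition 6
        )
    return False  # condition 1: first character must be L, R, or AL


def is_bidi_domain_name(labels):
    """A domain name is Bidi if any label contains an R, AL, or AN character."""
    return any(c in RTL_TRIGGERS for label in labels for c in bidi_classes(label))


def failing_labels(domain):
    """Return the labels that fail the Bidi Rule, testing *all* labels
    (including LTR and pure ASCII ones) once any label is RTL."""
    labels = [label for label in domain.split(".") if label]  # skip empty root label
    if not is_bidi_domain_name(labels):
        return []
    return [label for label in labels if not label_passes_bidi_rule(label)]
```

With this sketch, failing_labels("0\u00e0.\u05d0") reports the first label
(it starts with an EN character), and failing_labels("\u00e0\u02c7.\u05d0")
reports the first label as well (it ends with an ON character), whereas a
label-by-label check that only looks at RTL labels would miss both.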

Could you please report this via http://www.unicode.org/reporting.html so
that we make sure that it is tracked and brought up to the UTC?

Mark

On Thu, Jan 5, 2017 at 10:46 AM, Alastair Houghton <
alastair at alastairs-place.net> wrote:

> On 4 Jan 2017, at 23:40, Markus Scherer <markus.icu at gmail.com> wrote:
> >
> > On Wed, Jan 4, 2017 at 2:28 AM, Alastair Houghton <
> > alastair at alastairs-place.net> wrote:
> > RFC 5893 seems pretty clear to me, and the problem really is that the
> > test vectors (which come from unicode.org) seem (to me) to be incorrect.
> >
> > https://tools.ietf.org/html/rfc5893#section-2 says "The following rule,
> > consisting of six conditions, applies to labels in Bidi domain names."
> >
> > That's what the ICU code does -- applying the rule to each label -- and
> > I assume that's the basis for the test data.
>
> Absolutely.  But the crucial part is “in Bidi domain names”.  That is, it
> applies to *all* labels that are part of a Bidi domain name, not just RTL
> labels.  It does not say “applies to RTL labels in Bidi domain names”; in
> fact, it explicitly states that every label is covered (in the first
> bullet point at the end of section 2):
>
>   ...Note that even LTR labels and pure ASCII labels have to be tested.
>
> Not to mention the fact that conditions 5 and 6 of the rule apply
> specifically to LTR labels.
>
> So it’s quite clear that given the domain name “0à.א”, both “א” *and* “0à”
> need to be checked using the Bidi Rule.  Unless someone can explain why
> “0à” does not fail the test, surely we all agree that line 74 is incorrect:
>
> > B;    0à.\u05D0;      ;       xn--0-sfa.xn--4db       #       0à.א
>
> and similarly with line 93:
>
> > B;    àˇ.\u05D0;      ;       xn--0ca88g.xn--4db      #       àˇ.א
>
> > ICU does not currently check for multi-label bidi combinations.
>
> I was a bit puzzled by this, because the code clearly does (both in the
> C++ and Java versions) and yet the online demo doesn’t appear to object to
> the above test cases.  So I wrote a quick test program against the C++
> version of ICU 58.2 and fed it both test cases, and, sure enough, ICU
> agrees that there is a BiDi error in both of the above cases.
>
> Kind regards,
>
> Alastair.
>
> --
> http://alastairs-place.net