Missing UAX#31 tests?

Sat Jul 14 12:50:07 CDT 2018

Not to worry, these things happen to the best of us. Just glad the root of
the problem was found.

Mark

On Sat, Jul 14, 2018 at 5:51 PM, Karl Williamson <public at khwilliamson.com>
wrote:

> On 07/09/2018 02:11 PM, Karl Williamson via Unicode wrote:
>
>> On 07/08/2018 03:21 AM, Mark Davis ☕️ wrote:
>>
>>> I'm surprised that the tests for 11.0 passed for a 10.0 implementation,
>>> because the following should have triggered a difference for WB. Can you
>>> check on this particular case?
>>>
>>> ÷ 0020 × 0020 ÷#÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷
>>> [0.3]
>>>
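For anyone who wants to check that particular case by hand, here is a minimal sketch of reading one line in the format quoted above. It assumes that "÷" marks a permitted break, "×" marks no break, the remaining tokens are hexadecimal code points, and everything after "#" is commentary. The function name parse_break_test_line is made up for illustration and is not part of any Unicode tooling.

```python
BREAK = "\u00f7"      # the "÷" marker: a break is allowed here
NO_BREAK = "\u00d7"   # the "×" marker: no break here


def parse_break_test_line(line):
    """Return (text, break_offsets) for one test line, or None if it has no data.

    Offsets count code points, so 0 is the start of the text and
    len(text) is the end.
    """
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None
    chars = []
    breaks = []
    for token in data.split():
        if token == BREAK:
            breaks.append(len(chars))
        elif token == NO_BREAK:
            continue
        else:
            chars.append(chr(int(token, 16)))
    return "".join(chars), breaks


text, breaks = parse_break_test_line("÷ 0020 × 0020 ÷#÷ [0.2] SPACE (WSegSpace)")
print(repr(text), breaks)
```

An implementation under test passes this line if its word boundaries for two consecutive spaces are only the start and the end of the text.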
>>
>> I'm one of the people who advocated for this change, and I had already
>> tailored our implementation of 10.0 to not break between horizontal white
>> space, so it's actually not surprising that this rule didn't break.
>>
>>>
>>>
> It turns out that the fault was all mine; the Unicode 11.0 tests were
> failing on a 10.0 implementation.  I'm sorry for starting this red herring
> thread.
>
> If you care to know the details, read on.
>
> The code that runs the tests knows what version of the UCD it is using,
> and it knows what version of the UAX boundary algorithms it is using. If
> these differ, it emits a warning about the discrepancy, and expects that
> there are going to be many test failures, so it marks all failing ones as
> 'To do' which suppresses their output, so as to not distract from any other
> failures that have been introduced by using the new UCD version.  (Updating
> the algorithm comes last.)
>
> The solution for the future is to change the warning about the discrepancy
> to note that the failing boundary algorithm tests are suppressed.  This
> will clue me (or whoever) in that all is not necessarily well.
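The behavior described above, together with the proposed fix, could look roughly like the sketch below. It is not the actual test code: the function name triage_failures, its arguments, and the wording of the warning are all assumptions made for illustration.

```python
import warnings


def triage_failures(ucd_version, algorithm_version, failures):
    """Split failing boundary tests into (reported, suppressed).

    When the UCD version and the boundary-algorithm version differ,
    every failure is treated as 'To do' and suppressed, but the warning
    now says so explicitly and gives the count.
    """
    if ucd_version == algorithm_version:
        return list(failures), []
    warnings.warn(
        f"UCD is {ucd_version} but boundary algorithms are {algorithm_version}: "
        f"{len(failures)} failing boundary test(s) marked 'To do' and suppressed",
        stacklevel=2,
    )
    return [], list(failures)
```

Calling triage_failures("11.0", "10.0", ["WB 12"]) would report nothing as failed, but the warning text would state that one failing test was suppressed, which is the hint that was missing.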
>
>
>
>>> About the testing:
>>>
>>> The tests are generated so that they go through all the combinations of
>>> pairs, and some combinations of triples. The generated test cases use a
>>> sample from each partition of characters, to cut the file size down to a
>>> reasonable level. That also means that some changes in the rules don't
>>> cause changes in the test results. Because it is not possible to test every
>>> combination, there is also provision for additional test cases, such as
>>> those at the end of the files, e.g.:
>>>
>>> https://unicode.org/Public/11.0.0/ucd/auxiliary/WordBreakTest.html
>>> https://unicode.org/Public/10.0.0/ucd/auxiliary/WordBreakTest.html
>>>
>>> We should extend those each time to make sure we cover combinations that
>>> aren't covered by pairs. There were some additions to that end; if they
>>> didn't cover enough cases, then we can look at your experience to add more.
>>>
>>> I can suggest a few strategies for further testing:
>>>
>>> 1. To do a full test, for each row check every combination obtained by
>>> replacing each sample character by every other character in its
>>> partition. E.g. for the above line that would mean testing every <WSegSpace,
>>> WSegSpace> sequence.
>>>
>>> 2. Use a monkey test against ICU. That is, generate random combinations
>>> of characters from different partitions and check that ICU and your
>>> implementation are in sync.
>>>
>>> 3. During the beta period, test your previous-version implementation
>>> against the new test files. If there are no failures, yet there are changes
>>> in the rules, then raise that issue during the beta period so we can add
>>> tests.
>>>
>>
>> I actually did this, and as I recall, did find some test failures.  In
>> retrospect, I must have screwed up somehow back then.  I was under tight
>> deadline pressure, and as a result, did more cursory beta testing than
>> normal.
>>
>>>
>>> 4. If possible, during the beta period upgrade your implementation and
>>> test against the new and old test files.
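Strategies 1 and 2 above can be sketched in a few lines. This is only an illustration under stated assumptions: partitions is a hypothetical mapping from each sample character to the characters in its partition, and reference and candidate are any two callables that return the boundary offsets for a string (for example a wrapper around ICU and a wrapper around your own implementation). None of these names come from ICU or from the Unicode test tooling.

```python
import itertools
import random


def expand_row(sample_row, partitions):
    """Strategy 1: yield every string obtained by replacing each sample
    character in the row with each member of its partition."""
    pools = [partitions[ch] for ch in sample_row]
    for combo in itertools.product(*pools):
        yield "".join(combo)


def monkey_test(reference, candidate, partitions, rounds=1000, max_len=6, seed=0):
    """Strategy 2: build random strings from random partitions and collect
    those on which the two implementations disagree."""
    rng = random.Random(seed)          # fixed seed keeps failures reproducible
    pools = list(partitions.values())
    mismatches = []
    for _ in range(rounds):
        length = rng.randint(1, max_len)
        text = "".join(rng.choice(rng.choice(pools)) for _ in range(length))
        if reference(text) != candidate(text):
            mismatches.append(text)
    return mismatches


# Illustrative, incomplete partition table (an assumption, not UCD data).
partitions = {" ": [" ", "\u2003", "\u3000"], "a": ["a", "b"]}
print(len(list(expand_row("  ", partitions))))   # all <WSegSpace, WSegSpace> pairs in this table
```

In practice the partition table would be derived from the property files of the UCD version under test rather than written by hand.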
>>>
>>
>>
>>> Anyone else have other suggestions for testing?
>>>
>>> Mark
>>>
>>>
>> As an aside, a release or two ago, I implemented SB, and someone
>> immediately found a bug, and accused me of releasing software that had not
>> been tested at all.  He had looked through the test suite and not found
>> anything that looked like it was testing that.  But he failed to find the
>> test file which bundled up all your tests, in a manner he was not
>> accustomed to, so it was easy for him to overlook.  The bug only manifested
>> itself in longer runs of characters than your pairs and triples tested.  I
>> looked at it, and your SB tests still seemed reasonable, and I should not
>> expect a more complete series than you furnished.
>>
>>>
>>>
>>> Mark
>>> //////
>>>
>>> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode <
>>> unicode at unicode.org <mailto:unicode at unicode.org>> wrote:
>>>
>>>     I am working on upgrading from Unicode 10 to Unicode 11.
>>>
>>>     I used all the new files.
>>>
>>>     The algorithms for some of the boundaries, like GCB and WB, have
>>>     changed so that some of the property values no longer have code
>>>     points associated with them.
>>>
>>>     I ran the tests furnished in 11.0 for these boundaries, without
>>>     having changed the algorithms from earlier releases.  All passed
>>>     100%.
>>>
>>>     Unless I'm missing something, that indicates that the tests
>>>     furnished in 11.0 do not contain instances that exercise these
>>>     changes.  My guess is that the 10.0 tests were also deficient.
>>>
>>>     I have been relying on the UCD to furnish tests that have enough
>>>     coverage to sufficiently exercise the algorithms that are specified
>>>     in UAX 31, but that appears to have been naive on my part.
>>>
>>>
>>>
>>
>>
>>
>