Missing UAX#31 tests?

Sat Jul 14 10:51:16 CDT 2018

On 07/09/2018 02:11 PM, Karl Williamson via Unicode wrote:
> On 07/08/2018 03:21 AM, Mark Davis ☕️ wrote:
>> I'm surprised that the tests for 11.0 passed for a 10.0 
>> implementation, because the following should have triggered a 
>> difference for WB. Can you check on this particular case?
>>
>> ÷ 0020 × 0020 ÷#÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ 
>> [0.3]
> 
> I'm one of the people who advocated for this change, and I had already 
> tailored our implementation of 10.0 to not break between horizontal 
> white space, so it's actually not surprising that this rule didn't break.
>>

It turns out that the fault was all mine; the Unicode 11.0 tests were 
failing on a 10.0 implementation.  I'm sorry for starting this red 
herring thread.

If you care to know the details, read on.

The code that runs the tests knows what version of the UCD it is using, 
and it knows what version of the UAX boundary algorithms it is using. 
If these differ, it emits a warning about the discrepancy and expects 
many test failures.  It therefore marks every failing test as 'To do', 
which suppresses its output so that it does not distract from any other 
failures introduced by the new UCD version.  (Updating the algorithm 
comes last.)

The solution for the future is to change the warning about the 
discrepancy to note that the failing boundary algorithm tests are 
suppressed.  This will clue me (or whoever) in that all is not 
necessarily well.
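
A minimal sketch of that change, written in Python purely for illustration. The function names and the shape of the test results are hypothetical and are not taken from the real test harness; the only point is that the warning itself says failures are being hidden, and the summary counts them.

```python
import warnings

def check_versions(ucd_version, algorithm_version):
    """Return True when failing boundary tests should be marked 'To do'."""
    if ucd_version == algorithm_version:
        return False
    # The warning now states explicitly that failures will be hidden.
    warnings.warn(
        f"UCD version {ucd_version} differs from boundary algorithm "
        f"version {algorithm_version}; failing boundary algorithm tests "
        "are being suppressed as 'To do'"
    )
    return True

def report(results, suppress):
    """results: iterable of (test_name, passed). Return the lines to print."""
    lines = []
    todo = 0
    for name, passed in results:
        if passed:
            continue
        if suppress:
            todo += 1          # hidden, but still counted
        else:
            lines.append(f"FAIL {name}")
    if todo:
        lines.append(f"{todo} failing boundary test(s) suppressed as 'To do'")
    return lines
```

With this shape, a run against mismatched versions still prints one line saying how many failures were hidden, so a fully "passing" run cannot be mistaken for real coverage.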

>>
>> About the testing:
>>
>> The tests are generated so that they go through all the combinations of pairs, 
>> and some combinations of triples. The generated test cases use a 
>> sample from each partition of characters, to cut down on the file size 
>> to a reasonable level. That also means that some changes in the rules 
>> don't cause changes in the test results. Because it is not possible to 
>> test every combination, there is also provision for additional test 
>> cases, such as those at the end of the files, e.g.:
>>
>> https://unicode.org/Public/11.0.0/ucd/auxiliary/WordBreakTest.html
>> https://unicode.org/Public/10.0.0/ucd/auxiliary/WordBreakTest.html
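
To make the pair generation described above concrete, here is a rough Python sketch. It is not the actual generator; the three partitions and their sample characters are an illustrative assumption, and the real files cover every property value.

```python
from itertools import product

# Hypothetical partition table: property value -> one sample code point.
SAMPLES = {
    "WSegSpace": 0x0020,  # SPACE
    "ALetter": 0x0041,    # LATIN CAPITAL LETTER A
    "Numeric": 0x0030,    # DIGIT ZERO
}

def pair_cases(samples):
    """Yield ((prop1, prop2), (cp1, cp2)) for every ordered pair of partitions."""
    for (p1, c1), (p2, c2) in product(samples.items(), repeat=2):
        yield (p1, p2), (c1, c2)

cases = list(pair_cases(SAMPLES))
print(len(cases))  # 9: three partitions give 3 * 3 ordered pairs
```

Because each partition contributes a single sample, the file grows with the square of the number of partitions rather than with the number of characters, which is also why a rule that only matters for longer sequences can go unexercised.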
>>
>> We should extend those each time to make sure we cover combinations 
>> that aren't covered by pairs. There were some additions to that end; 
>> if they didn't cover enough cases, then we can look at your experience 
>> to add more.
>>
>> I can suggest several strategies for further testing:
>>
>> 1. To do a full test, for each row check every combination obtained 
>> by replacing each sample character by every other character in its 
>> partition. E.g., for the above line that would mean testing every 
>> <WSegSpace, WSegSpace> sequence.
>>
>> 2. Use a monkey test against ICU. That is, generate random 
>> combinations of characters from different partitions and check that 
>> ICU and your implementation are in sync.
>>
>> 3. During the beta period, test your previous-version implementation with the new 
>> test files. If there are no failures, yet there are changes in the 
>> rules, then raise that issue during the beta period so we can add tests.
> 
> I actually did this, and as I recall, did find some test failures.  In 
> retrospect, I must have screwed up somehow back then.  I was under tight 
> deadline pressure, and as a result, did more cursory beta testing than 
> normal.
>>
>> 4. If possible, during the beta period upgrade your implementation and 
>> test against the new and old test files.
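
As a rough illustration of strategies 1 and 2, the Python sketch below parses a test row, expands it over a partition, and compares two segmenters on random input. The helper names are hypothetical, the WSegSpace member list is deliberately partial, and the segmenters are plain callables rather than real ICU bindings.

```python
import random
from itertools import product

BREAK = "\u00f7"  # the division sign marks a break; the multiplication sign marks no break

def parse_test_line(line):
    """Split a row like '÷ 0020 × 0020 ÷ # comment' into (code_points, breaks)."""
    tokens = line.split("#", 1)[0].split()
    breaks = [tok == BREAK for tok in tokens[0::2]]
    code_points = [int(tok, 16) for tok in tokens[1::2]]
    return code_points, breaks

def expand_row(code_points, partition_of, members):
    """Strategy 1: yield every sequence with each sample replaced by a member of its partition."""
    pools = [members[partition_of[cp]] for cp in code_points]
    yield from product(*pools)

def monkey_test(reference, candidate, pools, rounds=1000, max_len=6, seed=0):
    """Strategy 2: return the random strings on which the two segmenters disagree."""
    rng = random.Random(seed)
    names = sorted(pools)
    mismatches = []
    for _ in range(rounds):
        length = rng.randint(2, max_len)
        text = "".join(chr(rng.choice(pools[rng.choice(names)])) for _ in range(length))
        if reference(text) != candidate(text):
            mismatches.append(text)
    return mismatches

# Partial, illustrative data only; a real run would load the full UCD property files.
MEMBERS = {"WSegSpace": [0x0020, 0x1680, 0x3000]}
PARTITION_OF = {0x0020: "WSegSpace"}

cps, brks = parse_test_line("\u00f7 0020 \u00d7 0020 \u00f7\t# sample row")
variants = list(expand_row(cps, PARTITION_OF, MEMBERS))
```

Here `reference` and `candidate` would be wrappers that each return the list of break positions for a string, one backed by ICU and one by the implementation under test. Random sequences longer than three characters are what make the monkey test catch bugs that the pair and triple rows cannot.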
> 
>>
>> Anyone else have other suggestions for testing?
>>
>> Mark
>>
> 
> As an aside, a release or two ago, I implemented SB, and someone 
> immediately found a bug, and accused me of releasing software that had 
> not been tested at all.  He had looked through the test suite and not 
> found anything that looked like it was testing that.  But he failed to 
> find the test file which bundled up all your tests, in a manner he was 
> not accustomed to, so it was easy for him to overlook.  The bug only 
> manifested itself in longer runs of characters than your pairs and 
> triples tested.  I looked at it, and your SB tests still seemed 
> reasonable, and I should not expect a more complete series than you 
> furnished.
>>
>>
>> Mark
>> //////
>>
>> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode 
>> <unicode at unicode.org> wrote:
>>
>>     I am working on upgrading from Unicode 10 to Unicode 11.
>>
>>     I used all the new files.
>>
>>     The algorithms for some of the boundaries, like GCB and WB, have
>>     changed so that some of the property values no longer have code
>>     points associated with them.
>>
>>     I ran the tests furnished in 11.0 for these boundaries, without
>>     having changed the algorithms from earlier releases.  All passed 100%.
>>
>>     Unless I'm missing something, that indicates that the tests
>>     furnished in 11.0 do not contain instances that exercise these
>>     changes.  My guess is that the 10.0 tests were also deficient.
>>
>>     I have been relying on the UCD to furnish tests that have enough
>>     coverage to sufficiently exercise the algorithms that are specified
>>     in UAX 31, but that appears to have been naive on my part.
>>
>>