Missing UAX#31 tests?

Mark Davis ☕️ via Unicode unicode at unicode.org
Sun Jul 8 04:23:15 CDT 2018


PS, although the title was "Missing UAX#31 tests?", I assumed you were
talking about http://unicode.org/reports/tr29/

Mark

On Sun, Jul 8, 2018 at 11:21 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:

> I'm surprised that the tests for 11.0 passed for a 10.0 implementation,
> because the following should have triggered a difference for WB. Can you
> check on this particular case?
>
> ÷ 0020 × 0020 ÷ #  ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷ [0.3]
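To make the notation in the test line above concrete: in these test files, ÷ marks a position where a break is expected and × marks a position where no break is expected. The text after `#` is a human-readable comment giving the rule numbers. Below is a minimal Python sketch that turns one such line into code points and break offsets. It assumes the whitespace-separated layout shown above, and the function name `parse_test_line` is hypothetical.

```python
from typing import List, Tuple

def parse_test_line(line: str) -> Tuple[List[int], List[int]]:
    """Parse one line in the ÷/× test-file format.

    Returns (code_points, breaks), where breaks are offsets, counted in
    code points, at which a boundary is expected.
    """
    data = line.split("#", 1)[0].split()  # drop the trailing comment
    code_points: List[int] = []
    breaks: List[int] = []
    for token in data:
        if token == "\u00f7":    # ÷ : break expected at this offset
            breaks.append(len(code_points))
        elif token == "\u00d7":  # × : no break at this offset
            continue
        else:                    # hex code point, e.g. "0020"
            code_points.append(int(token, 16))
    return code_points, breaks
```

For the line above this yields `[0x20, 0x20]` with breaks at offsets 0 and 2 only, i.e. the two spaces stay together.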
>
> About the testing:
>
> The tests are generated so that they go through all the combinations of
> pairs, and some combinations of triples. The generated test cases use a
> sample from each partition of characters, to keep the file size at a
> reasonable level. That also means that some changes in the rules don't
> cause changes in the test results. Because it is not possible to test
> every combination, there is also provision for additional test cases,
> such as those at the end of the files, e.g.:
>
> https://unicode.org/Public/11.0.0/ucd/auxiliary/WordBreakTest.html
> https://unicode.org/Public/10.0.0/ucd/auxiliary/WordBreakTest.html
>
> We should extend those each time to make sure we cover combinations that
> aren't covered by pairs. There were some additions to that end; if they
> didn't cover enough cases, then we can look at your experience to add more.
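The sampling scheme described above can be illustrated with a minimal Python sketch. The partition contents are small illustrative subsets, not the real UCD data. Taking the first member as each partition's sample is an assumption; the actual generator may choose samples differently. The function name is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Illustrative subsets only; real partitions come from the UCD property files.
PARTITIONS: Dict[str, List[int]] = {
    "ALetter": [0x0041, 0x0061, 0x00E9],
    "Numeric": [0x0030, 0x0661],
    "WSegSpace": [0x0020, 0x1680, 0x3000],
}

def sample_pair_cases(partitions: Dict[str, List[int]]) -> List[Tuple[int, int]]:
    """One test case per ordered pair of partitions, each partition
    represented by a single sample (here, arbitrarily, its first member)."""
    samples = {name: members[0] for name, members in partitions.items()}
    return [(samples[a], samples[b]) for a, b in product(partitions, repeat=2)]

cases = sample_pair_cases(PARTITIONS)
# 3 partitions give 9 pair cases, although the partitions hold 8 code points
# (64 possible pairs); e.g. <SPACE, IDEOGRAPHIC SPACE> is never tested directly.
```

A rule change that treats every member of a partition alike shows up in these pair cases. A change that needs a third character as context does not, unless a triple or an extra hand-written case covers it.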
>
> I can suggest four strategies for further testing:
>
> 1. To do a full test, for each row check every combination obtained by
> replacing each sample character with every other character in its
> partition. E.g., for the above line that would mean testing every
> <WSegSpace, WSegSpace> sequence.
>
> 2. Use a monkey test against ICU. That is, generate random combinations of
> characters from different partitions and check that ICU and your
> implementation are in sync.
>
> 3. During the beta period, test your previous version against the new test
> files. If there are no failures, yet there are changes in the rules, then
> raise that issue during the beta period so we can add tests.
>
> 4. If possible, during the beta period upgrade your implementation and
> test against the new and old test files.
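Strategies 1 and 2 can be sketched in a few lines of Python. The partition map, the function names, and the `Segmenter` signature (a function returning break offsets) are all hypothetical. A real harness would build the partitions from the UCD property files and would wrap ICU's word break iterator as the `reference` callable; that wrapper is not shown here.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Sequence

# Hypothetical partition map: property value -> member code points.
PARTITIONS: Dict[str, List[int]] = {
    "WSegSpace": [0x0020, 0x1680, 0x3000],
    "ALetter": [0x0041, 0x0061],
}

def expand_row(row_values: Sequence[str]) -> List[List[int]]:
    """Strategy 1: every sequence obtained by substituting each sample
    with each member of its partition (the Cartesian product)."""
    return [list(seq) for seq in product(*(PARTITIONS[v] for v in row_values))]

Segmenter = Callable[[List[int]], List[int]]  # code points -> break offsets

def monkey_test(mine: Segmenter, reference: Segmenter,
                trials: int = 1000, max_len: int = 6,
                seed: int = 0) -> List[List[int]]:
    """Strategy 2: random sequences drawn across partitions; returns the
    inputs on which the two implementations disagree."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    values = list(PARTITIONS)
    mismatches: List[List[int]] = []
    for _ in range(trials):
        length = rng.randint(1, max_len)
        # Pick a partition at random, then a member of it, for each position.
        seq = [rng.choice(PARTITIONS[rng.choice(values)]) for _ in range(length)]
        if mine(seq) != reference(seq):
            mismatches.append(seq)
    return mismatches
```

For the <WSegSpace, WSegSpace> row, `expand_row` yields all 9 space pairs from this subset rather than the single sampled one.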
>
> Anyone else have other suggestions for testing?
>
> Mark
>
>
> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode <unicode at unicode.org> wrote:
>
>> I am working on upgrading from Unicode 10 to Unicode 11.
>>
>> I used all the new files.
>>
>> The algorithms for some of the boundaries, like GCB and WB, have changed
>> so that some of the property values no longer have code points associated
>> with them.
>>
>> I ran the tests furnished in 11.0 for these boundaries, without having
>> changed the algorithms from earlier releases.  All passed 100%.
>>
>> Unless I'm missing something, that indicates that the tests furnished in
>> 11.0 do not contain instances that exercise these changes.  My guess is
>> that the 10.0 tests were also deficient.
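The reasoning here can be made mechanical: if the old and new rule sets agree on every case in a test file, that file cannot detect the change. The sketch below uses two toy segmenters. They are not the real UAX #29 rules and differ only in the SPACE × SPACE behavior from the test line quoted earlier. All names are hypothetical.

```python
from typing import Callable, List, Tuple

TestCase = Tuple[List[int], List[int]]  # (code points, expected break offsets)

def is_space(cp: int) -> bool:
    # Toy stand-in for WSegSpace; the real set comes from the UCD.
    return cp in (0x0020, 0x3000)

def old_rules(cps: List[int]) -> List[int]:
    # Toy "before" segmenter: break at every offset.
    return list(range(len(cps) + 1))

def new_rules(cps: List[int]) -> List[int]:
    # Toy "after" segmenter: same, but no break between two spaces.
    return [i for i in range(len(cps) + 1)
            if not (0 < i < len(cps) and is_space(cps[i - 1]) and is_space(cps[i]))]

def distinguishing_cases(tests: List[TestCase],
                         old: Callable[[List[int]], List[int]],
                         new: Callable[[List[int]], List[int]]) -> List[TestCase]:
    """Test cases on which the two rule sets disagree. An empty result means
    the test file cannot tell the old algorithm from the new one."""
    return [t for t in tests if old(t[0]) != new(t[0])]

tests: List[TestCase] = [([0x41, 0x20], [0, 1, 2]), ([0x20, 0x41], [0, 1, 2])]
print(distinguishing_cases(tests, old_rules, new_rules))  # []
tests.append(([0x20, 0x20], [0, 2]))
print(distinguishing_cases(tests, old_rules, new_rules))  # [([32, 32], [0, 2])]
```

Running such a check during the beta period gives an early signal that a rule change has no covering test case.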
>>
>> I have been relying on the UCD to furnish tests that have enough coverage
>> to sufficiently exercise the algorithms that are specified in UAX 31, but
>> that appears to have been naive on my part.
>>
>
>