Missing UAX#31 tests?
Mark Davis ☕️ via Unicode
unicode at unicode.org
Sun Jul 8 04:23:15 CDT 2018
PS, although the title was "Missing UAX#31 tests?", I assumed you were
talking about UAX #29, http://unicode.org/reports/tr29/
On Sun, Jul 8, 2018 at 11:21 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> I'm surprised that the tests for 11.0 passed for a 10.0 implementation,
> because the following should have triggered a difference for WB. Can you
> check on this particular case?
> ÷ 0020 × 0020 ÷ # ÷ [0.2] SPACE (WSegSpace) × [3.4] SPACE (WSegSpace) ÷
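For readers unfamiliar with the notation in the line above: ÷ marks an expected boundary, × marks a position where no boundary is allowed, the hex values are code points, and everything after # is a comment. A minimal sketch of a parser for that notation follows. The function name parse_break_test_line is hypothetical and not part of any Unicode tool; it only assumes the line layout shown in the example.

```python
def parse_break_test_line(line):
    """Parse one test line written in the ÷ / × notation.

    Returns (text, breaks), where breaks lists the code point offsets at
    which a boundary is expected, or None for blank and comment-only lines.
    """
    # Drop the trailing comment first; it may itself contain ÷ and ×.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    chars = []
    breaks = []
    for token in data.split():
        if token == "÷":
            breaks.append(len(chars))      # boundary before the next char
        elif token == "×":
            continue                       # explicitly no boundary here
        else:
            chars.append(chr(int(token, 16)))
    return "".join(chars), breaks


text, breaks = parse_break_test_line("÷ 0020 × 0020 ÷ # two spaces")
# text is two SPACE characters; breaks == [0, 2]
```

An implementation that still puts a boundary between the two spaces would report [0, 1, 2] for this input, so it should fail this line.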
> About the testing:
> The tests are generated so that they go through all the combinations of pairs, and
> some combinations of triples. The generated test cases use a sample from
> each partition of characters, to cut down on the file size to a reasonable
> level. That also means that some changes in the rules don't cause changes
> in the test results. Because it is not possible to test every
> combination, there is also provision for additional test cases, such as
> those at the end of the files, eg:
> We should extend those each time to make sure we cover combinations that
> aren't covered by pairs. There were some additions to that end; if they
> didn't cover enough cases, then we can look at your experience to add more.
> I can suggest four strategies for further testing:
> 1. To do a full test, for each row check every combination obtained by
> replacing each sample character by every other character in its
> partition. Eg for the above line that would mean testing every <WSegSpace,
> WSegSpace> sequence.
> 2. Use a monkey test against ICU. That is, generate random combinations of
> characters from different partitions and check that ICU and your
> implementation are in sync.
> 3. During the beta period, test your previous-version implementation with the new test
> files. If there are no failures, yet there are changes in the rules, then
> raise that issue during the beta period so we can add tests.
> 4. If possible, during the beta period upgrade your implementation and
> test against the new and old test files.
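Strategies 1 and 2 above can be sketched roughly as follows. This is only an illustration under stated assumptions: expand_row, monkey_test, the two toy segmenters, and the tiny partitions table are all hypothetical names and data made up for the example. In real use, reference would wrap ICU and candidate would wrap the implementation under test, and the partitions would be built from the UCD property files.

```python
import itertools
import random


def expand_row(partition_names, partitions):
    """Strategy 1: yield every sequence obtained by substituting each
    position of a test row with every character of its partition."""
    pools = [partitions[name] for name in partition_names]
    for combo in itertools.product(*pools):
        yield "".join(combo)


def monkey_test(reference, candidate, partitions,
                rounds=1000, max_len=4, seed=0):
    """Strategy 2: feed random sequences to two segmenters and collect
    the inputs on which their boundary lists differ."""
    rng = random.Random(seed)          # seeded so failures are reproducible
    names = sorted(partitions)
    failures = []
    for _ in range(rounds):
        length = rng.randint(2, max_len)
        text = "".join(rng.choice(partitions[rng.choice(names)])
                       for _ in range(length))
        if reference(text) != candidate(text):
            failures.append(text)
    return failures


# Toy data and toy segmenters, for illustration only.
partitions = {"WSegSpace": [" ", "\u2003"], "ALetter": ["a", "b"]}


def breaks_everywhere(text):
    """Stand-in for an old implementation: boundary at every offset."""
    return list(range(len(text) + 1))


def keeps_spaces_together(text, spaces=" \u2003"):
    """Stand-in for a new implementation: no boundary between two spaces."""
    inner = [i for i in range(1, len(text))
             if not (text[i - 1] in spaces and text[i] in spaces)]
    return [0] + inner + [len(text)]


all_space_pairs = list(expand_row(["WSegSpace", "WSegSpace"], partitions))
mismatches = monkey_test(breaks_everywhere, keeps_spaces_together, partitions)
```

Each string in mismatches is an input on which the two segmenters disagree, which makes it a candidate for an additional hand-written test case of the kind mentioned above.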
> Anyone else have other suggestions for testing?
> On Sun, Jul 8, 2018 at 6:52 AM, Karl Williamson via Unicode <
> unicode at unicode.org> wrote:
>> I am working on upgrading from Unicode 10 to Unicode 11.
>> I used all the new files.
>> The algorithms for some of the boundaries, like GCB and WB, have changed
>> so that some of the property values no longer have code points associated
>> with them.
>> I ran the tests furnished in 11.0 for these boundaries, without having
>> changed the algorithms from earlier releases. All passed 100%.
>> Unless I'm missing something, that indicates that the tests furnished in
>> 11.0 do not contain instances that exercise these changes. My guess is
>> that the 10.0 tests were also deficient.
>> I have been relying on the UCD to furnish tests that have enough coverage
>> to sufficiently exercise the algorithms that are specified in UAX 31, but
>> that appears to have been naive on my part.