Test Data Gone?
Steven R. Loomis
srl at icu-project.org
Mon Nov 16 08:32:49 CST 2015
Sent from our iPhone.
> On Nov 16, 2015, at 4:44 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
>
> At the time we retracted it, it didn't appear that there was a lot of usage, and you really get a much more thorough test by comparing to ICU's implementation.
Right. An idea raised at IUC was, rather than trying to scope test data as CLDR conformance test data, to start a new effort that simply and explicitly records ICU's results for a given ICU/CLDR version, for certain input values and certain formatting routines. People are already doing this; we would just be combining efforts.
Maybe the results would live in an ICU-maintained file rather than in CLDR, like a sample app.
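For illustration only, a minimal Python sketch of such a shared results file, assuming a JSON layout with a version label and input/expected pairs. `record_results`, `check_results`, and the schema are hypothetical and not part of ICU or CLDR; `format_fn` stands in for whatever ICU formatting routine is being recorded, and inputs are plain strings for simplicity.

```python
import json
from typing import Callable, Iterable, List


def record_results(version: str, routine: str,
                   format_fn: Callable[[str], str],
                   inputs: Iterable[str]) -> str:
    """Record reference outputs of one routine for one ICU/CLDR version.

    The JSON schema used here is hypothetical, chosen only for illustration.
    """
    cases = [{"input": value, "expected": format_fn(value)} for value in inputs]
    return json.dumps({"version": version, "routine": routine, "cases": cases},
                      ensure_ascii=False, indent=2)


def check_results(recorded: str, format_fn: Callable[[str], str]) -> List[dict]:
    """Return the recorded cases on which an implementation disagrees."""
    data = json.loads(recorded)
    return [case for case in data["cases"]
            if format_fn(case["input"]) != case["expected"]]
```

Keeping the version label in the file makes it explicit which release the expected values came from, since results can change from one ICU/CLDR version to the next.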
>
> The data we previously had was mechanically generated from the collation data, not curated. It was created by concatenating some chosen primary/secondary/tertiary characters with the tailored and exemplar characters for each language.
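A rough Python sketch of that kind of mechanical generation, assuming simple two-way concatenation. The sample characters and the exemplar set below are hypothetical placeholders, since the actual characters used for the retracted data are not given here.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical stand-ins for "chosen" characters: in the root collation,
# "a" vs "b" differ at the primary level, "a" vs "á" at the secondary
# level, and "a" vs "A" at the tertiary level.
LEVEL_SAMPLES = ["a", "b", "\u00e1", "A"]


def generate_cases(level_samples: Sequence[str],
                   tailored_and_exemplar: Sequence[str]) -> List[str]:
    """Concatenate each sample character with each tailored/exemplar
    character of a language, in both orders, dropping duplicates."""
    cases = []
    for base, extra in product(level_samples, tailored_and_exemplar):
        cases.append(base + extra)
        cases.append(extra + base)
    # dict.fromkeys keeps the first-seen order while removing duplicates.
    return list(dict.fromkeys(cases))


# Hypothetical tailored/exemplar set for a Spanish-like locale.
cases = generate_cases(LEVEL_SAMPLES, ["ñ", "ch"])
```

Data produced this way is only as good as the chosen characters, which is one reason it is not a substitute for curated cases.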
>
> Mark
>
>> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>>> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>>> Probably the most thorough test you could use would be one that tests
>>> semi-random strings to see if you get the same results as ICU.
>>
>> Good idea. For tailorings, one thing to do is to extract the characters used in the tailoring and to bias the semi-random strings heavily towards using these characters.
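A minimal Python sketch of the semi-random differential test described above. It assumes collators are exposed as three-way compare functions: `reference` would be ICU's collator (for example through a binding such as PyICU, if available) and `candidate` the implementation under test. The 0.8 bias and the string lengths are arbitrary choices, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Compare = Callable[[str, str], int]


def biased_random_string(rng: random.Random, tailoring_chars: Sequence[str],
                         other_chars: Sequence[str], max_len: int = 6,
                         bias: float = 0.8) -> str:
    """Build a string whose characters come from the tailoring set with
    probability `bias` and from the general pool otherwise."""
    length = rng.randint(1, max_len)
    return "".join(
        rng.choice(tailoring_chars) if rng.random() < bias else rng.choice(other_chars)
        for _ in range(length))


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def differential_test(reference: Compare, candidate: Compare,
                      tailoring_chars: Sequence[str], other_chars: Sequence[str],
                      trials: int = 1000, seed: int = 0) -> List[Tuple[str, str]]:
    """Compare two collators on random string pairs; return disagreements."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    failures = []
    for _ in range(trials):
        a = biased_random_string(rng, tailoring_chars, other_chars)
        b = biased_random_string(rng, tailoring_chars, other_chars)
        # Only the sign matters; compare functions may return any magnitude.
        if sign(reference(a, b)) != sign(candidate(a, b)):
            failures.append((a, b))
    return failures
```

For a tailoring, `tailoring_chars` would be the characters extracted from its rules, as Martin suggests, and `other_chars` a broader sample of the repertoire.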
>>
>> Based on my experience with testing data for normalization (NFC and friends), I can say that having a good set of test data is extremely useful for implementers. I strongly encourage the Unicode Consortium to curate such data, and implementers at all levels to contribute to it.
>>
>> Regards, Martin.
>>
>>
>>
>>> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>>
>>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com>
>>>> wrote:
>>>>
>>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>>> them to test our implementation already. It is my understanding however
>>>>> that they do not test individual locale tailorings, is that correct?
>>>>
>>>> The UCA test file is only for the DUCET, corresponding to what we call the
>>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>>>> implements that, CLDR has modified versions of those test files:
>>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/
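A Python sketch of checking an implementation against one of these test files. It assumes the CollationTest line format (one string per line as space-separated hex code points, optionally followed by ';' and a '#' comment, with '@' lines for metadata) and that no line may compare greater than the line before it; `compare` is a placeholder for the collator being tested, and the function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def parse_collation_test(lines: Iterable[str]) -> List[str]:
    """Turn CollationTest-style lines into strings, skipping comments,
    blank lines and '@' metadata lines."""
    strings = []
    for line in lines:
        data = line.split("#", 1)[0].split(";", 1)[0].strip()
        if not data or data.startswith("@"):
            continue
        # chr() also accepts lone surrogates, which these files can contain.
        strings.append("".join(chr(int(cp, 16)) for cp in data.split()))
    return strings


def check_order(strings: List[str],
                compare: Callable[[str, str], int]) -> List[Tuple[int, str, str]]:
    """Return (index, previous, current) wherever previous > current."""
    failures = []
    for i in range(1, len(strings)):
        if compare(strings[i - 1], strings[i]) > 0:
            failures.append((i, strings[i - 1], strings[i]))
    return failures
```

Per Markus's note, the CLDR-modified files from the directory above would be the ones to use for an implementation that follows CLDR's root collation rather than the plain DUCET.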
>>>>
>>>> The ICU test file has a number of test cases for various locales, as
>>>> indicated in the test data. They assume CLDR collation data. More often, I
>>>>> tried to make minimal assumptions about the collation data, and copied
>>>> relevant parts of rules into the test data -- so some of the test cases
>>>> require a from-rules builder. As a result, this file might be too specific
>>>> for other implementations.
>>>>
>>>> markus
>>>>
>>>> _______________________________________________
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org
>>>> http://unicode.org/mailman/listinfo/cldr-users