Test Data Gone?

Steven R. Loomis srl at icu-project.org
Mon Nov 16 08:32:49 CST 2015



Sent from our iPhone.

> On Nov 16, 2015, at 4:44 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> 
> At the time we retracted it, it didn't appear that there was a lot of usage, and you really get a much more thorough test by comparing to ICU's implementation.

Right. An idea at IUC was, rather than trying to scope test data as CLDR conformance test data, to have a new effort that simply and explicitly records ICU's results for a given ICU/CLDR version, for certain input values and certain formatting routines. People are doing this already, so we could just combine efforts.

Maybe the results would be an ICU-maintained file instead of a CLDR one, like a sample app.
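
To make the idea concrete, here is a rough sketch of what recording and re-checking such results could look like. Everything in it is an assumption for illustration only: the field names, the version strings, and `stand_in_formatter`, which is a placeholder where a real recording would call an actual ICU formatting routine. It is not an existing ICU or CLDR file format.

```python
import json


def record_snapshot(format_fn, inputs, routine, icu_version, cldr_version):
    """Pair each input value with the output format_fn produced for it."""
    return {
        "icu_version": icu_version,    # hypothetical field names throughout
        "cldr_version": cldr_version,
        "routine": routine,
        "results": [{"input": v, "output": format_fn(v)} for v in inputs],
    }


def find_mismatches(snapshot, format_fn):
    """Return (input, expected, actual) for every entry where format_fn disagrees."""
    mismatches = []
    for entry in snapshot["results"]:
        actual = format_fn(entry["input"])
        if actual != entry["output"]:
            mismatches.append((entry["input"], entry["output"], actual))
    return mismatches


def stand_in_formatter(value):
    # Placeholder only: a real recording would call an ICU formatting routine here.
    return f"{value:,.2f}"


snapshot = record_snapshot(
    stand_in_formatter,
    inputs=[0, 1234.5, -42],
    routine="decimal-format (placeholder)",
    icu_version="example-icu-version",
    cldr_version="example-cldr-version",
)

# The snapshot is plain JSON, so it can be checked into a repository and
# reloaded later by any implementation that wants to compare its own output.
serialized = json.dumps(snapshot, ensure_ascii=False, indent=2)
reloaded = json.loads(serialized)
print(find_mismatches(reloaded, stand_in_formatter))
```

Another implementation would pass its own formatting function to `find_mismatches` and treat any returned tuples as differences from the recorded ICU behavior for that version.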

> 
> The data we previously had was mechanically generated from the data, not curated. It was created by generating concatenations of some chosen primary/secondary/tertiary characters together with the tailored+exemplar characters for each language. 
> 
> Mark
> 
>> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>>> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>>> Probably the most thorough test you could use would be one that tests
>>> semi-random strings to see if you get the same results as ICU.
>> 
>> Good idea. For tailorings, one thing to do is to extract the characters used in the tailoring and to bias the semi-random strings heavily towards using these characters.
>> 
>> Based on my experience with testing data for normalization (NFC and friends), I can say that having a good set of test data is extremely useful for implementers. I strongly encourage the Unicode Consortium to curate such data, and implementers at all levels to contribute to it.
>> 
>> Regards,   Martin.
>> 
>> 
>> 
>>> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>> 
>>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com>
>>>> wrote:
>>>> 
>>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>>> them to test our implementation already. It is my understanding however
>>>>> that they do not test individual locale tailorings, is that correct?
>>>> 
>>>> The UCA test file is only for the DUCET, corresponding to what we call the
>>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>>>> implements that, CLDR has modified versions of those test files:
>>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/
>>>> 
>>>> The ICU test file has a number of test cases for various locales, as
>>>> indicated in the test data. They assume CLDR collation data. More often, I
>>>> tried to make minimal assumptions about the collation data, and copied
>>>> relevant parts of rules into the test data -- so some of the test cases
>>>> require a from-rules builder. As a result, this file might be too specific
>>>> for other implementations.
>>>> 
>>>> markus
>>>> 
>>>> _______________________________________________
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org
>>>> http://unicode.org/mailman/listinfo/cldr-users
>>> 
>>> 
>>> 
> 