From cldr-users at unicode.org Mon Jan 8 19:40:14 2018
From: cldr-users at unicode.org (Zach Laine via CLDR-Users)
Date: Mon, 8 Jan 2018 19:40:14 -0600
Subject: Are CollationTest_CLDR_*.txt appropriate for testing FractionalUCA.txt-derived CETs?
Message-ID:

It seems like this should be the case, but I see nowhere that explicitly
states this. If it's not the case, are there more appropriate test cases
available somewhere else?

A bit more context: I've generated tests from CollationTest_CLDR_*.txt, and
they all passed using my collation code and using data from
allkeys_CLDR.txt. When I changed to using data from FractionalUCA.txt, many
of those tests now fail. I have no reason to suspect anyone but myself. :)
I'm just checking here to make certain that I'm using the right set of
tests.

Thanks,
Zach
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cldr-users at unicode.org Mon Jan 8 22:07:26 2018
From: cldr-users at unicode.org (Markus Scherer via CLDR-Users)
Date: Mon, 8 Jan 2018 20:07:26 -0800
Subject: Are CollationTest_CLDR_*.txt appropriate for testing FractionalUCA.txt-derived CETs?
In-Reply-To:
References:
Message-ID:

Yes.

ICU implements collation with FractionalUCA.txt for the root data, and
tests with these files.

markus

From cldr-users at unicode.org Tue Jan 9 00:16:38 2018
From: cldr-users at unicode.org (Zach Laine via CLDR-Users)
Date: Tue, 9 Jan 2018 00:16:38 -0600
Subject: Are CollationTest_CLDR_*.txt appropriate for testing FractionalUCA.txt-derived CETs?
In-Reply-To:
References:
Message-ID:

Thanks! Good to have that confirmed.

Zach

On Mon, Jan 8, 2018 at 10:07 PM, Markus Scherer wrote:

> Yes.
>
> ICU implements collation with FractionalUCA.txt for the root data, and
> tests with these files.
>
> markus

From cldr-users at unicode.org Wed Jan 10 18:15:30 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Thu, 11 Jan 2018 00:15:30 +0000
Subject: Vietnamese Collation Wrong for Polysyllables
Message-ID: <20180111001530.5b1cade1@JRWUBU2>

Is it in order for me to raise a ticket to report that the CLDR
Vietnamese collation is wrong for polysyllabic words? For example, it
sorts 'Argentina/e' before 'Afghan(istan)',
whereas comes before on p1 of the 2016
edition of 'Tuttle Compact Vietnamese Dictionary: Vietnamese-English
English-Vietnamese'. The dictionary looks right - except that it has
transposed the order of acute and grave accents!

I know exactly what is wrong for this example - the final paragraph of
https://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks explains
how Vietnamese collation works with the tone marks. The key message
is, "Ordering according to primary and secondary differences proceeds
syllable by syllable". Thus and have a primary difference in
the two country names. I have a good idea of how to fix the problem,
but I don't have time to work out the details this month, which might
be needed for a ticket.

There is one formal problem with the solution I have in mind. It
involves collating elements such as to swap the tone mark
(which really has primary weight) and final consonant, and the problem
is that the FCD closure of a collation with such elements is infinite -
it has to include such generated collating elements as .

I am also assuming that syllable boundaries are always marked
in words with tone marks.
Any revision of the CLDR definition should
be checked against a Vietnamese dictionary - according to
https://bugzilla.redhat.com/show_bug.cgi?id=516467, Nguyen Thai Ngoc Duy
seems to have done the donkey work by providing
http://repo.or.cz/w/words-vi.git.

Richard.

From cldr-users at unicode.org Thu Jan 11 03:03:31 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Thu, 11 Jan 2018 10:03:31 +0100
Subject: Vietnamese Collation Wrong for Polysyllables
In-Reply-To: <20180111001530.5b1cade1@JRWUBU2>
References: <20180111001530.5b1cade1@JRWUBU2>
Message-ID:

You raise a good issue.

The first question I'd have is whether the lexical ordering described in
https://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks is expected by
average Vietnamese. We have seen before cases where a formal government
specification (French accent ordering) is expected by nobody outside of a
small group of mavens.

Assuming it is required ...

This may just be a case where the UCA doesn't work well enough without
preprocessing. The standard does allow for such preprocessing, and the
question is how to allow for that in CLDR data. One way I can think of is
by allowing a transform for text that is applied before sorting to be
specified in the UCA description. For speed, implementations would probably
do that in code, but we could have a data representation that could be used
in a reference implementation.

It appears that Vietnamese syllables are well structured, which could allow
a relatively simple transform to do the job, along the lines of what you
are suggesting. Even in code, it would still probably make the sorting
considerably slower than for other languages, so we might want to offer two
variants for sorting.
In XML, something like the following (X,Y are just for illustration):

Another question I'd have is whether there are any changes to the CLDR
rules for Vietnamese that would make the ordering "closer" to what is
required, without such a transform or a gazillion collation rules. For
example, would making the tone marks primary differences produce a result
that is closer?

In any event, we'd want to involve Vietnamese experts before going any
further.

Mark

From cldr-users at unicode.org Thu Jan 11 14:51:51 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Thu, 11 Jan 2018 20:51:51 +0000
Subject: Vietnamese Collation Wrong for Polysyllables
In-Reply-To:
References: <20180111001530.5b1cade1@JRWUBU2>
Message-ID: <20180111205151.0a638f04@JRWUBU2>

On Thu, 11 Jan 2018 10:03:31 +0100
Mark Davis ☕️ via CLDR-Users wrote:

> The first question I'd have is whether the lexical ordering described
> in https://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks is
> expected by average Vietnamese. We have seen before cases where a
> formal government specification (French accent ordering) is expected
> by nobody outside of a small group of mavens.

How small is small? The official Thai way seems not to be how Thais
usually compare manually!

> Assuming it is required ...
>
> This may just be a case where the UCA doesn't work well enough without
> preprocessing. The standard does allow for such preprocessing, and the
> question is how to allow for that in CLDR data. One way I can think
> of by allowing a transform for text that is applied before sorting to
> be specified in the UCA description.

That method may assist greatly with other mainland SE Asian languages
that compare syllable by syllable - Lao, Tai Lue and Burmese at least.
Indeed, it might make the Lao CCVT syllable-based ordering (i.e.
initial, final, vowel and then tone) much easier to implement, as well
as greatly simplifying the Lao CVCT order I was struggling to implement
in March 2017.

> It appears that
> Vietnamese syllables are well structured, which could allow a
> relatively simple transform to do the job, along the lines of what
> you are suggesting.

In vi.xml, I find "St. Barthélemy", which is a little worrying. Even
more important are words like "a-xít" or "axít" 'acid' and several
other partially assimilated words, such as "phốtpho" 'phosphorus'.

> In any event, we'd want to involve Vietnamese experts before going any
> further.

Glad to have someone else do the work!

Richard.

From cldr-users at unicode.org Thu Jan 11 20:01:14 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Fri, 12 Jan 2018 02:01:14 +0000
Subject: Vietnamese Collation Wrong for Polysyllables
In-Reply-To:
References: <20180111001530.5b1cade1@JRWUBU2>
Message-ID: <20180112020114.57c3dee3@JRWUBU2>

On Thu, 11 Jan 2018 10:03:31 +0100
Mark Davis ☕️ via CLDR-Users wrote:

> The first question I'd have is whether the lexical ordering described
> in https://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks is
> expected by average Vietnamese. We have seen before cases where a
> formal government specification (French accent ordering) is expected
> by nobody outside of a small group of mavens.

Looking through a selection of dictionaries, a notable feature is that
although the principle that the tone mark is an out-of-order primary is
maintained, the relative ordering of the marks varies from dictionary
to dictionary! This may make it more difficult to assess the
alphabetical ordering rules actually expected by the Vietnamese.

What syllable-by-syllable ordering means is that, inter alia, they
expect words or phrases with the same first syllable to be grouped
together.

Richard.
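
The syllable-by-syllable ordering discussed in the messages above, with the tone mark acting as an out-of-order primary inside each syllable, can be sketched roughly as follows. This is only an illustration and not CLDR or ICU behaviour: the function names, the tone order (which, as noted above, varies between dictionaries), the assumption that syllables are separated by spaces or hyphens, and the sample strings are all assumptions made for the example.

```python
import unicodedata

# Assumed tone order: grave, hook above, tilde, acute, dot below.
# Dictionaries disagree on this, so treat the order as a placeholder.
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"
TONE_RANK = {mark: rank for rank, mark in enumerate(TONE_MARKS, start=1)}


def syllable_key(syllable):
    """Return (letters without the tone mark, tone rank) for one syllable."""
    tone = 0  # 0 means "no tone mark"
    letters = []
    for ch in unicodedata.normalize("NFD", syllable.lower()):
        if ch in TONE_RANK:
            tone = TONE_RANK[ch]
        else:
            letters.append(ch)  # vowel-quality marks stay with their letter
    return ("".join(letters), tone)


def sort_key(text):
    """Compare syllable by syllable; the tone only breaks ties within a syllable."""
    return [syllable_key(s) for s in text.replace("-", " ").split()]


# Hypothetical sample strings, written with escapes: "bá ca", "ba la", "bán", "bat".
words = ["b\u00e1 ca", "ba la", "b\u00e1n", "bat"]
print(sorted(words, key=sort_key))
```

With these keys "ba la" sorts before "bá ca" because the first syllables already differ in tone, while "bán" still sorts before "bat" because the final consonant is compared before the tone. The letters themselves are compared by plain code point here, which does not place letters such as đ correctly; a real implementation would need proper primary weights and a way to find syllable boundaries in unhyphenated words.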
From cldr-users at unicode.org Thu Jan 11 20:51:03 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Fri, 12 Jan 2018 09:51:03 +0700
Subject: adding transforms to collation
In-Reply-To:
References: <20180111001530.5b1cade1@JRWUBU2>
Message-ID: <20180112095103.5c3c12dc@sil-mh8>

Dear Mark,

> This may just be a case where the UCA doesn't work well enough without
> preprocessing. The standard does allow for such preprocessing, and the
> question is how to allow for that in CLDR data. One way I can think of by
> allowing a transform for text that is applied before sorting to be
> specified in the UCA description. For speed, implementations would probably
> do that in code, but we could have a data representation that could be used
> in a reference implementation.

Which is quicker? To run a transform to reorder stuff or to process a
thousand contractions? The reason I ask is that something like this could
probably be handled by contractions (I haven't checked), but it would take
a lot. Burmese for its CCVT model (à la Lao) takes around 600.

Yours,
Martin

From cldr-users at unicode.org Fri Jan 12 02:46:40 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Fri, 12 Jan 2018 08:46:40 +0000
Subject: adding transforms to collation
In-Reply-To: <20180112095103.5c3c12dc@sil-mh8>
References: <20180111001530.5b1cade1@JRWUBU2> <20180112095103.5c3c12dc@sil-mh8>
Message-ID: <20180112084640.08681a48@JRWUBU2>

On Fri, 12 Jan 2018 09:51:03 +0700
Martin Hosken via CLDR-Users wrote:

> Dear Mark,
>
> > This may just be a case where the UCA doesn't work well enough
> > without preprocessing. The standard does allow for such
> > preprocessing, and the question is how to allow for that in CLDR
> > data. One way I can think of by allowing a transform for text that
> > is applied before sorting to be specified in the UCA description.
> > For speed, implementations would probably do that in code, but we
> > could have a data representation that could be used in a reference
> > implementation.
>
> Which is quicker? To run a transform to reorder stuff or to process a
> thousand contractions? The reason I ask is that something like this
> could probably be handled by contractions (I haven't checked), but it
> would take a lot. Burmese for it's CCVT model (ala Lao) takes around
> 600.

Contractions may be quicker for Burmese - for the most part the task is
to reorder VC to CV, and the final consonants are clearly marked as
such. I am a bit bothered that I couldn't see a transform to do a
rewrite such as VC → CV where V and C are defined by Unicode sets. It
would be helpful to have a full syntax definition in the LDML.

Lao is much more complicated, as final consonants are not tagged as
such, but are recognised by context rules that are difficult to wind
into contractions and prefix rules for the CLDR collation algorithm.

For Vietnamese, one complicating factor is that, so far as I am aware,
there isn't a full implementation of the CLDR collation algorithm that
includes tailoring rules. According to the ICU user guide, no known
language has contractions that overlap canonical decompositions -
Vietnamese (as described) was not a known language. One can work round
this problem. Instead of having contractions for , one would have
contractions for . The downside is that this increases the number of
contractions by an order of magnitude.

Richard.
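
The remark above about contractions overlapping canonical decompositions can be made concrete by looking at where a Vietnamese tone mark sits in each normalization form. The following is a small sketch using Python's standard unicodedata module; the helper name code_points and the sample syllables are chosen here purely for illustration. In NFC the tone mark is hidden inside a precomposed vowel, so a contraction of tone mark plus following consonant would have to begin partway through a character's decomposition. In NFD the tone mark is a separate code point, although it is not always adjacent to the consonant.

```python
import unicodedata


def code_points(text, form):
    """List the code points of text after normalizing to the given form."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]


# Hypothetical sample syllables, written with escapes to avoid encoding issues:
# "cán" (a + acute), "ấn" (a + circumflex + acute), "ận" (a + circumflex + dot below).
samples = ["c\u00e1n", "\u1ea5n", "\u1eadn"]

for syllable in samples:
    print(syllable, "NFC", code_points(syllable, "NFC"))
    print(syllable, "NFD", code_points(syllable, "NFD"))
```

For the third sample, the dot below (U+0323) has a lower combining class than the circumflex (U+0302), so in NFD it comes before the circumflex and is separated from the final consonant.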
From cldr-users at unicode.org Fri Jan 12 05:07:13 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Fri, 12 Jan 2018 12:07:13 +0100
Subject: adding transforms to collation
In-Reply-To: <20180112084640.08681a48@JRWUBU2>
References: <20180111001530.5b1cade1@JRWUBU2> <20180112095103.5c3c12dc@sil-mh8> <20180112084640.08681a48@JRWUBU2>
Message-ID:

Contractions would generally be faster than preprocessing, although the
more contractions you have with the same initial substring, the slower it
is. But large sets of contractions also slow down the non-contracted forms,
because of the extra lookup. So adding "& h < ch" will slow down every
instance of collating "c". They also can burden everything because of
memory impact.

> I am a bit bothered that I couldn't see a transform to do a
> rewrite such as VC → CV where V and C are defined by Unicode sets.

I was just circulating an idea, not a fully-fleshed out approach. However,
if we used the syntax
(http://unicode.org/reports/tr35/tr35-general.html#Transforms), that
permits:

(S1)(S2) → $2$1; // where S1 and S2 are unicode sets or sequences involving unicode sets

Examples in view-source:
http://www.unicode.org/repos/cldr/trunk/common/transforms/Latin-Katakana.xml

However, another alternative to contractions is to use the
http://unicode.org/reports/tr35/tr35-collation.html#Context_Before. Using
context is more limited than contractions, but can be much faster and may
be applicable for Vietnamese. With that, you can then change the sort order
of a later letter based on previous context. It may not be powerful enough
to do what people want to do, but here is a simple example of where it
would work.

- I want the syllable with ? to sort as a primary difference, *as a
  whole*: "can" < "c?n"
  - test case: "can y" < "c?n x", where the x/y difference doesn't
    matter.
- But within the syllable I want the difference between a and ? *not* to
  swamp later consonants.
  - test case: "c?n" < "cat", where the n/t difference predominates

The following can be entered in
http://demo.icu-project.org/icu-bin/collation.html

Rules:
&t &n
// the syntax says: if a 't' comes after an '?', then sort it as a
// primary difference from a regular t.

From cldr-users at unicode.org Fri Jan 12 13:02:33 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Fri, 12 Jan 2018 19:02:33 +0000
Subject: adding transforms to collation
In-Reply-To:
References: <20180111001530.5b1cade1@JRWUBU2> <20180112095103.5c3c12dc@sil-mh8> <20180112084640.08681a48@JRWUBU2>
Message-ID: <20180112190233.24be5e0b@JRWUBU2>

On Fri, 12 Jan 2018 12:07:13 +0100
Mark Davis ☕️ via CLDR-Users wrote:

> Contractions would generally be faster than preprocessing, although
> the more contractions you have with the same initial substring, the
> slower it is. But large sets of contractions also slow down the
> non-contracted forms, because of the extra lookup. So adding "& h <
> ch" will slow down every instance of collating "c". They also can
> burden everything because of memory impact.

Unless I am missing some nasty unassimilated borrowings like *s?to, I
think one only needs 65 contractions for reasonably spelt Vietnamese in
NFD - 5 tone marks × (8 final consonants + 4 glide writings () + 'a' for
the 3 purely vocalic diphthongs , and ). I believe a tone mark will need
a contraction more often than not.

Problems are threatened for when the text has been stored in NFC.

> > I am a bit bothered that I couldn't see a transform to do a rewrite
> > such as VC → CV where V and C are defined by Unicode sets.

> I was just circulating an idea, not a fully-fleshed out approach.
> However, if we used the syntax (
> http://unicode.org/reports/tr35/tr35-general.html#Transforms), that
> permits:
>
> (S1)(S2) → $2$1; // where S1 and S2 are unicode sets or sequences
> involving unicode sets

Reassuring to know, but are you sure this isn't just the ICU
implementation of LDML? :-) I couldn't find the meaning of '$1' in a
transform anywhere in Section 10.3 of the LDML.

> However, another alternative to contractions is to use the
> http://unicode.org/reports/tr35/tr35-collation.html#Context_Before.
> Using context is more limited than contractions, but can be much
> faster and may be applicable for Vietnamese. With that, you can then
> change the sort order of a later letter based on previous context.
> It may not be powerful enough to do what people want to do, but here
> is a simple example of where it would work.
>
> - I want the syllable with ? to sort as a primary difference, *as a
>   whole*: "can" < "c?n"
>   - test case: "can y" < "c?n x", where the x/y difference doesn't
>     matter.
> - But within the syllable I want the difference between a and ?
>   *not* to swamp later consonants.
>   - test case: "c?n" < "cat", where the n/t difference
>     predominates
>
> The following can be entered in
> http://demo.icu-project.org/icu-bin/collation.html
>
> Rules:
> &t &n
> // the syntax says: if a 't' comes after an '?', then sort it as a
> primary difference from a regular t.

I use this approach in my massive Lao collation table. However, how do
you ensure that "caX" < "c?Y" when X and Y start with punctuation? I
relied on a feature of Lao orthography, and am not happy with doing so.

Richard.
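
The rewrite rule quoted above, (S1)(S2) → $2$1, has a close analogue in regular-expression capture groups, which may help in reading it. The sketch below uses Python's re module rather than the LDML transform syntax, so it says nothing about what LDML or ICU actually accept. The character sets, the function name move_tone_last and the sample strings are assumptions made for illustration, and the approach relies on syllables being delimited by something outside the second set (a space or hyphen, say), so it would mishandle the unhyphenated polysyllabic words mentioned earlier in the thread.

```python
import re
import unicodedata

# S1: the five tone marks (grave, acute, tilde, hook above, dot below).
TONES = "\u0300\u0301\u0303\u0309\u0323"
# S2: whatever may follow the tone mark inside a syllable in NFD:
# ASCII letters, d with stroke, and the vowel-quality marks
# (circumflex, breve, horn). This set is an assumption.
REST = "a-z\u0111\u0302\u0306\u031b"

# Regex counterpart of "(S1)(S2) -> $2$1".
SWAP = re.compile("([" + TONES + "])([" + REST + "]+)")


def move_tone_last(text):
    """Decompose to NFD and move each tone mark to the end of its syllable."""
    nfd = unicodedata.normalize("NFD", text.lower())
    return SWAP.sub(r"\2\1", nfd)


# Hypothetical samples: "cán x" and "ận".
print([f"U+{ord(c):04X}" for c in move_tone_last("c\u00e1n x")])
print([f"U+{ord(c):04X}" for c in move_tone_last("\u1eadn")])
```

After such a preprocessing step the tone mark follows the final consonant, so an ordinary left-to-right comparison sees the consonant first. Whether the result would then interact correctly with a particular collator's normalization handling is not something this sketch addresses.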
From cldr-users at unicode.org Sun Jan 14 07:21:32 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Sun, 14 Jan 2018 13:21:32 +0000
Subject: Vietnamese Collation Wrong for Polysyllables
In-Reply-To:
References: <20180111001530.5b1cade1@JRWUBU2>
Message-ID: <20180114132132.799e616f@JRWUBU2>

On Wed, 10 Jan 2018 19:15:44 -0800
Markus Scherer wrote:

> Feel free to submit a ticket.

Tickets 10867 (side issue on order of tone marks, which may become more
of an issue) and 10868 (primary status of tone mark differences) raised.

> AFAIK syllables are separated by spaces.

Vietnamese wikipedia pages on scientific topics will show many
counter-examples - the most striking being 'Wikipedia' itself! My
favourite example is _axít_ 'acid'.

Richard.

From cldr-users at unicode.org Mon Jan 22 07:19:09 2018
From: cldr-users at unicode.org (Elsebeth Flarup via CLDR-Users)
Date: Mon, 22 Jan 2018 08:19:09 -0500
Subject: Comparison table with pre-CLDR locales
In-Reply-To:
References: <151024753325.808.7567282285130647465@votice>
Message-ID:

Thanks!
I actually managed to find the snapshots in the Wayback Machine - here's an
example with links to the comparison charts for all the Beta 1.0 locales:
https://web.archive.org/web/20031227130547/http://oss.software.ibm.com:80/cvs/icu/~checkout~/locale/all_diff_xml/index.html

And definitely some interesting differences between the original, surveyed
platforms, especially in the date format area.

Thanks,
Elsebeth

-------- Original Message --------
On November 10, 2017 5:05 PM, Steven R. Loomis wrote:

> It's kind of a 'retrocomputing' project to get this to work again, but here is a snippet of Arabic diffs from about cldr 1.6. Probably false positives and negatives here due to breakage. Some interesting things around date formats. https://www.dropbox.com/s/q1i51n0bebvdbn0/cldr-16-diff-ar.zip?dl=0
>
> On Fri, Nov 10, 2017 at 2:47 AM, Elsebeth Flarup via CLDR-Users wrote:
>
>> Thanks!
>> I am not looking for any specific locale category or vendor for that matter. I just wanted to include a few of the most glaring differences between platforms that existed before CLDR as an illustration of the fragmentation at that time.
>>
>> I am fairly certain that the CLDR project started by collecting a snapshot of the data from as many platforms as possible, and then went through a process of converging on the most common formats for each locale at the time. I believe the comparison table I remember was either used during that process, or was at least a side product of it. I think it was up for several years (I remember referring to it a number of times). Unfortunately I don't even remember the URL used at the beginning of the CLDR project, otherwise the Wayback Machine might be able to help.
>>
>> Thanks,
>> Elsebeth
>>
>>> -------- Original Message --------
>>> Subject: Re: Comparison table with pre-CLDR locales
>>> Local Time: November 9, 2017 6:12 PM
>>> UTC Time: November 9, 2017 5:12 PM
>>> From: jan.lana at oracle.com
>>> To: cldr-users at unicode.org , Elsebeth Flarup
>>>
>>> Quoting Elsebeth Flarup via CLDR-Users (2017-11-08 11:27:43)
>>>
>>>> I am preparing a CLDR training session, and as part of that I would like to
>>>> include a few specific examples of the differences that existed between locale
>>>> formats on various vendor-specific platforms prior to the widespread adoption
>>>> of CLDR.
>>>> I am fairly sure I remember an online table listing the locale formats from
>>>> AIX, Sun Solaris, etc. that were collected at the start of the CLDR project,
>>>> but I have been unable to find that table now. Does anybody remember the table,
>>>> and where it was located? Alternatively, does anybody know of any other source
>>>> that would have that kind of historical data?
>>>>
>>>> Solaris migrated most of locales to CLDR in Solaris 10 Update Release 4
>>>> in 2007
>>>> (https://docs.oracle.com/cd/E19957-01/820-2714/6nea26qkb/index.html#gevhv)
>>>> But there are no public comparison reports from the time as far I know.
>>>>
>>>> Is there any specific information you try to find?
>>>>
>>>> regards,
>>>
>>> - Jan Lana
>>
>> _______________________________________________
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org
>> http://unicode.org/mailman/listinfo/cldr-users

From cldr-users at unicode.org Wed Jan 24 23:13:15 2018
From: cldr-users at unicode.org (Zach Laine via CLDR-Users)
Date: Wed, 24 Jan 2018 23:13:15 -0600
Subject: Question about ICU decoll.cpp
Message-ID:

Does the collation used in decoll.cpp end up being the default collation,
the phonebook collation, or something else? After quite a bit of
spelunking through the ICU code, I can't figure this out.

Thanks,
Zach

From cldr-users at unicode.org Wed Jan 24 23:26:51 2018
From: cldr-users at unicode.org (Steven R Loomis via CLDR-Users)
Date: Thu, 25 Jan 2018 05:26:51 +0000
Subject: Question about ICU decoll.cpp
In-Reply-To:
Message-ID:

L P

Sent from my iPhone using IBM Verse

From cldr-users at unicode.org Wed Jan 24 23:27:17 2018
From: cldr-users at unicode.org (Steven R Loomis via CLDR-Users)
Date: Thu, 25 Jan 2018 05:27:17 +0000
Subject: Question about ICU decoll.cpp
In-Reply-To:
Message-ID:

CollationGermanTest is a test class, not part of ICU code.

Sent from my iPhone using IBM Verse

From cldr-users at unicode.org Thu Jan 25 13:59:54 2018
From: cldr-users at unicode.org (Steven R Loomis via CLDR-Users)
Date: Thu, 25 Jan 2018 19:59:54 +0000
Subject: Question about ICU decoll.cpp
In-Reply-To:
References:
Message-ID:

An HTML attachment was scrubbed...
URL:

From cldr-users at unicode.org Sun Jan 28 05:33:08 2018
From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users)
Date: Sun, 28 Jan 2018 11:33:08 +0000
Subject: Comparison table with pre-CLDR locales
In-Reply-To:
References: <151024753325.808.7567282285130647465@votice>

Message-ID: <20180128113308.76c08c48@JRWUBU2>

On Mon, 22 Jan 2018 08:19:09 -0500
Elsebeth Flarup via CLDR-Users wrote:

> And definitely some interesting differences between the original,
> surveyed platforms especially in the date format area.

The fading effects of the year 2000 AD may be awkward to track. In
spell out, the year '2001' was 'two thousand and one' in 2001, but I
believe is now moving back to the more regular 'twenty-oh-one'.
Likewise, dd/mm/yyyy dates (e.g. 28/1/2018) are now moving back to
dd/mm/yy dates (e.g. 28/1/18) when written manually. And I suspect the
French have now completely dropped the idea that the year date 2018 may
legitimately be written 19118.

Richard.