From martin_hosken at sil.org Thu Mar 6 01:31:38 2014
From: martin_hosken at sil.org (Martin Hosken)
Date: Thu, 6 Mar 2014 14:31:38 +0700
Subject: Case Mappings
Message-ID: <20140306143138.63340d7b@sil-mh6>

Dear All,

How would I derive a case mapping from LDML? For example, how would I use
tr.xml to derive that lc(I)<>dotless i and uc(i)<>dotted cap I? I realise
there is something deep and mysterious going on in 2.5 level collation that
is described rather opaquely. Is this where the information is? Or do I
need to look somewhere else?

TIA,
Yours,
Martin

From mark at macchiato.com Thu Mar 6 05:12:32 2014
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJU=?=)
Date: Thu, 6 Mar 2014 12:12:32 +0100
Subject: Case Mappings
In-Reply-To: <20140306143138.63340d7b@sil-mh6>
References: <20140306143138.63340d7b@sil-mh6>
Message-ID:

I'm curious as to why you need this, since normally people use the Unicode
properties, optionally plus the locale-specific CLDR casing transforms.

Mark

*Il meglio è l'inimico del bene*
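As a rough illustration of the Turkish mapping asked about in this thread, the sketch below applies the dotted/dotless i special cases before falling back to the default Unicode case mapping. It is only a sketch under stated assumptions: the function names `tr_lower` and `tr_upper` and the two tables are hypothetical and hand-written, not parsed from any CLDR file, and the sequence "I" followed by U+0307 COMBINING DOT ABOVE is not handled.

```python
# Hand-written special cases for Turkish (assumption: illustrative only,
# not derived from CLDR data files).
TR_LOWER_SPECIAL = str.maketrans({
    "I": "\u0131",       # LATIN CAPITAL LETTER I -> dotless i
    "\u0130": "i",       # LATIN CAPITAL LETTER I WITH DOT ABOVE -> i
})
TR_UPPER_SPECIAL = str.maketrans({
    "i": "\u0130",       # LATIN SMALL LETTER I -> dotted capital I
    "\u0131": "I",       # LATIN SMALL LETTER DOTLESS I -> I
})


def tr_lower(text: str) -> str:
    """Lowercase text, applying the Turkish i rules first (simplified)."""
    return text.translate(TR_LOWER_SPECIAL).lower()


def tr_upper(text: str) -> str:
    """Uppercase text, applying the Turkish i rules first (simplified)."""
    return text.translate(TR_UPPER_SPECIAL).upper()


if __name__ == "__main__":
    print(tr_lower("DIYARBAKIR"))
    print(tr_upper("istanbul"))
```

Applying the locale-specific substitutions first and the default mapping second keeps the sketch short; a real implementation would more likely rely on a library that already carries the locale-specific rules.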
From richard.wordingham at ntlworld.com Thu Mar 6 12:58:53 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Thu, 6 Mar 2014 18:58:53 +0000
Subject: Case Mappings
In-Reply-To: <20140306143138.63340d7b@sil-mh6>
References: <20140306143138.63340d7b@sil-mh6>
Message-ID: <20140306185853.133a961e@JRWUBU2>

On Thu, 6 Mar 2014 14:31:38 +0700
Martin Hosken wrote:

> How would I derive a case mapping from LDML? For example, how would I
> use tr.xml to derive that lc(I)<>dotless i and uc(i)<>dotted cap I? I
> realise there is something deep and mysterious going on in 2.5 level
> collation that is described rather opaquely. Is this where the
> information is? Or do I need to look somewhere else?

I don't believe one is intended to derive this from collation. The full
Lithuanian rules are not derivable from the Lithuanian collation rules.

The simple answer appears to be that the transforms can be found, at
least for CLDR Version 24, in the files:

common/transforms/tr-Lower.xml
common/transforms/tr-Title.xml
common/transforms/tr-Upper.xml

This is based on looking for the data. I can't work out how to derive
the file names from the LDML Version 24 Part 2 Section 10 or
http://cldr.unicode.org/#TOC-How-to-Use-, which are the locations where
I would look. http://cldr.unicode.org/index/downloads does give the
hint that the data might be in common/transforms. 'Lower', 'Title' and
'Upper' appear to be undocumented targets.

Richard.

From mark at macchiato.com Wed Mar 12 14:03:06 2014
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJU=?=)
Date: Wed, 12 Mar 2014 20:03:06 +0100
Subject: Beta CLDR Spec for v25 (LDML)
Message-ID:

There is a beta version of the CLDR specification for version 25, with the
changes listed at:

http://www.unicode.org/reports/tr35/proposed.html#Modifications

If you have any feedback on the new sections, please submit it at
http://unicode.org/cldr/trac/newticket.
If you do, please include a link to the specific section you're commenting
on. This is easy to do, since clicking on any header puts a link to that
header into your browser's address bar.

Mark

From cdutro at twitter.com Thu Mar 27 13:57:17 2014
From: cdutro at twitter.com (Cameron Dutro)
Date: Thu, 27 Mar 2014 11:57:17 -0700
Subject: Territory Codes
Message-ID:

Hey CLDR users,

Does anyone know what standard CLDR's territory codes adhere to?

-Cameron

From srl at icu-project.org Thu Mar 27 14:20:08 2014
From: srl at icu-project.org (Steven R. Loomis)
Date: Thu, 27 Mar 2014 14:20:08 -0500
Subject: Territory Codes
In-Reply-To:
References:
Message-ID: <60B0DE42-37FF-4CA1-8063-511F1BA75366@icu-project.org>

ISO 3166 territories plus UN M.49 regions (just as BCP 47 does). TR35
should have references.

Sent from our iPhone.

From petercon at microsoft.com Fri Mar 28 12:02:05 2014
From: petercon at microsoft.com (Peter Constable)
Date: Fri, 28 Mar 2014 17:02:05 +0000
Subject: Adding RUBLE SIGN to keyboard layouts
Message-ID: <6cb5e73482f341178b548f8618ccde38@BL2PR03MB450.namprd03.prod.outlook.com>

CLDR folk:

Has anyone begun to consider how to support the ruble sign in keyboard
layouts (for hardware keyboards)?

Thanks
Peter
From mark at macchiato.com Fri Mar 28 12:06:32 2014
From: mark at macchiato.com (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?=)
Date: Fri, 28 Mar 2014 18:06:32 +0100
Subject: Adding RUBLE SIGN to keyboard layouts
In-Reply-To: <6cb5e73482f341178b548f8618ccde38@BL2PR03MB450.namprd03.prod.outlook.com>
References: <6cb5e73482f341178b548f8618ccde38@BL2PR03MB450.namprd03.prod.outlook.com>
Message-ID:

Good question; don't know if the Russians have a standard for where it goes.

For comparison, here are the ru keyboards we currently have in CLDR
(reflecting data publicly available on the platforms):

http://www.unicode.org/cldr/charts/25/keyboards/layouts/ru.html

Mark

From petercon at microsoft.com Fri Mar 28 12:13:42 2014
From: petercon at microsoft.com (Peter Constable)
Date: Fri, 28 Mar 2014 17:13:42 +0000
Subject: Adding RUBLE SIGN to keyboard layouts
In-Reply-To:
References: <6cb5e73482f341178b548f8618ccde38@BL2PR03MB450.namprd03.prod.outlook.com>
Message-ID: <72fff88eb93a426cb160ce57c0b4dec5@BL2PR03MB450.namprd03.prod.outlook.com>

My understanding is that there is some discussion started within Russia on
standards, but that there may be opportunity for influencing this. Vlad
(cc'd) can clarify.

Within Microsoft, we've been having some discussion around several
possibilities and are considering AltGr+8.

Peter

From petercon at microsoft.com Fri Mar 28 12:42:42 2014
From: petercon at microsoft.com (Peter Constable)
Date: Fri, 28 Mar 2014 17:42:42 +0000
Subject: Adding RUBLE SIGN to keyboard layouts
In-Reply-To:
References: <6cb5e73482f341178b548f8618ccde38@BL2PR03MB450.namprd03.prod.outlook.com> , <72fff88eb93a426cb160ce57c0b4dec5@BL2PR03MB450.namprd03.prod.outlook.com>
Message-ID: <8c9708cae9d840f9b1690483bffc2d31@BL2PR03MB450.namprd03.prod.outlook.com>

Reposting (Vlad is not a list member so his mail won't get posted).

From: Vladislav Shershulsky
Sent: March 28, 2014 10:23 AM
To: Peter Constable; Mark Davis ☕
Cc: cldr-users; Agustin Da Fieno Delucchi; Michael Kaplan; Jan Nelson
Subject: RE: Adding RUBLE SIGN to keyboard layouts

Peter,

I completely agree with your view of the situation. If AltGr+8 looks
reasonable to everyone, we would have a better chance of convincing the
Russian experts of this choice.

Vlad

Sent from my Windows Phone

From richard.wordingham at ntlworld.com Sun Mar 30 07:24:45 2014
From: richard.wordingham at ntlworld.com (Richard Wordingham)
Date: Sun, 30 Mar 2014 13:24:45 +0100
Subject: Non-primary Weights of U+FFFE
Message-ID: <20140330132445.43398a4e@JRWUBU2>

Is there any reason that a CLDR-compliant collation algorithm should
particularly care about the non-primary weights of U+FFFE? So long as
they satisfy the well-formedness conditions, all I can see is that
having unique values *may* simplify sort key formation for reversed
levels.

Richard.
From markus.icu at gmail.com Sun Mar 30 11:17:44 2014
From: markus.icu at gmail.com (Markus Scherer)
Date: Sun, 30 Mar 2014 09:17:44 -0700
Subject: Non-primary Weights of U+FFFE
In-Reply-To: <20140330132445.43398a4e@JRWUBU2>
References: <20140330132445.43398a4e@JRWUBU2>
Message-ID:

On Sun, Mar 30, 2014 at 5:24 AM, Richard Wordingham
<richard.wordingham at ntlworld.com> wrote:

> Is there any reason that a CLDR-compliant collation algorithm should
> particularly care about the non-primary weights of U+FFFE? So long as
> they satisfy the well-formedness conditions, all I can see is that
> having unique values *may* simplify sort key formation for reversed
> levels.

The non-primary weights need to be greater than the level separator(s) and
less than the weights of CEs that are ignorable on previous levels.

It is also important to generate the special weights on primary to tertiary
levels for shifted CEs, so that alternate=shifted works properly.

In ICU, we have test code that expects the same sort keys generated from
concatenating two strings with U+FFFE vs. calling ucol_mergeSortkeys() on
the two separate sort keys. The latter merges sort keys by copying each
level (separated by byte 01) from each sort key and inserting a byte 02
between the bytes from different sort keys. (see ucol.h)

markus
--
Google Internationalization Engineering

From markus.icu at gmail.com Sun Mar 30 11:58:34 2014
From: markus.icu at gmail.com (Markus Scherer)
Date: Sun, 30 Mar 2014 09:58:34 -0700
Subject: Non-primary Weights of U+FFFE
In-Reply-To:
References: <20140330132445.43398a4e@JRWUBU2>
Message-ID:

PS: What I am realizing here is that we should be able to use byte 02 as a
lead byte in any non-primary weight. Primary CEs compare greater than
U+FFFE on primary level, and ignorable CEs have high weights and compare
greater than many low weights.
markus

From markus.icu at gmail.com Sun Mar 30 17:08:06 2014
From: markus.icu at gmail.com (Markus Scherer)
Date: Sun, 30 Mar 2014 15:08:06 -0700
Subject: Non-primary Weights of U+FFFE
In-Reply-To:
References: <20140330132445.43398a4e@JRWUBU2>
Message-ID:

By the way, the ICU locale explorer and its collation demo are updated to
the not-yet-released ICU 53 which includes the new collation code.

http://demo.icu-project.org/icu-bin/locexp?_=root&d_=en&x=col

markus
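The level-by-level merge described earlier in this thread (copy each level, separated by byte 01, from each sort key, and insert a byte 02 between the bytes from different keys) can be sketched roughly as follows. This is not ICU's implementation: `merge_sort_keys` is a hypothetical name, the keys are assumed to be raw level bytes with no terminator, a key with fewer levels is assumed to contribute an empty level, and the sample bytes are made up rather than real collation weights.

```python
from itertools import zip_longest

LEVEL_SEPARATOR = b"\x01"   # separates levels within one sort key
MERGE_SEPARATOR = b"\x02"   # inserted between bytes from different keys


def merge_sort_keys(key1: bytes, key2: bytes) -> bytes:
    """Merge two sort keys level by level (simplified sketch)."""
    levels1 = key1.split(LEVEL_SEPARATOR)
    levels2 = key2.split(LEVEL_SEPARATOR)
    merged_levels = [
        # Assumption: a missing level is treated as empty.
        part1 + MERGE_SEPARATOR + part2
        for part1, part2 in zip_longest(levels1, levels2, fillvalue=b"")
    ]
    return LEVEL_SEPARATOR.join(merged_levels)


if __name__ == "__main__":
    # Made-up three-level keys: primary, secondary, tertiary.
    key_a = b"\x2a\x01\x05\x01\x05"
    key_b = b"\x2c\x01\x05\x01\x05"
    print(merge_sort_keys(key_a, key_b).hex(" "))
```

Because the merge byte 02 sits just above the level separator 01, a merged key still compares level by level, which is consistent with the constraint stated above that the non-primary weights of U+FFFE must be greater than the level separator.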