From cldr-users at unicode.org Fri Mar 2 09:26:18 2018 From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users) Date: Fri, 2 Mar 2018 16:26:18 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> <1399756717.44805.1520000556843@ox.hosteurope.de> Message-ID: No, the patterns should always have the right format. However, in the supplemental data there is information as to the preferred data for each language. This data isn't collected through the Survey Tool (ST), so a ticket needs to be filed. In your particular case, the data has: <hours preferred="H" allowed="H hB" regions="AD AM AO AT AW BE BF BJ BL BR CG CI CV DE EE FR GA GF GN GP GW HR IL IT KZ MC MD MF MQ MZ NC NL PM PT RE RO SI SM SR ST TG TR WF YT"/> If DE just doesn't use hB, then you can file a ticket to say that it shouldn't be in @allowed. Note that the format permits either regions or locales, as in: As to involvement, we try to encourage interaction on the forum. In some languages those are quite active; in others not so much. (BTW, a number of your suggestions made sense to me, but not being a native German speaker, I don't weigh in on de.xml except for structural issues or where people seem to miss the intent.) So people may look at the forum, disagree with the proposal, but not say why they disagree. Mark On Fri, Mar 2, 2018 at 3:22 PM, Christoph Päper via Unicode < unicode at unicode.org> wrote: > F'up2: cldr-users at unicode.org > > Doug Ewell via unicode at unicode.org: > > > > I think that is a measurement of locale coverage -- whether the > > collation tables and translations of "a.m." and "p.m." and "a week ago > > Thursday" are correct and verified -- not character coverage. > > By the way, the binary `am` vs. `pm` distinction common in English and > labelled `a` as a placeholder in CLDR formats is too simplistic for some > languages when using the 12-hour clock (which they usually don't in written > language). 
In German, for instance, you would always use a format with `B` > instead (i.e. "morgens", "mittags", "abends", "nachts" or no identifier > during daylight). > > How and where can I best suggest to change this in CLDR? The B formats > have their own code, e.g. `Bhms` = `h:mm:ss B`. Should I just propose to > set `hms` etc. to the same value next time the Survey Tool is open? > > In my experience, there are too few people reviewing even the "largest" > languages (like German). I participated in v32 and v33, but other than me > there were only contributions from (seemingly) a single employee from each > of Apple, Google and Microsoft. Most improvements or corrections I > suggested just got lost, i.e. nobody discussed or voted on them, so the old > values remained. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Fri Mar 2 09:51:04 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Fri, 2 Mar 2018 16:51:04 +0100 Subject: Unicode Emoji 11.0 characters now ready for adoption! 
In-Reply-To: <1399756717.44805.1520000556843@ox.hosteurope.de> References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> <1399756717.44805.1520000556843@ox.hosteurope.de> Message-ID: Day periods (from 00:00 to 24:00: sometimes "night", but generally included in "matin", then "midi", "après-midi", "soir") are also used in French, much more usefully than the ambiguous and unused am/pm Latin abbreviations, which fell completely out of use a few centuries ago (side note: I am not sure whether they were commonly abbreviated; most probably only in written form and not when spoken, where one would read out only the full Latin words, before French finally replaced the judicial and liturgical "Late Vulgar Latin" language that no one really understood correctly and that was constantly creolized with the many regional vernacular oil languages instead of following the liturgical and judicial style; at that time, "ante/post meridiem" was only heard in Christian masses or judicial documents, both full of corporative jargon, and even different from the approximate Latin of the administration; then Latin collapsed under regional oil languages that differed greatly from each other, before French was finally created, abandoning Latin as the sole source but reinventing words borrowed from Greek and adapted to the Anjou oil variant used by the ruling nobility and the entourage of the King and some passionate church personalities who also wanted to incorporate the several oc languages and other European languages for diplomacy; French then took about two centuries to develop before it finally burnt most regional oil variants and nearly burnt the oc variants as well; some Latin expressions remain in French, but only for specific/technical usages, especially in judicial language, as in English; but English kept "ante/post meridiem" only through its abbreviations, and today most native English speakers don't really know what "am" and "pm" mean). So yes, day periods should have their own format codes. But the number of day periods varies across languages (not really between distinct scripts of the same language) and, more importantly, across geographic regions/countries/territories (more than by language). CLDR would then need more regional variants than those supported for now (ISO 3166-1 codes may not be sufficient as BCP 47 language subtags). -------------- next part -------------- An HTML attachment was scrubbed... 
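The `preferred`/`allowed` hour data quoted in this thread can be sketched as a small lookup. This is only an illustration under stated assumptions: the `TIME_DATA` table, the function name `pick_hour_symbols`, and the selection rule are made up for this example; only the DE values (`preferred="H"`, `allowed="H hB"`) come from the thread, and real implementations may resolve this differently.

```python
# Hypothetical in-memory excerpt of the supplemental time data.
# Only the DE values (preferred="H", allowed="H hB") are taken from this thread.
TIME_DATA = {
    "DE": {"preferred": "H", "allowed": ["H", "hB"]},
}

# Pattern symbols that denote a 12-hour clock.
TWELVE_HOUR_SYMBOLS = ("h", "K")


def pick_hour_symbols(region, want_12_hour, time_data=TIME_DATA):
    """Return the hour pattern symbols to use for a region.

    Illustrative rule (an assumption, not the LDML algorithm): use the
    preferred symbols unless a 12-hour clock is requested, in which case
    take the first allowed entry that starts with a 12-hour symbol.
    """
    entry = time_data[region]
    if not want_12_hour:
        return entry["preferred"]
    for symbols in entry["allowed"]:
        if symbols.startswith(TWELVE_HOUR_SYMBOLS):
            return symbols
    # No 12-hour form is allowed for this region: keep the preferred one.
    return entry["preferred"]


print(pick_hour_symbols("DE", want_12_hour=False))  # H
print(pick_hour_symbols("DE", want_12_hour=True))   # hB
```

With this data, a 12-hour request for DE yields `hB` (hour plus day period) rather than `h` with `a`, which is the behaviour the German example in the thread asks for.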
URL: From cldr-users at unicode.org Fri Mar 2 11:13:16 2018 From: cldr-users at unicode.org (Christoph Päper via CLDR-Users) Date: Fri, 2 Mar 2018 18:13:16 +0100 (CET) Subject: Unicode Emoji 11.0 characters now ready for adoption! In-Reply-To: References: <20180301131007.665a7a7059d7ee80bb4d670165c8327d.6d7c6e7a10.wbe@email03.godaddy.com> <1399756717.44805.1520000556843@ox.hosteurope.de> Message-ID: <334669507.46352.1520010797090@ox.hosteurope.de> Mark Davis: > > In your particular case, the data has: > > <hours preferred="H" > allowed="H hB" > regions="AD AM AO AT AW BE BF BJ BL BR CG CI CV DE EE FR GA GF GN GP GW HR > IL IT KZ MC MD MF MQ MZ NC NL PM PT RE RO SI SM SR ST TG TR WF YT"/> > > If DE just doesn't use hB, then you can file a ticket to say that it > shouldn't be in @allowed. This entry is actually correct, because DE never uses `h` (with `a` instead of `B`). So it's apparently just implementations that are too simplistic and force American `h` on us. From cldr-users at unicode.org Sat Mar 10 03:51:41 2018 From: cldr-users at unicode.org (Francis Tyers via CLDR-Users) Date: Sat, 10 Mar 2018 10:51:41 +0100 Subject: Requirements for getting a new language in Survey Tool Message-ID: Hi, First of all, sorry if this email is going to the wrong place. Please let me know if there is a better forum for my question and I apologise in advance for wasting anyone's time. My question(s): 1) I understand from the CLDR page that the Survey Tool prep starts on the 1st of April. Am I correct in understanding that, in order for a language to appear in the Survey Tool for the next batch of data entry, it should have at least seed data before the 1st of April? 2) If so, is the best way to go about this to file bugs in the CLDR Trac and supply this[1] amount of seed data? 3) And if not [1], is there an ideal example of seed data in the SVN that I could use as a template? Thank you for your time, Francis M. Tyers 1. 
http://unicode.org/repos/cldr/trunk/seed/main/cv.xml From cldr-users at unicode.org Sat Mar 10 11:21:57 2018 From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users) Date: Sat, 10 Mar 2018 18:21:57 +0100 Subject: Requirements for getting a new language in Survey Tool In-Reply-To: References: Message-ID: It needs to have at least core data as described on http://cldr.unicode.org/index/cldr-spec/minimaldata, *and* a commitment to supply the minimal data during the submission period. The core data needs to be incorporated before the May/Nov 27 "Start Shakedown Submission". It takes some time to process the new language, so April 1 is a good target. Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cldr-users at unicode.org Sat Mar 10 11:33:53 2018 From: cldr-users at unicode.org (Francis Tyers via CLDR-Users) Date: Sat, 10 Mar 2018 18:33:53 +0100 Subject: Requirements for getting a new language in Survey Tool In-Reply-To: References: Message-ID: On 2018-03-10 18:21, Mark Davis ☕️ wrote: > It needs to have at least core data as described on > http://cldr.unicode.org/index/cldr-spec/minimaldata, _and_ a > commitment to supply the minimal data during the submission period. Should this commitment be stated in the bug itself? For example, as stated in https://unicode.org/cldr/trac/ticket/10985. I am working with a group of people who intend to improve coverage of the languages of Russia. It would be good to get feedback on the ticket above in case anything is missing or needs to be clarified. > The core data needs to be incorporated before the May/Nov 27 "Start > Shakedown Submission". It takes some time to process the new language, > so April 1 is a good target Ok, great, thanks! Fran From cldr-users at unicode.org Wed Mar 14 10:46:02 2018 From: cldr-users at unicode.org (John Emmons via CLDR-Users) Date: Wed, 14 Mar 2018 09:46:02 -0600 Subject: CLDR 33-beta is now available. Message-ID: The beta version of CLDR release 33 is now available for testing. See http://cldr.unicode.org/index/downloads/cldr-33 for details. Regards, John C. Emmons Globalization Architect & Unicode CLDR TC Vice Chairman IBM Globalization Team e-mail: emmo at us.ibm.com From cldr-users at unicode.org Wed Mar 14 14:48:28 2018 From: cldr-users at unicode.org (George S. via CLDR-Users) Date: Wed, 14 Mar 2018 13:48:28 -0600 Subject: en_GB.xml Gregorian Date Formats Message-ID: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> I'm looking at the file common/main/en_GB.xml and I'm really confused. 
I'm looking at the Gregorian calendar section, and there's no dateFormats / dateFormatLength=short. The value in en.xml is M/d/yy. If I look at en_AU.xml, there is an entry with a value of "d/M/yy". Similarly, in en_IE.xml there is no short dateFormatLength value. Can anyone help me understand how this all works? I'm using a library that generates its localization files from LDML, and it's coming up with a lot of wrong answers. Before I go to them, I'd like to understand why things are formatted in this way. -- George S. *MH Software, Inc.* Voice: 303 438 9585 http://www.mhsoftware.com From cldr-users at unicode.org Wed Mar 14 15:38:14 2018 From: cldr-users at unicode.org (Peter Edberg via CLDR-Users) Date: Wed, 14 Mar 2018 13:38:14 -0700 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> Message-ID: <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> en_GB inherits from en_001, not from en. - Peter E From cldr-users at unicode.org Wed Mar 14 15:57:51 2018 From: cldr-users at unicode.org (George S. via CLDR-Users) Date: Wed, 14 Mar 2018 14:57:51 -0600 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> Message-ID: <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> Thanks for responding. I knew I'd gone down this road before. Drat. I'll make the same comment I made three years ago: It would be nice if the en_GB.xml file referenced its parent so that mortals might have some idea of where to look. Having the relationship squirreled away in a file in another directory with a non-obvious name isn't very handy. -- George S. *MH Software, Inc.* Voice: 303 438 9585 http://www.mhsoftware.com From cldr-users at unicode.org Wed Mar 14 16:26:19 2018 From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users) Date: Wed, 14 Mar 2018 14:26:19 -0700 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> Message-ID: George, > It would be nice if the en_GB.xml file referenced its parent I appreciate the idea; however, the XML files are not designed to be looked at in isolation. That's why we put this notice at the top: " CLDR data files are interpreted according to the LDML specification ( http://unicode.org/reports/tr35/) " Is there a better way to word this? Please also see the Implementer's guide and FAQ at https://github.com/unicode-org/cldr-implementers-guide/ - if you think this would be a good FAQ, can you open an issue, or better yet a pull request, there? From cldr-users at unicode.org Wed Mar 14 16:47:43 2018 From: cldr-users at unicode.org (George S. via CLDR-Users) Date: Wed, 14 Mar 2018 15:47:43 -0600 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> Message-ID: <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> Personally, I find the reasoning to be circular: "Since parentLocale information is not localizable on a per locale basis, the parentLocale information is contained in CLDR's supplemental data." 
There are many things in the locale files that are not strictly localizable. Here's an example: Saying you're not going to put the parent locale in because it's not localizable is kind of silly when you have lots of data in the file that's not localizable. I'm suggesting: But it's your guys' project. -- George S. *MH Software, Inc.* Voice: 303 438 9585 http://www.mhsoftware.com From cldr-users at unicode.org Wed Mar 14 18:04:58 2018 From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users) Date: Wed, 14 Mar 2018 16:04:58 -0700 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> Message-ID: You're quoting from https://www.unicode.org/reports/tr35/tr35.html#Parent_Locales > There are many things in the locale files that are not strictly localizable. Here's an example: > "narrow" here is a distinguishing attribute ( see https://www.unicode.org/reports/tr35/tr35.html#Definitions ) and is part of the identity of the element content that follows. I think the point of the quote is that the "parent locale" is structural and not part of the identity of the specific xml file. If you look at the parent locales in supplemental, they are organized from the point of view of the parent, for setting "which locales inherit from en-150?" Parsing the supplementalData is critical to processing CLDR data. The CLDR Java tooling is available in the source repository; it could be a source of comparison for file handling. Steven -------------- next part -------------- An HTML attachment was scrubbed... 
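The inheritance described in this thread (en_GB falls back to en_001 rather than to en) can be sketched as a child-to-parent lookup. This is a minimal illustration, not CLDR's tooling: the `EXPLICIT_PARENTS` table stands in for the `parentLocales` supplemental data and holds only the one mapping stated in the thread, the function names are made up, and the en_001 date pattern is a placeholder rather than real CLDR data. Actual LDML inheritance has more rules than the truncation shown here.

```python
# Stand-in for <parentLocales> in the supplemental data (assumption:
# only the en_GB -> en_001 mapping mentioned in this thread is included).
EXPLICIT_PARENTS = {"en_GB": "en_001"}

# Hypothetical per-locale data. "M/d/yy" for en is quoted in the thread;
# the en_001 value is a placeholder, not a real CLDR pattern.
SHORT_DATE = {
    "en": "M/d/yy",
    "en_001": "<en_001 short date pattern>",
}


def parent_of(locale):
    """Return the parent locale, or None once root has been reached."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale]
    if "_" in locale:
        # Default rule: drop the last subtag (for example de_AT -> de).
        return locale.rsplit("_", 1)[0]
    return "root"


def inheritance_chain(locale):
    """List the locale followed by each ancestor, ending with root."""
    chain = []
    while locale is not None:
        chain.append(locale)
        locale = parent_of(locale)
    return chain


def resolve(locale, data):
    """Return (source_locale, value) for the first locale in the chain with data."""
    for candidate in inheritance_chain(locale):
        if candidate in data:
            return candidate, data[candidate]
    return None, None


print(inheritance_chain("en_GB"))    # ['en_GB', 'en_001', 'en', 'root']
print(resolve("en_GB", SHORT_DATE))  # ('en_001', '<en_001 short date pattern>')
```

A tool that skips the explicit parent table would walk en_GB -> en -> root and pick up the en.xml value `M/d/yy`, which is one way a generator could produce the wrong en_GB short date.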
URL: From cldr-users at unicode.org Wed Mar 14 18:50:14 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Thu, 15 Mar 2018 00:50:14 +0100 (CET) Subject: CLDR 33-beta is now available. In-Reply-To: References: Message-ID: <203536449.25095.1521071414103.JavaMail.www@wwinf1m17> > > The beta version of CLDR release 33 is now available for testing. See http://cldr.unicode.org/index/downloads/cldr-33 for details. > http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-keyboards.html still has the inconsistencies and layout issues corrected in: http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html http://charupdate.info/unicode/revision/tr35/33-51/tr35-keyboards.html linked from: https://unicode.org/cldr/trac/ticket/10901 If the editors lack the time needed to implement the fixes, I feel committed to lend a helping hand, because Unicode simply cannot leave that spec in its current state without serious damage to its image. At least I would feel ashamed to cite the paper anywhere (talking about keyboards). Best regards, Marcel From cldr-users at unicode.org Wed Mar 14 18:53:08 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Thu, 15 Mar 2018 00:53:08 +0100 Subject: en_GB.xml Gregorian Date Formats In-Reply-To: References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> Message-ID: I don't think these fallbacks are structured in the correct direction: this should go from a single child to its single parent, not from some parent to the full list of its children (which actually has no use for localization purposes). We should not need to update the "root" locale, for example, which would be the only locale without any parent specified, nor even need to enumerate all locales in the "parentLocales" supplemental data (which could be deprecated completely and is in fact not needed at all if each locale specifies its own "parent" locale, or an ordered list of candidate fallbacks to search first, before recursively searching each candidate with its respective BCP 47 parents, except "root", and then finally searching "root"). Note: this is related to some experience I gained in Wikimedia Commons in using the BCP 47 fallback mechanism more coherently and allowing easier tuning of fallbacks: this is always organized first from a child locale specifying its preferred fallbacks. There are interesting discussions about this in Module:Fallback (it is still in a sandbox version, not yet deployed completely, but tests are successfully handling all cases, including the need to tune fallbacks locally for a project, here Wikimedia Commons, then use more generic fallback mechanisms across diverse wikis via MediaWiki default fallbacks, then enforce BCP 47 conformance, then use a "root" = "default" locale for specific needs (basically for handling missing translations and tracking them), then some local safe default (the content language of the local wiki, then basic English, which is used as the last chance)). In all projects, we use locale fallbacks in the direction from child to parent, never the reverse, which is not maintainable. From cldr-users at unicode.org Wed Mar 14 21:37:41 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Thu, 15 Mar 2018 03:37:41 +0100 (CET) Subject: CLDR 33-beta is now available. Message-ID: <145154080.41.1521081461942.JavaMail.www@wwinf1m17> Likewise, pulling out a simple example: http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-keyboards.html#Definitions still has: "Key: A key on a physical keyboard." while http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html#Definitions proposes: "Key: A button on a physical or virtual keyboard." Another example: L and R should be prefixes, as they already are in part of the spec, not suffixes. See rationale in: https://unicode.org/cldr/trac/ticket/10906 Given the varying use (suffix, prefix), concatenations of modifiers should use +, again as they already do in part of the spec (while in other parts of the same page they are concatenated without a plus sign). The proposed edit has all added plus signs highlighted. In-text tables should have cellpadding, like this: The last version has several instances of:
which have all been removed for the current draft revision! There must be something intentional, then. However, the idea was that leaving all those mistakes in the first place reflects worse on Unicode than having one revision with plenty of those small edits highlighted.

Best regards,

Marcel

> Message of 15/03/18 00:52
> From: "Marcel Schneider via CLDR-Users"
> To: "Steven Loomis", "CLDR-Users"
> Cc:
> Subject: Re: CLDR 33-beta is now available.
>
> > The beta version of CLDR release 33 is now available for testing.
> > See http://cldr.unicode.org/index/downloads/cldr-33 for details.
>
> http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-keyboards.html
> still has the inconsistencies and layout issues corrected in:
> http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html
> http://charupdate.info/unicode/revision/tr35/33-51/tr35-keyboards.html
> linked from: https://unicode.org/cldr/trac/ticket/10901
>
> If the editors are lacking the time needed to implement the fixes, I feel
> committed to lend a helping hand because Unicode simply cannot leave that
> spec in its current state without serious damage to its image. At least I
> would feel ashamed to cite the paper anywhere (talking about keyboards).
>
> Best regards,
>
> Marcel

From cldr-users at unicode.org Thu Mar 15 15:35:25 2018
From: cldr-users at unicode.org (George S. via CLDR-Users)
Date: Thu, 15 Mar 2018 14:35:25 -0600
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To:
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com>
Message-ID: <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>

On 3/14/2018 5:04 PM, Steven R. Loomis via CLDR-Users wrote:
> You're quoting from
> https://www.unicode.org/reports/tr35/tr35.html#Parent_Locales
>
> > There are many things in the locale files that are not strictly
> > localizable. Here's an example:
>
> "narrow" here is a distinguishing attribute (see
> https://www.unicode.org/reports/tr35/tr35.html#Definitions) and is
> part of the identity of the element content that follows.
>
> I think the point of the quote is that the "parent locale" is
> structural and not part of the identity of the specific xml file.

I can think of few things more structural than where this locale's defaults originate. Without that identity, the child file's definition is incomplete. Placing the relationship data in another file in a different directory entirely requires novices like myself to do a tremendous amount of research to understand what's going on. Even though the maintainers may have had really excellent reasons for this structure, from the developer standpoint it's not sensible.

> If you look at the parent locales in supplemental, they are organized
> from the point of view of the parent, for setting "which locales
> inherit from en-150?"

As a developer who uses LDML files, I can absolutely guarantee you that I will NEVER ask that question. en-150 is a synthetic thing you folks created to organize data. It's just not related to my workflow. en-GB is my workflow. As a maintainer of the data, perhaps that's useful for you.

> Parsing the supplementalData is critical to processing CLDR data.

That's true because of design decisions that were made by the maintainers. It's not the only possible solution, and I don't think, from the consumer (developer) standpoint, it's even remotely the best. But these are my opinions, and they are generally uninformed and only accurate from my point of view.

--
George S.
*MH Software, Inc.*
Voice: 303 438 9585
http://www.mhsoftware.com

-------------- next part --------------
An HTML attachment was scrubbed...
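The lookup this exchange is about can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CLDR tooling: the table below imitates the parent-keyed shape of the supplemental parentLocale data, and only the en_GB -> en_001 relationship is confirmed in this thread. The other table entries, the en_001 pattern, the "dateFormat/short" key, and all function names are hypothetical. The sketch inverts the parent-keyed table into a child-to-parent lookup, then walks from a locale to "root" (explicit parent first, otherwise dropping the last subtag) to find where a missing value such as en_GB's short date format comes from.

```python
# Illustrative parent-keyed table, shaped like the supplemental parentLocale
# data described in this thread. Only en_GB -> en_001 is confirmed above;
# the other entries are assumptions made for the example.
PARENT_TO_CHILDREN = {
    "en_001": ["en_GB", "en_AU"],
    "en_150": ["en_DE"],
}

# Illustrative per-locale data. The en and en_AU patterns are quoted in the
# thread; the en_001 pattern is an assumed placeholder.
LOCALE_DATA = {
    "en": {"dateFormat/short": "M/d/yy"},
    "en_001": {"dateFormat/short": "dd/MM/y"},
    "en_AU": {"dateFormat/short": "d/M/yy"},
    "en_GB": {},
}


def invert_parent_locales(parent_to_children):
    """Build a child -> parent lookup, rejecting a child listed twice."""
    child_to_parent = {}
    for parent, children in parent_to_children.items():
        for child in children:
            if child in child_to_parent:
                raise ValueError(
                    f"{child} has two parents: "
                    f"{child_to_parent[child]} and {parent}"
                )
            child_to_parent[child] = parent
    return child_to_parent


def inheritance_chain(locale, child_to_parent):
    """List the locale followed by its ancestors, ending with "root"."""
    chain = [locale]
    while locale != "root":
        if locale in child_to_parent:
            locale = child_to_parent[locale]   # explicit parent wins
        elif "_" in locale:
            locale = locale.rsplit("_", 1)[0]  # otherwise drop the last subtag
        else:
            locale = "root"
        chain.append(locale)
    return chain


def resolve(locale, key, child_to_parent, data):
    """Return (value, source_locale) from the first locale in the chain that defines key."""
    for candidate in inheritance_chain(locale, child_to_parent):
        if key in data.get(candidate, {}):
            return data[candidate][key], candidate
    return None, None


CHILD_TO_PARENT = invert_parent_locales(PARENT_TO_CHILDREN)
print(inheritance_chain("en_GB", CHILD_TO_PARENT))
print(resolve("en_GB", "dateFormat/short", CHILD_TO_PARENT, LOCALE_DATA))
```

With this sample data the en_GB value is found in en_001 rather than in en, which matches the answer given earlier in the thread. A real consumer would read the table from the supplemental data instead of hard-coding it.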
From cldr-users at unicode.org Thu Mar 15 16:33:56 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Thu, 15 Mar 2018 22:33:56 +0100
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To: <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID:

The "non-localisable" argument is clearly untrue. "Localisable" does NOT mean "translatable"; it means that a locale can have specialized data. In this case, the parent locale of each locale is localisation data for that locale, even if its data is not text meant to be read and translated by humans, but a technical code. The same applies to localisable number formats, which contain almost only technical data not meant to be read by humans directly, but automatically processed by computers.

So I'm also in favor of deprecating the old supplemental file and integrating what it currently contains directly within the data of each relevant child locale. This will be clearer and immediately usable by all CLDR-using applications and libraries, and it will also need less maintenance. Maintenance is complex to do in a separate global file containing long lists of codes: some codes may be forgotten, the lists are not coherent with BCP 47 fallback mechanisms, and they are unnecessarily long to process when an application just needs the ONE parent locale that is specific to each locale, without processing **all** of the supplemental data file to find out whether a locale has some parent.

The current format also easily allows specifying the same child language multiple times with different parent selectors. This is bad because this should never occur (even if you have some quality check tool to detect such a situation).
All applications will need to process this supplemental file to reverse the mappings that are listed. This is nonsense.

From cldr-users at unicode.org Fri Mar 16 08:31:39 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 16 Mar 2018 14:31:39 +0100 (CET)
Subject: TR 35-7
In-Reply-To:
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27>
Message-ID: <85233106.10914.1521207099614.JavaMail.www@wwinf1m17>

On 13/02/18 19:42 Steven R. Loomis wrote
>
> No problem! Just want to make sure your work goes the right place.
>
> S
>
> On Tue, Feb 13, 2018 at 10:22 AM, Marcel Schneider wrote:
> > OK, will do. Sorry.
> >
> > Regards,
> >
> > Marcel
> >
> > On 13/02/18 19:16, Steven R. Loomis wrote:
> > >
> > > Kindly use the bug tracker and/or mailing lists to discuss. I don't
> > > have anything to do in response to your mail otherwise.
> > >
> > > On Tue, Feb 13, 2018 at 9:44 AM, Marcel Schneider wrote:
> > > > I'd have left it all as-is, but can't help editing. Now even
> > > >
> > > > http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html#Principles_for_Keyboard_IDs
> > > >
> > > > Now I must definitely hurry up completing another task I've been
> > > > deprioritizing for a long time.
> > > >
> > > > Regards,
> > > >
> > > > Marcel

Thank you. Given the front-end issues at stake, I thought they would preferably be fixed without lots of noise, though.

Do you prefer this mailing list or your bug tracking tool? An echo is already on ticket #10901: https://unicode.org/cldr/trac/ticket/10901#comment:20

Best regards,

Marcel

From cldr-users at unicode.org Fri Mar 16 09:03:29 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 16 Mar 2018 15:03:29 +0100 (CET)
Subject: Do quality standards apply to CLDR? (was: Re: en_GB.xml Gregorian Date Formats)
In-Reply-To: <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com>
Message-ID: <166584932.11532.1521209009655.JavaMail.www@wwinf1m17>

On 14/03/18 22:55 George S. via CLDR-Users wrote:

[...]
> But it's your guys' project.

Legally yes, it's a private initiative. But since Unicode is sort of "the only game in town," everybody on earth is committed to getting this one to work correctly.
Factual accuracy, logical correctness, internal consistency, and a somewhat civilized layout are qualities that one may reasonably expect, at least after there was sufficient feedback to make things easy.

Best regards,

Marcel

From cldr-users at unicode.org Fri Mar 16 11:16:40 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 16 Mar 2018 09:16:40 -0700
Subject: TR 35-7
In-Reply-To: <85233106.10914.1521207099614.JavaMail.www@wwinf1m17>
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17>
Message-ID:

Marcel,

I think the bug tracking tool is best for tracking issues. You are already interacting with the editor there (I am not the editor).

Steven

From cldr-users at unicode.org Fri Mar 16 17:13:32 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Fri, 16 Mar 2018 23:13:32 +0100 (CET)
Subject: TR 35-7
In-Reply-To:
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17>
Message-ID: <494700634.21248.1521238412609.JavaMail.www@wwinf1m17>

On 16/03/18 17:16, Steven R. Loomis wrote:
> I think the bug tracking tool is best for tracking issues. You are already
> interacting with the editor there (I am not the editor).

On the bug tickets I've been granted to interact with Kristi and Mark, whereas the header of TR #35-7 bears your full name in the "Editors" field along with "other CLDR committee members." This link's target however doesn't credit you for anything related to keyboards.

Then I'd suggest that you request your name to be replaced with the real editor's name(s).

In turn I apologize for invoking your responsibility.

That now brings the need to file a whole bunch of tickets on a per-issue basis. I think it's unresponsive to make other people waste so much time on things that would have been fixed long ago if only the initial writers had been a bit careful.

The current system of change logging doesn't facilitate corrections, as you're likely to put "added plus signs in part of the modifier combos" and "permutated suffix Ls and Rs to prefix Ls and Rs" and the like in the change log.

No need to record "added cell padding" and "added sample code formatting" however.

Why not a default 4px cellpadding to all table th/td elements?

Just wondering while I was on it.
Best regards,

Marcel

From cldr-users at unicode.org Fri Mar 16 18:00:04 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 16 Mar 2018 23:00:04 +0000
Subject: TR 35-7
In-Reply-To: <494700634.21248.1521238412609.JavaMail.www@wwinf1m17>
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17> <494700634.21248.1521238412609.JavaMail.www@wwinf1m17>
Message-ID:

On Fri, Mar 16, 2018 at 3:13 PM, Marcel Schneider <charupdate at orange.fr> wrote:

> On 16/03/18 17:16, Steven R. Loomis wrote:
> > I think the bug tracking tool is best for tracking issues. You are
> > already interacting with the editor there (I am not the editor).

I meant the owner of the ticket, not the editor of the document. What I meant is that you have left feedback on https://unicode.org/cldr/trac/ticket/10901#comment:20 - so I think discussion of that feedback is best kept on that ticket.

> Then I'd suggest that you request your name to be replaced with the
> real editor's name(s).
>
> In turn I apologize for invoking your responsibility.

No need for any replacement or apology.

> That now brings the need to file a whole bunch of tickets on a per-issue
> basis. I think it's unresponsive to make other people waste so much time
> on things that would have been fixed long ago if only the initial writers
> had been a bit careful.

I don't see how there is a need to file more tickets, if your comments are already captured. What am I missing?

> The current system of change logging doesn't facilitate corrections, as
> you're likely to put "added plus signs in part of the modifier combos"
> and "permutated suffix Ls and Rs to prefix Ls and Rs" and the like in
> the change log.
>
> No need to record "added cell padding" and "added sample code formatting"
> however.

Which changes are you referring to?

> Why not a default 4px cellpadding to all table th/td elements?
>
> Just wondering while I was on it.

It might be a good suggestion, but that would be a separate ticket.

Regards,
Steven

From cldr-users at unicode.org Fri Mar 16 18:11:09 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 16 Mar 2018 23:11:09 +0000
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To: <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID:

On Thu, Mar 15, 2018 at 1:35 PM, George S. via CLDR-Users <cldr-users at unicode.org> wrote:

> On 3/14/2018 5:04 PM, Steven R. Loomis via CLDR-Users wrote:
> > You're quoting from
> > https://www.unicode.org/reports/tr35/tr35.html#Parent_Locales
> > ...
> > I think the point of the quote is that the "parent locale" is structural
> > and not part of the identity of the specific xml file.
>
> I can think of few things more structural than where this locale's
> defaults originate. Without that identity, the child file's definition
> is incomplete. Placing the relationship data in another file in a different
> directory entirely requires novices like myself to do a tremendous amount
> of research to understand what's going on. Even though the maintainers may
> have had really excellent reasons for this structure, from the developer
> standpoint it's not sensible.

The design goal has always been simplifying maintenance of the repository, and stability for the consumers (including yourself), over ease or simplicity of implementation.
That said, perhaps a *comment* could be generated in this specific case, based on the parentLocale data, when the CLDRFile is written out. If this is a common concern, a comment might make it easier for readers. > > If you look at the parent locales in supplemental, they are organized from > the point of view of the parent, for setting "which locales inherit from > en-150?" > > > As a developer, that uses LDML files, I can absolutely guarantee you that > I will NEVER ask that question. en-150 is a synthetic thing you folks > created to organize data. It's just not related to my workflow. en-GB is my > workflow. As a maintainer of the data, perhaps that's useful for you. > This exactly - it is designed for maintaining the data, and it is a lot of data. So how can we improve things, given these different design goals? And anyway, en-GB.xml shouldn?t be your workflow, it should be the fully resolved contents plus all metadata. We have tools to generate a fully resolved XML file, would that be of interest? > > > Parsing the supplementalData is critical to processing CLDR data. > > > That's true because of design decisions that were made by the maintainers. > It's not the only possible solution, and I don't think from the consumer > (developer) standpoint, it's even remotely the best. But, these are my > opinions and are generally uninformed and only accurate from my point of > view. > JSON data for example is pre-resolved. Perhaps there should be another layer in between the maintenance and the consumption of the data? CLDR has various converters to different formats. I don?t see how changing the inheritance structure would be a net benefit- it would break all existing consumers of the data (of which there are many). Surely there is a better solution? The CLDR Java tooling is available in the source repository, it could be a > source of comparison for file handling. > > Steven > > > > > > > > > > > > > On Wed, Mar 14, 2018 at 2:47 PM, George S. 
via CLDR-Users < > cldr-users at unicode.org> wrote: > >> Personally, I find the reasoning to be circular: >> >> "Since parentLocale information is not localizable on a per locale basis, >> the parentLocale information is contained in CLDR?s supplemental data." >> >> There are many things in the locale files that are not strictly >> localizable. Here's an example: >> >> >> >> saying you're not going to put the parent locale in because it's not >> localizable is kind of silly when you have lot's of data in the file that's >> not localizable. >> >> I'm suggesting: >> >> >> >> >> >> >> >> But it's your guys' project. >> >> >> On 3/14/2018 3:26 PM, Steven R. Loomis wrote: >> >> George, >> > It would be nice if the en_GB.xml file referenced it's parent >> >> I appreciate the idea, however, the XML files are not designed to >> be looked at in isolation. That's why we put this notice at the top: >> >> " CLDR data files are interpreted according to the LDML specification ( >> http://unicode.org/reports/tr35/) " >> >> Is there a better way to word this? >> >> Please also see the Implementer's guide and FAQ at >> https://github.com/unicode-org/cldr-implementers-guide/ - if you think >> this would be a good FAQ can you open an issue, or better yet a pull >> request there? >> >> >> On Wed, Mar 14, 2018 at 1:57 PM, George S. via CLDR-Users < >> cldr-users at unicode.org> wrote: >> >>> Thanks for responding. I knew I'd gone down this road before. Drat. >>> >>> I'll make the same comment I made three years ago: >>> >>> It would be nice if the en_GB.xml file referenced it's parent so that >>> mortals might have some idea of where to look. Having the relationship >>> squirreled away in a file in another directory with a non-obvious name >>> isn't very handy. >>> >>> >>> >>> On 3/14/2018 2:38 PM, Peter Edberg wrote: >>> >>> en_GB inherits from en_001, not from en. >>> >>> - Peter E >>> >>> On Mar 14, 2018, at 12:48 PM, George S. 
via CLDR-Users < >>> cldr-users at unicode.org> wrote: >>> >>> I'm looking at the file comm/main/en_GB.xml and I'm really confused. I'm >>> looking at the Gregorian calendar section, and there's no >>> >>> dateFormats / dateFormatLength=short >>> >>> the value in en.xml is >>> >>> M/d/yy >>> >>> If I look at en_AU.xml there is an entry with a value of "d/M/yy". >>> >>> Similarly, en_IE.xml there is no short dateFormatLength value. >>> >>> Can anyone help me understand how this all works? I'm using a library >>> that generates it's localization files from LDML, and it's coming up with a >>> lot of wrong answers. Before I go to them, I'd like to understand why >>> things are formatted in this way. >>> >>> >>> -- >>> George S. >>> *MH Software, Inc.* >>> Voice: 303 438 9585 <%28303%29%20438-9585> >>> http://www.mhsoftware.com >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >>> >>> -- >>> George S. >>> *MH Software, Inc.* >>> Voice: 303 438 9585 <%28303%29%20438-9585> >>> http://www.mhsoftware.com >>> >>> _______________________________________________ >>> CLDR-Users mailing list >>> CLDR-Users at unicode.org >>> http://unicode.org/mailman/listinfo/cldr-users >>> >>> >> >> -- >> George S. >> *MH Software, Inc.* >> Voice: 303 438 9585 <%28303%29%20438-9585> >> http://www.mhsoftware.com >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> >> > > > _______________________________________________ > CLDR-Users mailing listCLDR-Users at unicode.orghttp://unicode.org/mailman/listinfo/cldr-users > > > -- > George S. 
> *MH Software, Inc.*
> Voice: 303 438 9585
> http://www.mhsoftware.com
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users

From cldr-users at unicode.org Fri Mar 16 18:14:16 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 16 Mar 2018 23:14:16 +0000
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To:
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID:

On Thu, Mar 15, 2018 at 2:33 PM, Philippe Verdy via CLDR-Users <cldr-users at unicode.org> wrote:

> So I'm also in favor of deprecating the old supplemental file and
> integrating what it currently contains directly within the data of each
> relevant child locale: this will be clearer, and immediately usable by all
> CLDR-using applications and libraries, and will also need less maintenance.
> Maintenance is complex in a separate global file containing long lists of
> codes: some may be forgotten, the lists are not coherent with BCP 47 fallback
> mechanisms, and they are unnecessarily long to process when applications
> just need the ONE parent locale which is specific to each locale, without
> processing **all** the supplemental data file to find out whether a locale
> has a parent.

This means duplicating data, which makes maintenance more complicated for no real benefit, as well as breaking existing consumers. As I wrote, there are tools which already generate fully resolved locale data with all the inheritance filled in. Perhaps that would be of more interest for consumption than the unresolved source data.

> 2018-03-15 21:35 GMT+01:00 George S. via CLDR-Users <cldr-users at unicode.org>:
>
>> On 3/14/2018 5:04 PM, Steven R. Loomis via CLDR-Users wrote:
>>
>> You're quoting from
>> https://www.unicode.org/reports/tr35/tr35.html#Parent_Locales
>>
>> > ? ?
>>
>> > There are many things in the locale files that are not strictly
>> localizable. Here's an example:
>> >
>>
>> "narrow" here is a distinguishing attribute ( see
>> https://www.unicode.org/reports/tr35/tr35.html#Definitions ) and is part
>> of the identity of the element content that follows.
>>
>> I think the point of the quote is that the "parent locale" is structural
>> and not part of the identity of the specific xml file.
>>
>>
>> I can think of few things more structural than where this locale's
>> defaults originate from. Without that identity, the child file's definition
>> is incomplete. Placing the relationship data in another file in a different
>> directory entirely requires novices like myself to do a tremendous amount
>> of research to understand what's going on. Even though the maintainers may
>> have had really excellent reasons for this structure, from the developer
>> standpoint it's not sensible.
>>
>> If you look at the parent locales in supplemental, they are organized
>> from the point of view of the parent, for setting "which locales inherit
>> from en-150?"

From cldr-users at unicode.org Fri Mar 16 18:19:45 2018
From: cldr-users at unicode.org (George S. 
via CLDR-Users)
Date: Fri, 16 Mar 2018 17:19:45 -0600
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To:
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID: <6eb83cb4-0a01-51e4-0fa9-2a8c5d66d4c4@mhsoftware.com>

On 3/16/2018 5:11 PM, Steven R. Loomis wrote:

> This exactly - it is designed for maintaining the data, and it is a
> lot of data. So how can we improve things, given these different
> design goals?
> And anyway, en-GB.xml shouldn't be your workflow, it should be the
> fully resolved contents plus all metadata. We have tools to generate
> a fully resolved XML file, would that be of interest?

If you could direct me to those tools I would really appreciate that.

If I could get fully resolved files, that would meet my needs well.

--
George S.
*MH Software, Inc.*
Voice: 303 438 9585
http://www.mhsoftware.com

From cldr-users at unicode.org Fri Mar 16 18:53:44 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 16 Mar 2018 16:53:44 -0700
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To: <6eb83cb4-0a01-51e4-0fa9-2a8c5d66d4c4@mhsoftware.com>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com> <6eb83cb4-0a01-51e4-0fa9-2a8c5d66d4c4@mhsoftware.com>
Message-ID:

George,

I put some instructions at
https://gist.github.com/srl295/3de87c339aac467e5ee506e01855177d#file-howto-sh
and ran a copy, just for testing, but you should be able to build cldr.jar and run this against any version of CLDR data you want to fully resolve.

From cldr-users at unicode.org Fri Mar 16 21:22:09 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 17 Mar 2018 03:22:09 +0100
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To:
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID:

No duplication at all: in one case there's a supplemental file and we always need to infer the reverse data.
You can do the opposite: put this data directly in the per-locale files, and then infer the supplemental file by generating it only for compatibility. Most tools will never need that supplemental file, which will just be informational.
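For readers following along, the "reverse data" mentioned here can be illustrated with a small sketch. It assumes the supplemental file lists each parent on a `parentLocale` element with a space-separated `locales` attribute; the sample XML is an illustrative subset rather than the real list, and `child_to_parent` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative subset only; the real lists in supplementalData.xml are much longer.
SAMPLE = """
<supplementalData>
  <parentLocales>
    <parentLocale parent="en_001" locales="en_150 en_AU en_GB en_IE"/>
    <parentLocale parent="en_150" locales="en_DE en_NL"/>
  </parentLocales>
</supplementalData>
"""


def child_to_parent(xml_text):
    """Invert the parent -> children lists into a child -> parent lookup."""
    mapping = {}
    root = ET.fromstring(xml_text.strip())
    for elem in root.iter("parentLocale"):
        parent = elem.get("parent")
        for child in elem.get("locales", "").split():
            if child in mapping:
                # The same code listed twice is one of the errors raised in the thread.
                raise ValueError(
                    f"{child} listed under both {mapping[child]} and {parent}"
                )
            mapping[child] = parent
    return mapping


parents = child_to_parent(SAMPLE)
print(parents["en_GB"])  # en_001
```

The whole element list has to be read before a single locale's parent can be answered, which is the processing cost being debated; a consumer would normally build this lookup once and cache it.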
You'll also avoid a source of errors in the existing file (e.g. missing codes in the lists, or duplicate codes assigned to the same parent or to different parents).

It will just be simpler to validate each locale separately without editing a long line of codes in the supplemental file. If new locales are added by teams, there is no need to synchronize the work: one team can update one locale while another updates and validates another one. No one needs to touch this supplemental file, which will just be automatically inferred after collecting the dataset for all locales to publish.
But tools will no longer need this file (they will not need it at all if the parent locale is already found and specified in the per-locale files).

No need to parse all the content of the supplemental file (which is completely unusable without parsing it completely and reversing the mapping). This supplemental file does not work like the normal BCP 47 fallback resolution mechanism; it works in the incorrect direction.

So yes: deprecate it, make it only informational but no longer required. Announce that in some future version (e.g. 5 years after notice) it will be completely removed. No application at all really needs it!

From cldr-users at unicode.org Fri Mar 16 21:44:51 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Sat, 17 Mar 2018 09:44:51 +0700
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To:
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com>
Message-ID: <20180317094451.2f5d80fc@sil-mh8>

Dear All,

I would like to add my support to this proposal of moving the parentLocale information back into the ldml file. When adding a new locale, it is very helpful if everything about that locale is stored in the one file, so that it can be edited independently and submitted as a unit rather than having to change other files in order to add a new locale. In my analysis and limited experience of working with LDML, I would suggest that there are two areas where we have overlaid a solution using our namespace:

1. Parent locale

It may be convenient from an overall database perspective to have the parent-child relationships stored as supplemental data, but I think this was a retrograde step and would love to see the element reinstated. Adding a new locale then is simply a process of adding a new file, rather than adding a file and editing the supplemental data and merging and managing that. Supplemental data should be supplemental, not required for the interpretation / flattening of an LDML file.

I am thinking that perhaps the parentLocale was pulled out of the ldml file and into supplementalData because it was thought to be useful in language tag processing. I'm not sure that it is and, as mentioned, it is more trouble than it is worth being there rather than back in the ldml file.

2. Language names

When adding a new locale, very often the language name, or more likely the variant for that language, needs to be given for key languages like English and French. Changing these core locales is not something we want to encourage those creating emerging locales to be involved in. So instead we have added the ability to specify those strings for just the locale itself, in the locale itself. I realise that this is doing the opposite of what I said in the previous section. But it allows the new locale to be self-contained. At the point the locale gets integrated, it is a simple matter to move the information out of the new locale into the other locales where it really belongs. So this is an interim solution that again works to keep a single LDML file editable as a unit rather than seeing changes to a locale as edits to a large database of files.

I realise that philosophically to the CLDR technical team, the CLDR is just one big database and that it is the integrity and management of that database as a whole that is key. 
But for many language groups, their view of that database is their LDML file, and they would like to have full control over information from that one file rather than needing to be given write access to globally shared files.

Yours,
Martin

From cldr-users at unicode.org Fri Mar 16 22:16:30 2018
From: cldr-users at unicode.org (Hugh Paterson via CLDR-Users)
Date: Fri, 16 Mar 2018 20:16:30 -0700
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To: <20180317094451.2f5d80fc@sil-mh8>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com> <20180317094451.2f5d80fc@sil-mh8>
Message-ID:

Martin,

Re #2: Would Jenkins or Travis CI be able to take the language name from the newly submitted locale file and create a "pull request" with the necessary data to be added to the English or French locale file?

- Hugh Paterson III

-------------- next part -------------- An HTML attachment was scrubbed... 
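To make the inheritance question that started this thread concrete, here is a minimal sketch of a lookup that walks from a locale to its parents until a value is found. It assumes that an explicit parent (as listed in the supplemental parentLocales data) takes precedence over truncating the locale code, with "root" tried last. The parent table is an illustrative subset, the en and en_AU patterns are the ones quoted in the thread, the en_001 pattern is an assumed placeholder, and `parent_of`, `chain` and `resolve` are hypothetical names.

```python
# Explicit parents: an illustrative subset of the supplemental parentLocales data.
EXPLICIT_PARENT = {"en_GB": "en_001", "en_AU": "en_001", "en_150": "en_001"}

# Unresolved per-locale data: a locale stores a value only where it differs
# from what it inherits. "M/d/yy" and "d/M/yy" are quoted in the thread;
# the en_001 value is an assumed placeholder for illustration.
SHORT_DATE = {"en": "M/d/yy", "en_001": "dd/MM/y", "en_AU": "d/M/yy"}


def parent_of(locale):
    """Return the parent locale, or None once "root" has been reached."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENT:
        return EXPLICIT_PARENT[locale]
    if "_" in locale:
        # Default fallback: drop the last subtag (en_US -> en).
        return locale.rsplit("_", 1)[0]
    return "root"


def chain(locale):
    """List the locale followed by every ancestor, ending at "root"."""
    out = []
    while locale is not None:
        out.append(locale)
        locale = parent_of(locale)
    return out


def resolve(locale, table):
    """Return (source_locale, value) for the first locale in the chain with data."""
    for candidate in chain(locale):
        if candidate in table:
            return candidate, table[candidate]
    return None, None


print(chain("en_GB"))                # ['en_GB', 'en_001', 'en', 'root']
print(resolve("en_GB", SHORT_DATE))  # ('en_001', 'dd/MM/y')
print(resolve("en_AU", SHORT_DATE))  # ('en_AU', 'd/M/yy')
```

Under these assumptions, a file such as en_GB.xml can legitimately omit the short date format: the value is found further up the chain, which is also why a tool that ignores the explicit parent table and truncates straight to en would produce a different answer.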
From cldr-users at unicode.org Fri Mar 16 22:18:23 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 17 Mar 2018 04:18:23 +0100
Subject: en_GB.xml Gregorian Date Formats
In-Reply-To: <20180317094451.2f5d80fc@sil-mh8>
References: <983effcc-0339-2ee8-ed98-2dc096a6a485@mhsoftware.com> <0800D633-FAB0-486A-8A00-A63A7E04E752@unicode.org> <4d564541-3ad7-2f8e-ed38-b10828a3e0ff@mhsoftware.com> <747f7be1-4383-9b49-d8a5-c9b1c55d3349@mhsoftware.com> <8027d2d1-6c96-b683-3cda-8d835e8164d2@mhsoftware.com> <20180317094451.2f5d80fc@sil-mh8>
Message-ID:

Me too.

This format also causes a constant chicken-and-egg problem for versioning. Remember how locale data works and is updated: not all locales are changed at the same time, and one may still want to tune one locale without touching the rest. But the current format makes each locale depend on this global file, which itself depends on all locales. It's impossible to get a coherent view when we just want to update one locale without having to rebuild or recheck all the other locales because of this "stupid" backward dependency.

I also don't see why we could not mix locale data files from several versions of CLDR, e.g. updating all locales except a few that a project has decided to tune specifically and that were based on previous versions, possibly depending on different parent/fallback locales. For example, a locale could have initially depended on "root", then been changed to depend on "en", then later on "cy", then back on "en". Fallbacks are subject to change, and each project may have preferences depending on the audience it is targeting and the amount of translation/localisation done in specific locales. Some newer versions of CLDR data may also depend on features not yet implemented in a project's locale libraries, or could contain characters that its rendering does not support.
Making the parent/fallback locale part of each locale's own data allows much better flexibility, and removes the undesirable requirement that all LDML files belong to the same CLDR dataset version assumed by the single supplemental file. Every project that wants to remove this stupid dependency will need to parse the supplemental file once, only to integrate a single "fallback" element in each LDML file **before** versioning it for the intended project. Then, if CLDR ever changes the supplemental file, the change will be ignored for locales that have already been used and integrated, even if they are refreshed: the fallback will not be overwritten from the new supplemental file, as that could cause severe problems or irritate end users.

And in all cases, this single global dependency certainly does not help in maintaining CLDR itself, or in experimenting with new locales to add or integrate (or possibly to remove or put back to "draft" because of insufficient completeness or too many reported problems in core elements). You cannot easily create custom "branches" in the project for tests and then reintegrate a branch later, as there's no clear way to decide which fallbacks from different versions of the global file to keep for all the other locales! It's just simpler to decide that for a single tested locale.
This does not prohibit the CLDR project to create an assembly later containing the generated supplemental file, and compressing datas by eliminating identical data in child locales that are identical to the data inherited either directly from the designated parent/fallback (and then recursively to the grand parent) and then from the standard BCP47 mechanism for fallbacks from the target locale (infered only by the format of BCP47 locale codes), then again recursively on every parent, then grand-parent, until we reach the "root" locale (ignored in all previous steps, but which will be the last locale tested; after that point an application may opt to choose some arbitrary locale such as the website default language, or the OS default localisation, or arbitrarily may choose the language used natively by programmers or in the first non-localized versions of the application). 2018-03-17 3:44 GMT+01:00 Martin Hosken : > Dear All, > > I would like to add my support to this proposal of moving the parentLocal > information back into the ldml file. When dealing with adding a new locale, > it is very helpful if everything about that locale is stored in the one > file and so can be edited independently and submitted as a unit rather than > having to change other files in order to add a new locale. In my analysis > and limited experience of working with LDML I would suggest that there are > two areas where we have overlaid a solution using our namespace: > > 1. Parent locale > > It may be convenient from an overall database perspective to have the > parent child relationships stored as supplemental data, but I think this > was a retrograde step and would love to see the element > reinstated. Adding a new locale then is simply a process of adding a new > file rather than adding a file and editing the supplemental data and > merging and managing that. Supplemental data should be supplemental not > required for the interpretation / flattening of an LDML file. 
> > I am thinking that perhaps the parentLocale was pulled out of the ldml > file and into supplementalData because it was thought to be useful in > language tag processing. I'm not sure that it is and as mentioned is more > trouble than it is worth being there rather than back in the ldml file. > > 2. Language names > > When adding a new locale, very often the language name or more likely the > variant for that language, needs to be given for key languages like English > and French. Changing these core locales is not something we want to > encourage those creating emerging locales to be involved in. So instead we > have added the ability to specify those strings for just the locale itself, > in the locale itself. I realise that this is doing the opposite of what I > said in the previous section. But it allows the new locale to be self > contained. At the point the locale gets integrated, then it is a simple > matter to move the information out of the new locale into the other locales > where it really belongs. So this is an interim solution that again works to > keep a single LDML file editable as a unit rather than seeing changes to a > locale as edits to a large database of files. > > I realise that philosophically to the CLDR technical team, the CLDR is > just one big database and that it is the integrity and management of that > database as a whole that is key. But for many language groups, their view > of that database is their LDML file and they would like to have full > control over information from that one file rather than needing to be given > write access to globally shared files. > > Yours, > Martin > > > > > No duplication at all: in one case there's a supplemenal file and we > always > > need infer reverse data. 
> > You can do the opposite: put this data directly in the per-locale files, and then infer the supplemental file by generating it only for compatibility, but most tools will never need that supplemental file, which will just be informational.
> > You'll also avoid a source of errors in the existing file (e.g. missing codes in the lists, duplicate codes assigned to the same parent or to different parents).
> >
> > It will just be simpler to validate each locale separately without editing a long line of codes in the supplemental file. If new locales are added by teams, there is no need to synchronize the work: one team can update one locale while another updates and validates another one. No one needs to touch this supplemental file, which will just be automatically inferred after collecting the dataset for all locales to publish.
> > But tools will no longer need this file (they will not need it at all if the parent locale is already found and specified in per-locale files).
> >
> > No need to parse all the content of the supplemental file (which is completely unusable without parsing it completely and reversing the mapping). This supplemental file does not work like the normal BCP47 fallback resolution mechanism; it works in the wrong direction.
> >
> > So yes: deprecate it, make it only informational but no longer required. Announce that in some future version (e.g. 5 years after notice) it will be completely removed. No application at all really needs it!
> >
> > 2018-03-17 0:14 GMT+01:00 Steven R. Loomis :
> >
> > > El El jue, mar. 15, 2018 a las 2:33 p.
m., Philippe Verdy via > CLDR-Users < > > > cldr-users at unicode.org> escribi?: > > > > > >> > > >> So I'm also in favor of deprecating the old supplemental file and > > >> integrate what it currently contains directly within the data of each > > >> relevant child locale: this will be clearer, and immediately usable > by all > > >> CLDR-using applications and libraries, with also less maintenance > (which is > > >> complex to do in a separate global files containining long lits of > codes > > >> (possibly forgetting some, not coherent with BCP 47 fallback > mechanisms, > > >> and in fact unnecessarily long to process when applications just need > ONE > > >> parent locale which is specific to each locale, without processing > **all** > > >> the supplemental data file to locate if a locale has some parent). > > >> > > > > > > This means duplicating data, which makes maintenance more complicated > for > > > no real benefit, as well as breaking existing consumers. As I wrote, > there > > > are tools which already generate fully resolved locale data with all > the > > > inheritance filled in. Perhaps that would be of more interest in > > > consumption than the unresolved source data. > > > > > > > > > > > >> 2018-03-15 21:35 GMT+01:00 George S. via CLDR-Users < > > >> cldr-users at unicode.org>: > > >> > > >>> On 3/14/2018 5:04 PM, Steven R. Loomis via CLDR-Users wrote: > > >>> > > >>> You're quoting from https://www.unicode.org/ > > >>> reports/tr35/tr35.html#Parent_Locales > > >>> > > >>> > ? ? > > >>> > > >>> > There are many things in the locale files that are not strictly > > >>> localizable. Here's an example: > > >>> > > > >>> > > >>> "narrow" here is a distinguishing attribute ( see > > >>> https://www.unicode.org/reports/tr35/tr35.html#Definitions ) and is > > >>> part of the identity of the element content that follows. 
> > >>> > > >>> I think the point of the quote is that the "parent locale" is > structural > > >>> and not part of the identity of the specific xml file. > > >>> > > >>> > > >>> I can think of few things more structural than where does this > locale's > > >>> defaults originate from. Without that identity, the child file's > definition > > >>> is incomplete. Placing the relationship data in another file in a > different > > >>> directory entirely requires novices like myself to do a tremendous > amount > > >>> of research to understand what's going on. Even though the > maintainers may > > >>> have had really excellent reasons for this structure, from the > developer > > >>> standpoint it's not sensible. > > >>> > > >>> If you look at the parent locales in supplemental, they are organized > > >>> from the point of view of the parent, for setting "which locales > inherit > > >>> from en-150?" > > >>> > > >>> _______________________________________________ > > >> CLDR-Users mailing list > > >> CLDR-Users at unicode.org > > >> http://unicode.org/mailman/listinfo/cldr-users > > >> > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Sat Mar 17 04:11:05 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Sat, 17 Mar 2018 10:11:05 +0100 (CET) Subject: TR 35-7 In-Reply-To: References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17> <494700634.21248.1521238412609.JavaMail.www@wwinf1m17> Message-ID: <829905661.2557.1521277865828.JavaMail.www@wwinf1m17> On Fri Mar 16 18:00:04 CDT 2018, Steven R. Loomis wrote: > > El El vie, mar. 16, 2018 a las 3:13 p. m., Marcel Schneider < > charupdate at orange.fr> escribi?: > > > On 16/03/18 17:16, Steven R. Loomis wrote: > > > I think the bug tracking tool is best for tracking issues. 
> > > You are already interacting with the editor there (I am not the editor).
> >
> > I meant the owner of the ticket, not the editor of the document.
>
> What I meant is that you have left feedback on https://unicode.org/cldr/trac/ticket/10901#comment:20 - so I think discussion of that feedback is best kept on that ticket.

Given that the editorial feedback there is buried under the more substantial concerns, the owner interacted only so far as to replicate the comment he left on all other PRI #367 tickets:
https://unicode.org/cldr/trac/ticket/10901#comment:17
|
| There was a lot of feedback on this PRI. The keyboard group has made
| some modifications based on feedback, but decided to leave other features
| for consideration for a future version.

That is not much of an interaction.

It didn't help either that, when I tried to attach revised HTML source code, the tool rejected it because it contained more than 4 external links. I then disabled all external links, but ended up simply making it available on the internet, rather than messing around with the file.

However, inferring from Mark's comment and the actual state of TR #35-7, the keyboard group decided not to consider fixing all the small issues that are parseable in:
http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html

> > Then I'd suggest that you request your name to be replaced with the real editor's name(s).
> >
> > In turn I apologize for invoking your responsibility.
>
> No need for any replacement or apology.

That raises a question about how much liberty you actually have in editing TR #35-7, and about whether your tasks at IBM (your employer) leave any time to actually care for the documents that you are committed to put your name on, given that, again:
http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Acknowledgments
does not acknowledge you for anything related to keyboards, but only "for development of the survey tool and database management."
> > That now brings the need to file a whole bunch of tickets on a per-issue basis. I think it's unresponsive to get other people waste so much time, for things that would have been fixed long ago if only the initial writers had been a bit careful.
>
> I don't see how there is a need to file more tickets, if your comments are already captured. What am I missing?

You might not, but I am. (See below.)

Actually, CLDR would be better served by a bunch of 20 or 40 specific tickets, rather than half a dozen composite ones that people might be unable/unwilling to parse and process exhaustively.

> > The actual system of change logging doesn't facilitate corrections, as you're likely to put "added plus signs in part of the modifier combos" and "permutated suffix Ls and Rs to prefix Ls and Rs" and the like in the change log.
> >
> > No need to record "added cell padding" and "added sample code formatting" however.
>
> Which changes are you referring to?

In
http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-keyboards.html#Element_keyMap
modifier combinations are concatenated using the plus sign, in other parts without '+'; while on the other hand, 'L' and 'R' are used as prefixes in
http://unicode.org/repos/cldr/trunk/specs/ldml/tr35-keyboards.html#Invariants
but as suffixes in other parts as explicitly specified, while I argue that they should be prefixes throughout, for consistency with the English language, with OS usage (Windows: LSHIFT, RMENU, LCONTROL and so on; macOS: rightShift, rightOption, rightControl), and with political neutrality ("Alt Right" is a political organization). I've even filed this in a ticket:

https://unicode.org/cldr/trac/ticket/10906
| The modifier labels should be titlecased, and the left/right should be
| a prefix, not a suffix. The right Alt (AltGr) key can be labeled RAlt,
| but it CANNOT (seriously) be labeled "altR"!

However, I filed it as part of "/charts/keyboards/layouts/: Editorial feedback"
that started with:
| The tables representing the keyboard layout charts should have a table header row
| containing the ISO column numbers. This will also make for an equal cell width.

Mark accepted it (
https://unicode.org/cldr/trac/ticket/10906#comment:4
), after I had already added a comment about quite another design suggestion:
https://unicode.org/cldr/trac/ticket/10906#comment:1

This is a flagrant example of how useful it is to restrict every ticket to one single atomic issue. That's what I was missing, believing that it was a good idea to post "consolidated" feedback, which I misunderstood as covering *composite* feedback as well.

Now back to your question. The plus sign should be used throughout for modifier concatenations because:
1) The inconsistent suffix/prefix use of L and R is otherwise confusing;
2) The space stands for AND on macOS, so that LDML, which uses it for OR, should use plus for AND, rather than nothing.

Hence I've added all missing plus signs and highlighted them correctly as changes, so that they cannot be overlooked in
http://charupdate.info/unicode/revision/tr35/33-50/tr35-keyboards.html

Did you and/or other people of the keyboard group actually pay attention, or at least take a glance?

Further, I've added cellpadding and removed the nested p elements in td, in an attempt to apply some layout conventions that are in current use in our civilisation.

The last point noted above is consistent formatting of all XML blocks.

> > Why not a default 4px cellpadding to all table th/td elements?
> >
> > Just wondering while I was on it.
>
> It might be a good suggestion, but that would be a separate ticket.

So that's one more :)

However, by now, since HTML cellpadding is overridden by CSS, I'd suggest adding "padding-left: 10px;" to the "th, td" rules. But the core part of TR #35 uses a 4px cellpadding specified in an HTML attribute - the same one that you left at "0" until v32 before removing it for v33?
Anyway, leaving tables without left cellpadding, while using nested p elements in td to get top and bottom padding - I can't tell you...

Best regards,

Marcel

From cldr-users at unicode.org Sat Mar 17 06:10:47 2018
From: cldr-users at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via CLDR-Users)
Date: Sat, 17 Mar 2018 12:10:47 +0100
Subject: TR 35-7
In-Reply-To: <829905661.2557.1521277865828.JavaMail.www@wwinf1m17>
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17> <494700634.21248.1521238412609.JavaMail.www@wwinf1m17> <829905661.2557.1521277865828.JavaMail.www@wwinf1m17>
Message-ID:

Marcel,

I'm sorry that the process of providing feedback has been so onerous.

We have only a few people working on this project, and they are also fully booked with other projects. So the first priority has been to get the content in, recognizing that there is a lot to work on in formatting. (Aside from other issues, the keyboard part of the CLDR spec doesn't really follow the format of the rest of the document either.)

For this release, we focused on content that reflected features that corresponded to what was in some major platforms, but that does not close off extensions in the future. In addition, a primary goal for the content is stability: keeping everything that was valid in v32 still valid in v33. So renaming elements, attributes, attribute values, etc. for clarity was not on the table.

So I'm sure that we have not been able to give your feedback the attention that it deserves, or recognize the effort that has gone into it.

I agree that it is probably better to split off one or more separate tickets. If the tickets are either too many or too long, they are difficult to handle, so we need some balance between them. We like tickets that are all on one topic, and whose resolution can all be done as a part of one task.
We release every 6 months, if we don't get everything in any particular release, there are always future releases. Because we are so close to the deadline for this release, I'd suggest - just for now - splitting off one or maybe two tickets that are:

1. formatting not content
2. high priority: where the formatting strongly interferes with the reader's ability to understand the content
3. require a relatively small amount of work to the spec

I will carve out some time on my Monday (Zürich time) to review those in detail, and we can see what can be done.

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From cldr-users at unicode.org Sat Mar 17 08:20:07 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 17 Mar 2018 14:20:07 +0100 (CET)
Subject: TR 35-7
In-Reply-To:
References: <1641551864.15423.1518543848491.JavaMail.www@wwinf1n27> <1477667603.16552.1518546148339.JavaMail.www@wwinf1n27> <85233106.10914.1521207099614.JavaMail.www@wwinf1m17> <494700634.21248.1521238412609.JavaMail.www@wwinf1m17> <829905661.2557.1521277865828.JavaMail.www@wwinf1m17>
Message-ID: <506069269.5992.1521292807623.JavaMail.www@wwinf1j13>

On Sat Mar 17 06:10:47 CDT 2018, Mark Davis ☕️ wrote:
>
> I'm sorry that the process of providing feedback has been so onerous.

No worries. I'm ready to invest the needed time, provided that it will be of some use at some point. Hence my related concerns are essentially about making sure of that.

> We have only a few people working on this project, and they are also fully booked with other projects.

That's what I ended up suspecting, and I must confess that I'm committed to several keyboarding projects where the evaluated needs are calling for new but still easily implementable solutions. Hence I've used this opportunity to join in trying to get CLDR to support them, while there is still a lot of work here that I'm expected to complete these days.

> So the first priority has been to get the content in, recognizing that there is a lot to work on in formatting.

I think it's a good idea to spread the changes over several versions if there is always only one intermediate revision (currently revision 50), so that the changes can be tracked by sets, currently the new features presented in
http://www.unicode.org/review/pri367/pri367-Unicode-LDML-Keyboard-Enhancements.pdf

> (Aside from other issues, the keyboard part of the CLDR spec doesn't really follow the format of the rest of the document either.).

Following the template is indeed a more appropriate approach.
While editing a copy of the TR stylesheet, I didn't look up HTML usage much, because I'm desperately short of time.

> For this release, we focused on content that reflected features that corresponded to what was in some major platforms, but that does not close off extensions in the future. In addition, a primary goal for the content is stability: keeping everything that was valid in v32 still valid in v33. So renaming elements, attributes, attribute values, etc. for clarity was not on the table.

Stability is useful indeed, at least as long as it doesn't compromise usability. We then need to find a way of accommodating the existing scheme, e.g. by adding the transform="yes" value. One way I suggested is to add an attribute in the settings so that one can use a new scheme while the existing one remains valid.

> So I'm sure that we have not been able to give your feedback the attention that it deserves, or recognize the effort that has gone into it.

Hereby that's done for now.

> I agree that it is probably better to split off one or more separate tickets. If the tickets are either too many or too long, they are difficult to handle, so we need some balance between them. We like tickets that are all on one topic, and whose resolution can all be done as a part of one task.

Got it, thanks.

> We release every 6 months, if we don't get everything in any particular release, there are always future releases. Because we are so close to the deadline for this release, I'd suggest - just for now - splitting off one or maybe two tickets that are:
>
> 1. formatting not content
> 2. high priority: where the formatting strongly interferes with the reader's ability to understand the content
> 3. require a relatively small amount of work to the spec

I see several points that can be fixed in nearly no time but that do not heavily impact readability.
The reader is still able to infer and mentally correct. My concern in this field is primarily about the question marks rising out of his or her head, as I'll have to link to TR #35-7 (rather than to translate it, which we've seen on Unicode Public is not a working solution any longer) and add a disclaimer if things are not fixed by then.

> I will carve out some time on my Monday (Zürich time) to review those in detail, and we can see what can be done.

For now I'd like us both to save our time, looking forward to version 34. Thanks however.

Best regards,

Marcel

From cldr-users at unicode.org Fri Mar 23 01:48:16 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Fri, 23 Mar 2018 13:48:16 +0700
Subject: one macro language, two default languages
Message-ID: <20180323134816.1558ce12@sil-mh8>

Dear All,

I notice that both mnk and emk have language aliases to man. I can understand picking one language as the concrete equivalent of a macro language, but two?

The algorithm is one way and so there is nothing to stop this happening. But when it comes to actual locale data, which locale should end up in man.xml?

TIA,
GB,
Martin

From cldr-users at unicode.org Fri Mar 23 03:21:14 2018
From: cldr-users at unicode.org (Martin Hosken via CLDR-Users)
Date: Fri, 23 Mar 2018 15:21:14 +0700
Subject: BCP47 to CLDR conversion
Message-ID: <20180323152114.0d328774@sil-mh8>

Dear All,

Would someone be willing to walk me through how en-US maps to en_US when converting from BCP47 to CLDR via the algorithm in 3.3.1?

My take is:

1. Canonicalize the language tag: en-US -> en (through likelySubtags minimising)
2. und->root: en -> en
3. languageAlias matching: no match: en -> en
4. territoryAlias matching: no match: en -> en

I'm probably misunderstanding step 1. (Is there a reference for step 1 and what that subalgorithm is?)
TIA, Yours, Martin From cldr-users at unicode.org Fri Mar 23 04:46:21 2018 From: cldr-users at unicode.org (Martin Hosken via CLDR-Users) Date: Fri, 23 Mar 2018 16:46:21 +0700 Subject: bh & bho macro relationship Message-ID: <20180323164621.3b3384e1@sil-mh8> Dear All, bho is a macro language to which bh maps. But bho has a likely subtags entry of bho-Deva-IN, while bh has a likely subtags entry of bh-Kthi-IN. Is this OK? It makes for interesting folding issues and could well get some wrong results depending on how the tag to CLDR id works. On that, if anyone is up for answering the en-US question, could they then explain what happens to en-Latn-US through the same process. TIA, Yours, Martin From cldr-users at unicode.org Fri Mar 23 05:07:43 2018 From: cldr-users at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via CLDR-Users) Date: Fri, 23 Mar 2018 11:07:43 +0100 Subject: bh & bho macro relationship In-Reply-To: <20180323164621.3b3384e1@sil-mh8> References: <20180323164621.3b3384e1@sil-mh8> Message-ID: The likely subtags are built to allow a certain degree of flexibility for the implementations. That is, they don't normalize the source, but rather maintain the "denormalizations". The most prominent example is: Both are used, so that an implementation that uses 'iw' as the canonical form (eg Java) can still use the data. Now, we don't include all the possible denormalized forms, but we do include the ones that have in some way been used in that fashion. Most of the other data in CLDR doesn't have to have both forms, because it doesn't contain language tags both in the 'input' and the 'output'. Make sense? Mark On Fri, Mar 23, 2018 at 10:46 AM, Martin Hosken via CLDR-Users < cldr-users at unicode.org> wrote: > Dear All, > > bho is a macro language to which bh maps. But bho has a likely subtags > entry of bho-Deva-IN, while bh has a likely subtags entry of bh-Kthi-IN. Is > this OK? 
It makes for interesting folding issues and could well get some > wrong results depending on how the tag to CLDR id works. > > On that, if anyone is up for answering the en-US question, could they then > explain what happens to en-Latn-US through the same process. > > TIA, > Yours, > Martin > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Fri Mar 23 05:49:10 2018 From: cldr-users at unicode.org (Martin Hosken via CLDR-Users) Date: Fri, 23 Mar 2018 10:49:10 +0000 Subject: bh & bho macro relationship In-Reply-To: References: <20180323164621.3b3384e1@sil-mh8> Message-ID: Dear Mark, That sort of makes sense. But here I am talking about entries in languageAlias for macrolanguage folding. Where two different sublanguages fold to the same macro language. GB, Martin On Fri, 23 Mar 2018, 17:08 Mark Davis ??, wrote: > The likely subtags are built to allow a certain degree of flexibility for > the implementations. That is, they don't normalize the source, but rather > maintain the "denormalizations". The most prominent example is: > > > > > > > > > > Both are used, so that an implementation that uses 'iw' as the canonical > form (eg Java) can still use the data. Now, we don't include all the > possible denormalized forms, but we do include the ones that have in some > way been used in that fashion. > > Most of the other data in CLDR doesn't have to have both forms, because it > doesn't contain language tags both in the 'input' and the 'output'. > > Make sense? > > Mark > > On Fri, Mar 23, 2018 at 10:46 AM, Martin Hosken via CLDR-Users < > cldr-users at unicode.org> wrote: > >> Dear All, >> >> bho is a macro language to which bh maps. But bho has a likely subtags >> entry of bho-Deva-IN, while bh has a likely subtags entry of bh-Kthi-IN. Is >> this OK? 
It makes for interesting folding issues and could well get some >> wrong results depending on how the tag to CLDR id works. >> >> On that, if anyone is up for answering the en-US question, could they >> then explain what happens to en-Latn-US through the same process. >> >> TIA, >> Yours, >> Martin >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Fri Mar 23 05:59:26 2018 From: cldr-users at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via CLDR-Users) Date: Fri, 23 Mar 2018 11:59:26 +0100 Subject: BCP47 to CLDR conversion In-Reply-To: <20180323152114.0d328774@sil-mh8> References: <20180323152114.0d328774@sil-mh8> Message-ID: #1 incorporates some of those. See http://unicode.org/reports/tr35/#Likely_Subtags. Let us know if you find any problems in that description! Mark Mark On Fri, Mar 23, 2018 at 9:21 AM, Martin Hosken via CLDR-Users < cldr-users at unicode.org> wrote: > Dear All, > > Would someone be willing to walk me through how en-US maps to en_US when > converting from BCP47 to CLDR via the algorithm in 3.3.1? > > My take is: > > 1. Canonicalize the language tag: en-US -> en (through likelySubtags > minimising) > 2. und->root: en -> en > 3. languageAlias matching: no match: en -> en > 4. territoryAlias matching: no match: en -> en > > I'm probably misunderstanding step 1. (Is there a reference for step 1 and > what that subalgorithm is?) > > TIA, > Yours, > Martin > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cldr-users at unicode.org Fri Mar 23 06:11:17 2018 From: cldr-users at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via CLDR-Users) Date: Fri, 23 Mar 2018 12:11:17 +0100 Subject: one macro language, two default languages In-Reply-To: <20180323134816.1558ce12@sil-mh8> References: <20180323134816.1558ce12@sil-mh8> Message-ID: We do enforce transitivity of the aliases, so that you can get multiple items mapping to the same replacement: zho => zh cmn => zho => zh Here is another example, where on the basis of information we have gotten, there is no material difference between Tagalog and Filipino However, we shouldn't do that where there wouldn't otherwise be aliases. If you find cases that you think are wrong, you should file a ticket (with your reasoning) and we can review. One concrete action we could consider taking would be to allow the @reason be a space-delimited list, and thus be able to represent the "chain" of reasons for a replacement. Mark On Fri, Mar 23, 2018 at 7:48 AM, Martin Hosken via CLDR-Users < cldr-users at unicode.org> wrote: > Dear All, > > I notice that both mnk and emk have language aliases to man. I can > understanding picking one language as the concrete equivalent of a macro > language, but two? > > The algorithm is one way and so there is nothing to stop this happening. > But when it comes to actual locale data, which locale should end up in > man.xml? > > TIA, > GB, > Martin > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cldr-users at unicode.org Fri Mar 23 08:14:37 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Fri, 23 Mar 2018 14:14:37 +0100 Subject: one macro language, two default languages In-Reply-To: <20180323134816.1558ce12@sil-mh8> References: <20180323134816.1558ce12@sil-mh8> Message-ID: If you can distinguish a region in addition to the language, the fallback chain can be customized, but this requires mapping not just one fallback language (and inferring the others from it), but setting up **locale** fallback chains so that you map a locale to a list of locales, **ordered** by preference, and using these chains **before** looking for generic fallbacks. E.g. Given these two tuned mappings: { "nds-nl" : ["nl"], "nds": ["de"], } Here if a resource is not found in "nds-nl", it will be searched first in "nl" **before** searching in "nds" from implied BCP47 fallbacks, then "de" by closure of the first mapping. This can be used to tune the behavior of fallbacks from "man", but only if you can distinguish two distinct usages for regions. So you'll get these mappings: { "man": ["mnk"], "man-XY": ["emk"], //replace "man-XY" by the relevant distinctive region or variant. "mnk": ["emk"], "emk": ["mnk"], } This solves all practical problems, but in all cases you'll have to choose if the first line above should be: "man": ["mnk"], or "man": ["emk"], Here you need an arbitrary choice, or a choice based on frequency of usage, or on which of "mnk" and "emk" has the better coverage of localized data. This initial arbitrary choice can be tuned at any time later in your project. Note that without the last two rules: "mnk": ["emk"], "emk": ["mnk"], you'd need to specify the first rule either as: "man" -> ["mnk", "emk"], or: "man" -> ["emk", "mnk"], with the same kind of arbitrary or motivated choice.
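[Editorial sketch, not part of the original message.] The lookup order described above can be illustrated roughly as follows. This is a minimal sketch under assumptions: the function name `fallback_chain` and the parameter `tuned` are hypothetical, every mapping value is written as a list, the implied BCP47 fallback is taken to be simple truncation at the last hyphen, and tags are compared as plain strings. It is not CLDR's or ICU's actual inheritance algorithm.

```python
def fallback_chain(locale, tuned):
    """Return the ordered list of locales to search, starting with `locale`.

    `tuned` maps a locale tag to an ordered list of preferred fallbacks
    (a hypothetical, application-defined table, not CLDR data).
    """
    chain = []

    def visit(tag):
        # Skip tags already in the chain; this also breaks cycles such as
        # "mnk" -> "emk" -> "mnk".
        if tag in chain:
            return
        chain.append(tag)
        # Tuned fallbacks are consulted before the generic fallback.
        for fallback in tuned.get(tag, []):
            visit(fallback)
        # Generic fallback: drop the last subtag ("nds-nl" -> "nds").
        if "-" in tag:
            visit(tag.rsplit("-", 1)[0])

    visit(locale)
    return chain


print(fallback_chain("nds-nl", {"nds-nl": ["nl"], "nds": ["de"]}))
print(fallback_chain("man-XY", {
    "man": ["mnk"],
    "man-XY": ["emk"],   # "XY" is a placeholder region, as in the message
    "mnk": ["emk"],
    "emk": ["mnk"],
}))
```

With the two example tables from the message, the first call visits "nds-nl", "nl", "nds", "de" in that order, and the second visits "man-XY", "emk", "mnk", "man". The visited-list check is what lets the mutual "mnk"/"emk" rules coexist without looping.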
For this reason, such an arbitrary mapping cannot/should not be done in generic CLDR data, which can ONLY specify the last two rules; these do not make the decision, which each application developer/maintainer will have to make themselves: { "mnk" -> ["emk"], "emk" -> ["mnk"], } But generic CLDR data can also contain this additional mapping where it is distinctive enough: "man-XY" -> "emk", //replace "man-XY" by the relevant distinctive region or variant without deciding anything about the ordered mapping of fallbacks from "man". Philippe. 2018-03-23 7:48 GMT+01:00 Martin Hosken via CLDR-Users < cldr-users at unicode.org>: > Dear All, > > I notice that both mnk and emk have language aliases to man. I can > understanding picking one language as the concrete equivalent of a macro > language, but two? > > The algorithm is one way and so there is nothing to stop this happening. > But when it comes to actual locale data, which locale should end up in > man.xml? > > TIA, > GB, > Martin > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Fri Mar 23 06:32:34 2018 From: cldr-users at unicode.org (=?UTF-8?B?0KHQvtGD0YEt0LrRg9C9?= via CLDR-Users) Date: Fri, 23 Mar 2018 14:32:34 +0300 Subject: Support of Old Church Slavic language sublocale in CLDR Message-ID: Dear all, Two years ago I opened the ticket https://unicode.org/cldr/trac/ticket/9238 about adding a separate sublocale for the Old Church Slavic language within the "cu" CLDR locale. The ticket was closed for formal reasons, but I believe this issue needs further discussion.
Though I have explained the problem in the ticket, I would like to point out the main issues here again: - Linguists consider 'Old Church Slavic' and 'Church Slavic' two different languages (see https://en.wikipedia.org/wiki/Old_Church_Slavonic and https://en.wikipedia.org/wiki/Church_Slavonic_language for reference). - The ISO 639 code "cu"/"chu" is shared by both ( http://www-01.sil.org/iso639-3/documentation.asp?id=chu). Others, such as the IANA subtag list, follow this convention. CLDR also seems to follow it, having only one "cu" locale variant called "Church Slavic". - Hence there are two main problems: 1) there are no officially designated subtags to distinguish "Old Church Slavic" and "Church Slavic"; 2) CLDR now supports only one possible variation for the "cu" code, the "Church Slavic" one. So I would like to ask whether it's still possible to add an "Old Church Slavic language" locale to CLDR? The data itself can easily be created by cloning the existing "Church Slavic" data and then fixing the spelling (I can work on that as an Old Church Slavic student and enthusiast), but what is the subtag supposed to be? I'd say that, from a formal standpoint, it's the main problem, as ISO and IANA did not care about possible language variants while implementing code "cu". Best regards, Therapont -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Fri Mar 23 10:52:29 2018 From: cldr-users at unicode.org (Doug Ewell via CLDR-Users) Date: Fri, 23 Mar 2018 08:52:29 -0700 Subject: bh & bho macro relationship Message-ID: <20180323085229.665a7a7059d7ee80bb4d670165c8327d.e2fcb01710.wbe@email03.godaddy.com> Martin Hosken wrote: > That sort of makes sense. But here I am talking about entries in > languageAlias for macrolanguage folding. Where two different > sublanguages fold to the same macro language.
I wish I understood how CLDR uses the term "macrolanguage", because it apparently has nothing to do with how ISO 639-3 or BCP 47 use it. In BCP 47, neither 'bh' ("Bihari languages") nor 'bho' ("Bhojpuri") is a macrolanguage, nor is either encompassed by one. -- Doug Ewell | Thornton, CO, US | ewellic.org From cldr-users at unicode.org Sat Mar 24 03:44:44 2018 From: cldr-users at unicode.org (Aleksandr Andreev via CLDR-Users) Date: Sat, 24 Mar 2018 11:44:44 +0300 Subject: Support of Old Church Slavic language sublocale in CLDR In-Reply-To: References: Message-ID: On Fri, Mar 23, 2018 at 2:32 PM, Соус-кун via CLDR-Users wrote: > Dear all, > > Two years ago I've openned the ticket > https://unicode.org/cldr/trac/ticket/9238 about adding separate sublocale > for Old Church Slavic language within the "cu" CLDR locale. > The ticked was closed for formal reasons, but I believe this issue needs > further discussion. > As I wrote in the original ticket, in my view, the way to handle this would be to separate "Old" Church Slavic (whatever is meant by this term) from Church Slavic the way Ancient Greek has been separated from modern Greek: grc is the ISO 639-2 code for Ancient Greek and ell or gre are the ISO 639-2 codes for Modern Greek. I write "whatever is meant by this term" to underline my general concern that "Old" Church Slavic does not seem to be a well-defined term. We've defined "Church Slavic" in CLDR to be the current liturgical language used by the Russian Orthodox Church and other Orthodox and Byzantine Catholic Churches. (Variants can be specified as cu_BG, cu_RU, cu_UA, etc.). This language has well documented norms and a well-established user community. By "Old" Church Slavonic I guess the questioner means the literary language used in manuscripts around the 9th-10th century? Is there a need for having this in CLDR as a separate locale? Does CLDR even provide support for ancient languages?
I don't see data in CLDR for Latin, Ancient Greek, Avestan or Sanskrit, for example. What additional functionality would be provided to the user community by including "Old" Church Slavic as a separate locale? > - Linguists consider 'Old Church Slavic' and 'Church Slavic' two different > languanges (see https://en.wikipedia.org/wiki/Old_Church_Slavonic and > https://en.wikipedia.org/wiki/Church_Slavonic_language for reference). The original ticket proposes adding Old Church Slavic data from the Church Slavonic Wikipedia. How authoritative a source is this for language data? Also, if all that is needed is support for the Glagolitic script, we could define cu_Glag and add data in Glagolitic there. Cordially, Aleksandr From cldr-users at unicode.org Mon Mar 26 05:16:37 2018 From: cldr-users at unicode.org (=?UTF-8?B?TWFyayBEYXZpcyDimJXvuI8=?= via CLDR-Users) Date: Mon, 26 Mar 2018 12:16:37 +0200 Subject: Support of Old Church Slavic language sublocale in CLDR In-Reply-To: References: Message-ID: Some quick comments. > Is there a need for having this in CLDR as a separate locale? Does CLDR even provide support for ancient languages? While it is possible from someone to propose adding an ancient language (as per http://cldr.unicode.org/index/cldr-spec/minimaldata), I do think the utility would be extremely limited. As with all other languages, we would need a commitment to add the minimal data. > As I wrote in the original ticket, in my view, the way to handle this would be to separate "Old" Church Slavic (whatever is meant by this term) from Church Slavic the way Ancient Greek has been separated from modern Greek: grc is the ISO 639-2 code for Ancient Greek and ell or gre are the ISO 639-2 codes for Modern Greek. We follow BCP47 for codes, so we can't make up a code (as suggested by souschan at gmail.com) for "cu-old". 
If "Old" Church Slavic is sufficiently different from Church Slavic (eg at least as different as Danish and Swedish), then a new language code should be proposed to the ISO 639 group, as you suggested. If it is more like a dialect difference, then in theory a variant should be proposed. In practice, variants (other than Script and Region) are not very well supported in software, however. Mark On Sat, Mar 24, 2018 at 9:44 AM, Aleksandr Andreev via CLDR-Users < cldr-users at unicode.org> wrote: > On Fri, Mar 23, 2018 at 2:32 PM, Соус-кун via CLDR-Users > wrote: > > Dear all, > > > > Two years ago I've openned the ticket > > https://unicode.org/cldr/trac/ticket/9238 about adding separate > sublocale > > for Old Church Slavic language within the "cu" CLDR locale. > > The ticked was closed for formal reasons, but I believe this issue needs > > further discussion. > > > > As I wrote in the original ticket, in my view, the way to handle this > would be to separate "Old" Church Slavic (whatever is meant by this > term) from Church Slavic the way Ancient Greek has been separated from > modern Greek: grc is the ISO 639-2 code for Ancient Greek and ell or > gre are the ISO 639-2 codes for Modern Greek. > > I write "whatever is meant by this term" to underline my general > concern that "Old" Church Slavic does not seem to be a well-defined > term. We've defined "Church Slavic" in CLDR to be the current > liturgical language used by the Russian Orthodox Church and other > Orthodox and Byzantine Catholic Churches. (Variants can be specified > as cu_BG, cu_RU, cu_UA, etc.). This language has well documented norms > and a well-established user community. > > By "Old" Church Slavonic I guess the questioner means the literary > language used in manuscripts around the 9th-10th century? > > Is there a need for having this in CLDR as a separate locale? Does > CLDR even provide support for ancient languages?
I don't see data in > CLDR for Latin, Ancient Greek, Avestan or Sanskrit, for example. > > What additional functionality would be provided to the user community > by including "Old" Church Slavic as a separate locale? > > > - Linguists consider 'Old Church Slavic' and 'Church Slavic' two > different > > languanges (see https://en.wikipedia.org/wiki/Old_Church_Slavonic and > > https://en.wikipedia.org/wiki/Church_Slavonic_language for reference). > > The original ticket proposes adding Old Church Slavic data from the > Church Slavonic Wikipedia. How authoritative a source is this for > language data? > > Also, if all that is needed is support for the Glagolitic script, we > could define cu_Glag and add data in Glagolitic there. > > Cordially, > > Aleksandr > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Mon Mar 26 11:26:07 2018 From: cldr-users at unicode.org (=?UTF-8?B?0KTQtdGA0LDQv9C+0L3RgiDQodC+0YPRgdC+0LI=?= via CLDR-Users) Date: Mon, 26 Mar 2018 19:26:07 +0300 Subject: Support of Old Church Slavic language sublocale in CLDR In-Reply-To: References: Message-ID: > If it is more like a dialect difference, then in theory a variant should be proposed. In practice, variants (other than Script and Region) are not very well supported in software, however. I think current ISO 639-2 standard for "cu/chu" suggests term "Church Slavic" to be some kind of umbrella term with having all the possible variants under one "cu" tag. (See http://www-01.sil.org/iso639-3/documentation.asp?id=chu) So from that point of view proposing a variant could be sufficient solution. 
Proposing a new ISO code, on the other hand, would raise problems with backward compatibility, as people now actively use the "cu" tag in their code for "Old Church Slavic", "Russian Church Slavic" and whatever else, as the standard suggests. But what does this mean in a practical sense? Can we not make the CLDR "cu" locale support all proper variations for "cu" ISO 639-1 unless subtags for those variants are designated by a third party such as BCP47 or IANA? What should I do from now on? And it still leaves the issue with the current CLDR "cu" locale. Now it's set to be "Church Slavic used by the Russian Orthodox Church" exclusively. That is not quite what the current ISO 639 entry for "cu" suggests. What is Russian Church Slavic from a linguistic standpoint? It's the modern variant of the former Old Church Slavic that evolved and became established in Russia. By that logic the default "cu" locale could just be Old Church Slavic, as the originally universal and initial variant of the literary Church Slavic language, with any modern and ancient geographical variants being set with their respective geographical subtags. Namely, Russian Church Slavic would be cu_RU. It would be the simplest solution, actually. 2018-03-26 13:16 GMT+03:00 Mark Davis ☕️ : > Some quick comments. > > > Is there a need for having this in CLDR as a separate locale? Does > CLDR even provide support for ancient languages? > > While it is possible from someone to propose adding an ancient language > (as per http://cldr.unicode.org/index/cldr-spec/minimaldata), I do think > the utility would be extremely limited. As with all other languages, we > would need a commitment to add the minimal data. > > > As I wrote in the original ticket, in my view, the way to handle this > would be to separate "Old" Church Slavic (whatever is meant by this > term) from Church Slavic the way Ancient Greek has been separated from > modern Greek: grc is the ISO 639-2 code for Ancient Greek and ell or > gre are the ISO 639-2 codes for Modern Greek.
> > We follow BCP47 for codes, so we can't make up a code (as suggested by > souschan at gmail.com) for "cu-old". If "Old" Church Slavic is sufficiently > different from Church Slavic (eg at least as different as Danish and > Swedish)\, then a new language code should be proposed to the ISO 639 > group, as you suggested. If it is more like a dialect difference, then in > theory a variant should be proposed. In practice, variants (other than > Script and Region) are not very well supported in software, however. > > Mark > > On Sat, Mar 24, 2018 at 9:44 AM, Aleksandr Andreev via CLDR-Users < > cldr-users at unicode.org> wrote: > >> On Fri, Mar 23, 2018 at 2:32 PM, ????-??? via CLDR-Users >> wrote: >> > Dear all, >> > >> > Two years ago I've openned the ticket >> > https://unicode.org/cldr/trac/ticket/9238 about adding separate >> sublocale >> > for Old Church Slavic language within the "cu" CLDR locale. >> > The ticked was closed for formal reasons, but I believe this issue needs >> > further discussion. >> > >> >> As I wrote in the original ticket, in my view, the way to handle this >> would be to separate "Old" Church Slavic (whatever is meant by this >> term) from Church Slavic the way Ancient Greek has been separated from >> modern Greek: grc is the ISO 639-2 code for Ancient Greek and ell or >> gre are the ISO 639-2 codes for Modern Greek. >> >> I write "whatever is meant by this term" to underline my general >> concern that "Old" Church Slavic does not seem to be a well-defined >> term. We've defined "Church Slavic" in CLDR to be the current >> liturgical language used by the Russian Orthodox Church and other >> Orthodox and Byzantine Catholic Churches. (Variants can be specified >> as cu_BG, cu_RU, cu_UA, etc.). This language has well documented norms >> and a well-established user community. >> >> By "Old" Church Slavonic I guess the questioner means the literary >> language used in manuscripts around the 9th-10th century? 
>> >> Is there a need for having this in CLDR as a separate locale? Does >> CLDR even provide support for ancient languages? I don't see data in >> CLDR for Latin, Ancient Greek, Avestan or Sanskrit, for example. >> >> What additional functionality would be provided to the user community >> by including "Old" Church Slavic as a separate locale? >> >> > - Linguists consider 'Old Church Slavic' and 'Church Slavic' two >> different >> > languanges (see https://en.wikipedia.org/wiki/Old_Church_Slavonic and >> > https://en.wikipedia.org/wiki/Church_Slavonic_language for reference). >> >> The original ticket proposes adding Old Church Slavic data from the >> Church Slavonic Wikipedia. How authoritative a source is this for >> language data? >> >> Also, if all that is needed is support for the Glagolitic script, we >> could define cu_Glag and add data in Glagolitic there. >> >> Cordially, >> >> Aleksandr >> >> _______________________________________________ >> CLDR-Users mailing list >> CLDR-Users at unicode.org >> http://unicode.org/mailman/listinfo/cldr-users >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Wed Mar 28 13:28:24 2018 From: cldr-users at unicode.org (Tom Hughes via CLDR-Users) Date: Wed, 28 Mar 2018 19:28:24 +0100 Subject: Removal of distinguishingItems in CLDR 33 Message-ID: <6d713cd0-ac92-3a73-e50d-1667ea074a12@compton.nu> Apparently the distinguishingItems information in the supplemental metadata has been moved in CLDR 33 but I'm struggling to understand what I am supposed to use instead. 
I've found http://unicode.org/cldr/trac/ticket/10194 and also the description in TR35 at: https://www.unicode.org/reports/tr35/tr35-51/tr35.html#Valid_Attribute_Values Apparently it has been replaced by "annotations in the DTD and the DTDData classes in CLDR tooling" which apparently means I need to read the DTD and parse it and somehow extract special comments and associate them with the preceding attribute. Somehow that doesn't seem very practical, especially compared to the simple system that existed before and which could be handled with a standard XML parser! Tom -- Tom Hughes (tom at compton.nu) http://compton.nu/ From cldr-users at unicode.org Wed Mar 28 13:46:14 2018 From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users) Date: Wed, 28 Mar 2018 11:46:14 -0700 Subject: Removal of distinguishingItems in CLDR 33 In-Reply-To: <6d713cd0-ac92-3a73-e50d-1667ea074a12@compton.nu> References: <6d713cd0-ac92-3a73-e50d-1667ea074a12@compton.nu> Message-ID: The changes made in #10194 should have linked to https://www.unicode.org/reports/tr35/tr35.html#DTD_Annotations instead of just saying "annotations in the DTD and the DTDData classes in CLDR tooling" but yes, I can see how parsing comments is painful. It might be good to ship derived data files based on the output of DTDData. Can you file a bug for cleaning up the docs and providing the metadata? On Wed, Mar 28, 2018 at 11:28 AM, Tom Hughes via CLDR-Users < cldr-users at unicode.org> wrote: > Apparently the distinguishingItems information in the supplemental > metadata has been moved in CLDR 33 but I'm struggling to understand > what I am supposed to use instead. 
> > I've found http://unicode.org/cldr/trac/ticket/10194 and also the > description in TR35 at: > > https://www.unicode.org/reports/tr35/tr35-51/tr35.html# > Valid_Attribute_Values > > Apparently it has been replaced by "annotations in the DTD and the > DTDData classes in CLDR tooling" which apparently means I need to > read the DTD and parse it and somehow extract special comments and > associate them with the preceding attribute. > > Somehow that doesn't seem very practical, especially compared to > the simple system that existed before and which could be handled > with a standard XML parser! > > Tom > > -- > Tom Hughes (tom at compton.nu) > http://compton.nu/ > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Wed Mar 28 15:50:56 2018 From: cldr-users at unicode.org (Tom Hughes via CLDR-Users) Date: Wed, 28 Mar 2018 21:50:56 +0100 Subject: Removal of distinguishingItems in CLDR 33 In-Reply-To: References: <6d713cd0-ac92-3a73-e50d-1667ea074a12@compton.nu> Message-ID: <0751f694-f37d-5350-edf2-d185606d6024@compton.nu> After more investigation I realised it's not actually even obvious how to map the annotations to this - I think it's something like anything without @VALUE, @METADATA or @DEPRECATED? In any case I've hard coded the value from CLDR 32 in my scripts for now... I've filed https://unicode.org/cldr/trac/ticket/11026 though I did have a bit of trouble with the captcha as it kept telling me that the V1 recaptcha had shut down on 2018-03-31 which obviously isn't actually true yet but will be very soon. Tom On 28/03/18 19:46, Steven R. 
Loomis wrote: > The changes made in #10194 should have linked to > https://www.unicode.org/reports/tr35/tr35.html#DTD_Annotations instead > of just saying "annotations in the DTD and the > DTDData classes in CLDR tooling" but yes, I can see how parsing comments > is painful. It might be good to ship derived data files based on the > output of DTDData. > > Can you file a bug for cleaning up the docs and providing the metadata? > > On Wed, Mar 28, 2018 at 11:28 AM, Tom Hughes via CLDR-Users > > wrote: > > Apparently the distinguishingItems information in the supplemental > metadata has been moved in CLDR 33 but I'm struggling to understand > what I am supposed to use instead. > > I've found http://unicode.org/cldr/trac/ticket/10194 > and also the > description in TR35 at: > > https://www.unicode.org/reports/tr35/tr35-51/tr35.html#Valid_Attribute_Values > > > Apparently it has been replaced by "annotations in the DTD and the > DTDData classes in CLDR tooling" which apparently means I need to > read the DTD and parse it and somehow extract special comments and > associate them with the preceding attribute. > > Somehow that doesn't seem very practical, especially compared to > the simple system that existed before and which could be handled > with a standard XML parser! > > Tom > > -- > Tom Hughes (tom at compton.nu ) > http://compton.nu/ > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > > > -- Tom Hughes (tom at compton.nu) http://compton.nu/ From cldr-users at unicode.org Fri Mar 30 21:09:23 2018 From: cldr-users at unicode.org (Richard Wordingham via CLDR-Users) Date: Sat, 31 Mar 2018 03:09:23 +0100 Subject: Support of Old Church Slavic language sublocale in CLDR In-Reply-To: References: Message-ID: <20180331030923.619eefe2@JRWUBU2> On Mon, 26 Mar 2018 12:16:37 +0200 Mark Davis ☕️
via CLDR-Users wrote: > While it is possible from someone to propose adding an ancient > language (as per > http://cldr.unicode.org/index/cldr-spec/minimaldata), I do think the > utility would be extremely limited. As with all other languages, we > would need a commitment to add the minimal data. One serious use in some cases is in line and word breaking. Or am I overlooking a special tag for whether the language is effectively scriptio continua or not? One feature I have noticed is that in modern usage Pali has a strong tendency to have words separated by spaces or other punctuation in the Thai and Tai Tham scripts. CLDR seems strongly geared to languages used for man-machine interfaces, but they are not the only ones that would benefit from CLDR support. CLDR contains data for text layout as well as date formats and data for pick-lists. Richard. From cldr-users at unicode.org Fri Mar 30 21:22:33 2018 From: cldr-users at unicode.org (=?UTF-8?Q?Martin_J._D=c3=bcrst?= via CLDR-Users) Date: Sat, 31 Mar 2018 11:22:33 +0900 Subject: Support of Old Church Slavic language sublocale in CLDR In-Reply-To: <20180331030923.619eefe2@JRWUBU2> References: <20180331030923.619eefe2@JRWUBU2> Message-ID: <4cacedf2-6f1c-d2c6-5fd4-0b9633873b95@it.aoyama.ac.jp> On 2018/03/31 11:09, Richard Wordingham via CLDR-Users wrote: > CLDR seems strongly geared to languages used for man-machine > interfaces, but they are not the only ones that would benefit from CLDR > support. CLDR contains data for text layout as well as date formats > and data for pick-lists. I may be completely wrong, but indeed, CLDR has a strong focus on user interface languages. Maybe we need a new category of languages that is below what's currently "minimal support" (or whatever it's called), which would be used for languages where there's content but no need for user interfaces. This category would mean that it would only cover data for text layout,... Regards, Martin.