From cldr-users at unicode.org Tue Sep 4 21:02:56 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 5 Sep 2018 04:02:56 +0200 (CEST)
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
Message-ID: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>

I’m taking this from the Unicode Public mailing list, as the topics belong here. Though I already responded off-list and would prefer stepping out, I’m afraid that at least the CLDR part could be really useful in fighting certain baseline problems I encountered while being given the opportunity to participate in surveying fr-FR locale data for the upcoming v34. Hence I feel committed to respond “on the record” and reopen the door for possible follow-up, if ever I could have seemed to close it.

Indeed, while there were many errors and flaws in the data, most co-vetters ended up lacking the time to completely review all the items, despite doing a really great job while devoting many hours to these tasks. Not having tried to dig deeper to learn what the underlying issues are, I simply speculated on my own about what might have triggered the problems in reviewing data and ensuring quality.

The goal is to make CLDR data more reliable, and to suggest what vendors might wish to do for that purpose.

At the top of what follows I’ve cut a snippet unrelated to these topics, and further down, a snippet for privacy. The slightly blunter off-list wording, by contrast, has not been redacted.
I’ll advise Unicode Public that this thread is moved here.

On 04/09/18 20:10 I wrote:
To: "Janusz S. Bień", "James Kass"
Cc: "Philippe Verdy"
Subject: [OFF LIST] Re: CLDR
>
> On 04/09/18 11:11 James Kass via Unicode wrote:
> > (This is the response from Janusz S. Bień which was sent to the public list.)
>
> Thank you James for forwarding. I’m responding off-list as I’m afraid that our discussions
> might not be welcome on the List. […]
[Deleted for being off-topic.]
>
> > On Mon, Sep 03 2018 at 1:03 -0800, James Kass wrote:
> >
> > > Janusz S. Bień wrote,
> > > […]
> > Thanks! Most data about Poland at
> >
> > https://www.wikidata.org/wiki/Q36
> >
> > seem to make sense, but I don't think anybody is using abbreviation like
> > "plpm" (for Pomorze/Pomerania).
>
> We can see that some of those codes, for whatever items (regions, languages, scripts),
> are counter-intuitive, and I don’t know either who is using them in running text.
>
> > […]
> > I hope not all CLDR data are driven by Wikidata...
>
> I was surprised to learn that even more data is imported without review, but Wikidata is
> clearly a more reliable source than ISO 639, which is used without assessing its accuracy.
>
> > On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:
> […]
> > > Then I’m sorry to be off-topic.
> >
> > Let's say off the original topic. My primary concern is to preserve
> > somehow such comments as e.g. the one on the bottom of page 14 of
> >
> > https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf
>
> Normally this Medieval Latin semicolon abbreviation should be encoded in Unicode, which
> already contains many duplicates of punctuation marks, and we know that a punctuation mark can
> *never* represent a letter without running into issues.
>
>
> > […]
> > > I’m volunteering to personally welcome you to contribute to CLDR.
> >
> > Thanks. The interesting question is who is/was already contributing from
> > Poland or about Polish language. I vaguely remember a post with this
> > information, but at that time I was not interested enough to take a
> > note.
>
> I must confess that I wasn’t interested either, or rather, I wasn’t aware that I was to contribute,
> and perhaps was unable to do so. Normally the vendors, especially Apple and Google, should
> be well-funded enough to be able to appoint as many specialists as needed.
> But it turns out
> that when paying contractors, they are so greedy that the linguists are granted insufficient
> worktime, eg a certain number of hours, without their managers assessing beforehand what
> the status of the data is and how much work is needed to fix it, and consequently not being able
> to renegotiate the service provider contract. That operating mode is completely unresponsive
> on the part of Apple and Google, and Microsoft alike (although they have less money to devote).
>
> > […]
> > > Polish has
> > > consistently with By-Type, these quotation marks:
> > > ' " ? ? ? ?
> > > Hence the set is incomplete.
> >
> > You are right, thanks. But what is the practical importance of it?
>
> The importance of CLDR data being accurate is that having it otherwise would
> reflect badly on the image of a country as being unable or careless.
>
> On a general level, another impact of having accurate locale data in CLDR is that
> the repository gets a better reputation. As long as the data is unreliable, nobody
> might actually use it.
>
> Yet another implication of the presence of a character in CLDR is that it is a good
> argument for having it on the keyboard layout. Eg the Breton letter apostrophe is
> not yet on the Breton keyboard layout, despite the issue having been discussed
> at bug tracking / feature request level for XKB. So I informed […]
[Deleted for privacy.]
>
> > I noticed that sometimes in Emacs 'forward-word' behaves strangely on a
> > text with unusual characters, but had no motivation to investigate how
> > this is related to the current locale.
>
> I’m sorry to be unable to check this, as I’m not yet using Emacs, nor Vim.
>
> > […]
> >
> > The standard keyboard has a limited number of keys, so you have to make
> > compromises. It is generally accepted that Polish keyboard layouts
> > (there are primarily two of them) do not contain apostrophe or single
> > quotation marks.
> > There is a proposal by Marcin Woliński
> >
> > http://marcinwolinski.pl/keyboard/
> >
> > which is available in most Linux distributions but it does not seem
> > popular.
>
> It has even been ported to Windows. But I cannot find it on Ubuntu 16.04.
> It has various drawbacks, the worst of which is that the most common
> angle quotation marks «» are on the Shift+AltGr level, while the single ones ‹›
> are on AltGr, and likewise for the curly quotes, of which Polish currently
> uses the double ones, whereas the single ones appear to be used only
> for nested quotations. This swapping of frequent and rare punctuation
> has been done only for consistency with the ASCII apostrophe being in the
> Base shift state, and the ASCII double quote in the Shift shift state, as on
> US-QWERTY.
>
> That’s how mnemonics and a certain idea of logic are destroying usability.
>
> Thanks for the link anyway.
>
> Best regards,
>
> Marcel

From cldr-users at unicode.org Wed Sep 5 07:56:05 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 5 Sep 2018 14:56:05 +0200
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
In-Reply-To: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>
References: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>
Message-ID:

The email isn't on a single topic, so I just skimmed. Some quick remarks:

> I hope not all CLDR data are driven by Wikidata...

The Wikidata names are only used for subdivisions, and then only for ones that are "new" (where there were no preexisting names). The names are currently not visible via the Survey tool, and thus need modification via tickets. The reason not to show them in the ST is that it would load the tool down further and burden the vetters (tripling the number of fields).

> using abbreviation like "plpm"

That isn't an abbreviation, it is a code for a subdivision.
Corresponds to the ISO 3166-2 code PL-PM.

> they are so greedy

Ad hominem or (ad societatem) remarks are rarely productive, and rarely an accurate reflection of reality; one reason I seldom look at unicode at unicode.org.

Mark

From cldr-users at unicode.org Wed Sep 5 11:22:55 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 5 Sep 2018 18:22:55 +0200 (CEST)
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
Message-ID: <954616513.15222.1536164575599.JavaMail.www@wwinf1m11>

On 05/09/18 14:59 Mark Davis ☕️ via CLDR-Users wrote:
> […]
> they are so greedy
>
> Ad hominem or (ad societatem) remarks are rarely productive, and rarely an accurate reflection of reality;

That is the one phrase I’d have redacted first had I removed the off-list-style shorthand topoi. I was afraid that it could hurt when posted here, while I only wished to raise awareness of the way management decisions may end up reflecting badly on a corporate image.

The idea is that CLDR data shouldn’t have to wait for a volunteer to come along and correct it. Rather, the process should be set up in a way that it succeeds in, say, 2 years from scratch. Now we are to determine whether the (human and financial) effort implied is considered not worthwhile. That could be because end-users getting inaccurate data displayed are not deemed to pay attention; or because public language offices are the expected premium contributors, and vendors are only helping out on failure. [Here I’m censoring myself so as not to get ad corpus again, nor ad hominem as I necessarily did off-list when giving details about a contact with a language office.]

Perhaps the most useful thing would be to simply send e-mails to vendors asking them to devote more means to the CLDR survey, making them aware that the data isn’t meeting obvious quality standards. Is it naive to believe that an e-mail to this or the other list may suffice for that purpose?

> one reason I seldom look at unicode at unicode.org.

I publicly apologize for any ad hominem comment I’ve ever posted on a list. I sincerely regret not staying technical, having trouble depersonalizing human affairs. I’m always at risk of going off the road while trying to understand and figure out how and by whom problems could be fixed. Perhaps I shouldn’t focus on that. Probably I’d better just do the job as it comes. Eg when a correctly spelled name was suddenly misspelled despite a vetter hinting that the name was correct, there would be no point in finding out how that could happen, but only in correcting the error (two years later).
But the evidence is that such things can happen only because vetters are not given enough time to assess a spelling as accurate. Eg at that time it was already sufficient to look up the proposed spelling in the French Wikipédia to find, right at the beginning, a sentence explaining why that spelling does not apply. A number of errors still remain uncorrected because vetters did not have enough worktime while the survey was fully open. I myself ended up cutting down my CLDR survey time when corrections didn’t get an echo and it was unclear whether they were useful, and in this way left typos I’d made in the ST. When the vetting phase was on, everybody did a great job, but it was too late to correct the typos, given that the ST is partly read-only then, which from my point of view is not good. But no matter, I’d made and left the typos.

Best regards,

Marcel

From cldr-users at unicode.org Wed Sep 5 14:03:28 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 5 Sep 2018 21:03:28 +0200
Subject: CLDR survey / Polish keyboard
In-Reply-To: <86zhwwhvpm.fsf@mimuw.edu.pl>
References: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11> <86zhwwhvpm.fsf@mimuw.edu.pl>
Message-ID:

This one would be more useful if it were more complete, and the data were managed better. But there are far less useful ISO standards!

Mark

On Wed, Sep 5, 2018 at 4:20 PM Janusz S. Bień wrote:
> On Wed, Sep 05 2018 at 14:56 +0200, Mark Davis ☕️ wrote:
> > The email isn't on a single topic, so I just skimmed. Some quick remarks:
> >
> >> I hope not all CLDR data are driven by Wikidata...
> > The Wikidata names are only used for subdivisions, and then only for
> > ones that are "new" (where there were no preexisting names). The names
> > are currently not visible via the Survey tool, and thus need
> > modification via tickets. The reason not to show them in the ST is
> > that it would load the tool down further and burden the vetters
> > (tripling the number of fields).
> >
> >> using abbreviation like "plpm"
> >
> > That isn't an abbreviation, it is a code for a
> > subdivision. Corresponds to the ISO 3166-2 code PL-PM
>
> Thanks for the explanation. I found the probably full list of codes for
> Poland here:
>
> https://pl.wikipedia.org/wiki/ISO_3166-2:PL
>
> Still in doubt whether the codes are of any use, but we have to live
> with it. Some time ago, as a member of a technical committee of the
> Polish Committee for Standardization I tried to block an ISO standard of
> no practical use, and it appeared completely impossible...
>
> Best regards
>
> Janusz
>
> --
> Janusz S. Bien
> emeryt (emeritus)
> https://sites.google.com/view/jsbien

From cldr-users at unicode.org Fri Sep 7 19:49:26 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 8 Sep 2018 02:49:26 +0200 (CEST)
Subject: Shortcuts question (Re)
Message-ID: <1879781969.16428.1536367766562.JavaMail.www@wwinf1m21>

Hello,

There is a short thread about localizing keyboard shortcuts on Unicode Public:

https://unicode.org/mail-arch/unicode-ml/y2018-m09/0018.html
https://unicode.org/mail-arch/unicode-ml/y2018-m09/0019.html
https://unicode.org/mail-arch/unicode-ml/y2018-m09/0021.html

On Fri, 7 Sep 2018 05:52:46 +0530 Shriramana Sharma via Unicode wrote:
[…]
> 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for "tout" io Ctrl+A for "all"?

On Fri, 7 Sep 2018 05:27:08 +0200 I via Unicode wrote:
> No, Ctrl+A remains Ctrl+A on a French keyboard.

On Fri, 7 Sep 2018 15:03:46 +0200 Christoph Päper via Unicode wrote:
[…]
> Some are, many are not. For instance, some text editors use a modifier key with F and K
> instead of B and I for bold ("fett") and italic ("kursiv").

Indeed, in the French edition of Excel Starter, bold is Ctrl+G (for “gras”), while Word Starter (part of the same Office Starter suite) has it as Ctrl+B.
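To make the point above concrete, here is a minimal Python sketch of how a single application might localize only some of its shortcuts, the way French Excel Starter does for bold, while leaving others such as Ctrl+A untouched. The tables, action names and fallback order are hypothetical assumptions for illustration, not the resource format of any real product; another application (such as Word Starter in the same suite) would carry its own tables.

```python
# Hypothetical shortcut tables for ONE application; the action names,
# table layout and fallback order are illustrative assumptions only.
DEFAULT_SHORTCUTS = {
    "bold": "Ctrl+B",
    "italic": "Ctrl+I",
    "select_all": "Ctrl+A",  # per the thread, stays Ctrl+A on French keyboards
}

LOCALIZED_SHORTCUTS = {
    # French Excel Starter example from the thread: bold is Ctrl+G ("gras").
    "fr": {"bold": "Ctrl+G"},
    # German editors mentioned in the thread use F ("fett") and K ("kursiv");
    # the Ctrl modifier is an assumption, as the thread does not name it.
    "de": {"bold": "Ctrl+F", "italic": "Ctrl+K"},
}


def shortcut_for(action: str, locale: str) -> str:
    """Return the shortcut for `action`, trying the full locale tag first,
    then its language subtag, then the unlocalized default table."""
    language = locale.split("-")[0]
    for tag in (locale, language):
        overrides = LOCALIZED_SHORTCUTS.get(tag, {})
        if action in overrides:
            return overrides[action]
    return DEFAULT_SHORTCUTS[action]


print(shortcut_for("bold", "fr-FR"))        # Ctrl+G
print(shortcut_for("select_all", "fr-FR"))  # Ctrl+A
print(shortcut_for("bold", "en-US"))        # Ctrl+B
```

Keeping overrides sparse and falling back to the default table mirrors the “some are, many are not” observation: only the shortcuts a localizer deliberately changed differ per locale.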
For follow-up, here is the OP’s full request:

On Fri, 7 Sep 2018 05:52:46 +0530 Shriramana Sharma via Unicode wrote:
> Hello. This may be slightly OT for this list but I'm asking it here as it concerns computer usage with multiple scripts and i18n:
>
> 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for "tout" io Ctrl+A for "all"?
>
> 2) How about when the shortcuts are the Alt+ combinations referring to underlined letters in actual user visible strings?
>
> 3) In a QWERTZ layout for Undo should one still press the (dislocated wrt the other XCV shortcuts) Z key or the Y key which is in the physical position of the QWERTY Z key (and close to the other XCV shortcuts)?
>
> 4) How are shortcuts handled in the case of non Latin keyboards like Cyrillic or Japanese?
>
> 4a) I mean how are they displayed on screen?
>
> 4b) Like #1 above, are they changed per language?
>
> 4c) Like #2 above, how about for user visible shortcuts?
>
> (In India since English is an associate official language, most computer users are at least conversant with basic English so we use the English/QWERTY shortcuts even if the keyboard physically shows an Indic script.)
>
> Thanks!

From cldr-users at unicode.org Thu Sep 13 13:57:39 2018
From: cldr-users at unicode.org (Peter Edberg via CLDR-Users)
Date: Thu, 13 Sep 2018 11:57:39 -0700
Subject: Unicode CLDR 34 alpha available for testing
Message-ID: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>

The alpha version of Unicode CLDR 34 is available for testing. The alpha period lasts until the beta release on September 26, which will include updates to the LDML spec. The final release is expected on October 10.

CLDR 34 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.

CLDR 34 included a full Survey Tool data collection phase.
Other enhancements include several changes to prepare for the new Japanese calendar era starting 2018-05-01; updated emoji names, annotations, collation and grouping; and other specific fixes. The draft release page at http://cldr.unicode.org/index/downloads/cldr-34 lists the major features, and has pointers to the newest data and charts. It will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on. Particularly useful for review are:

- Delta Charts - the data that changed during the release
- By-Type Charts - a side-by-side comparison of data from different locales
- Annotation Charts - new emoji names and keywords

Please report any problems that you find using a CLDR ticket. We'd also appreciate it if programmatic users of CLDR data download the xml files and do a trial integration to see if any problems arise.

From cldr-users at unicode.org Thu Sep 13 14:36:16 2018
From: cldr-users at unicode.org (Peter Edberg via CLDR-Users)
Date: Thu, 13 Sep 2018 12:36:16 -0700
Subject: Unicode CLDR 34 alpha available for testing
In-Reply-To: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
References: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
Message-ID: <9CC2B93E-8A6A-4AAB-8420-C94679CB5F59@unicode.org>

> On Sep 13, 2018, at 11:57 AM, Peter Edberg via Unicore wrote:
>
> The alpha version of Unicode CLDR 34 is available for testing. The alpha period lasts until the beta release on September 26, which will include updates to the LDML spec. The final release is expected on October 10.
>
> CLDR 34 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.
>
> CLDR 34 included a full Survey Tool data collection phase.
> Other enhancements include several changes to prepare for the new Japanese calendar era starting 2018-05-01;

(that of course should have read 2019-05-01, sorry)

> updated emoji names, annotations, collation and grouping; and other specific fixes. The draft release page at http://cldr.unicode.org/index/downloads/cldr-34 lists the major features, and has pointers to the newest data and charts. It will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on. Particularly useful for review are:
>
> - Delta Charts - the data that changed during the release
> - By-Type Charts - a side-by-side comparison of data from different locales
> - Annotation Charts - new emoji names and keywords
>
> Please report any problems that you find using a CLDR ticket. We'd also appreciate it if programmatic users of CLDR data download the xml files and do a trial integration to see if any problems arise.

From cldr-users at unicode.org Thu Sep 13 19:22:53 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Thu, 13 Sep 2018 17:22:53 -0700
Subject: Unicode CLDR 34 alpha available for testing
In-Reply-To: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
References: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
Message-ID:

I touched up the release page, adding some of your wording. See how it looks: http://cldr.unicode.org/index/downloads/cldr-34

Mark

From cldr-users at unicode.org Sun Sep 16 16:42:53 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 16 Sep 2018 23:42:53 +0200 (CEST)
Subject: Shortcuts question
Message-ID: <1968466874.9623.1537134173094.JavaMail.www@wwinf1m12>

On 16/09/18 15:28, Philippe Verdy wrote on Unicode Public Mail List:
[…]
> On PC keyboards, ShiftLock does not apply to the numeric pad which has its separate NumLock, now largely redundant
> and that most users would like to disable completely each time there's a numeric pad separated from the directional pad,
> on these extended keyboards, NumLock is just a nuisance, notably on OS logon screen when Windows turns it off by default
> unless the BIOS locks it at boot time, and lot of BIOS don't do that or don't have the option to set it permanently).

Legacy NumLock can be permanently disabled on a per-layout basis by hard-coding additional defines in the header file, given that arrow keys have long been present throughout, while the numpad is either separate, integrated, or missing, and may be external. But a number of laptops with an integrated numpad (on the alphanumeric keys on and beneath 7 8 9 0) use NumLock as a combined legacy NumLock and Fn-Lock-on-Numpad. Here, disabling the legacy part is particularly useful, as this does not affect the Fn-Lock-on-Numpad functionality. The result is alternative access to the integrated numpad digits, either by holding down Fn or by activating the NumLock toggle. Subscribers interested in details may wish to follow up off-list with us.

Further, on 16/09/18 14:18, I wrote on Unicode Public:
[…]
> But again that is easier on Windows, where VKs are remapped separately, than on Linux that
> appears to use graphics throughout to process application shortcuts, and only modifiers can be "preserved" for
> further processing, no underlying letter map that AFAIU appears not to exist on Linux.

I was wrong. Linux allows mapping Control modifier combinations to letters, eg on levels 7 and 8, while directing XKB to preserve the modifiers. This enables Linux to have keyboard shortcuts that move around independently of the default resolution (which uses the letter mapping on Latin layouts, while other scripts appear to benefit from an internal QWERTY mapping).
Again, people interested in how to code that are welcome to follow up off-list.

Regards,

Marcel

From cldr-users at unicode.org Mon Sep 17 09:58:43 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 17 Sep 2018 16:58:43 +0200 (CEST)
Subject: Group separator migration from U+00A0 to U+202F
Message-ID: <1724680793.12587.1537196323824.JavaMail.www@wwinf1m12>

To be cost-effective, all locales using space as the number group separator should migrate at once from the wrong U+00A0 NO-BREAK SPACE to the correct U+202F NARROW NO-BREAK SPACE. I didn’t aim at making French stand out, but at correcting an error in CLDR. Having even the Canadian French sublocale stick with the wrong value makes no sense, and is mainly due to opaque inheritance relationships and to severe constraints on vetters applying for fr-FR, who are then reduced to looking on helplessly from the sidelines when sublocales are not getting fixed.

http://cldr.unicode.org/index/downloads/cldr-34#TOC-Migration
https://unicode.org/cldr/trac/ticket/11423

Regards,

Marcel

From cldr-users at unicode.org Tue Sep 18 00:17:50 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Tue, 18 Sep 2018 07:17:50 +0200 (CEST)
Subject: Group separator migration from U+00A0 to U+202F
Message-ID: <1696775949.245.1537247870482.JavaMail.www@wwinf2219>

> I didn’t aim at making French stand out, but at correcting an error in CLDR.

So I have to confess that I did focus on French and only applied for fr-FR, but there was a lot of work waiting for very few vetters; see http://cldr.unicode.org/index/downloads/cldr-34#TOC-Growth
Nevertheless I also took care of English (see various tickets), and also posted on CLDR-users in a belated P.S.
that fr-CA hadn’t yet caught up with the group separator correction: https://unicode.org/pipermail/cldr-users/2018-August/000825.html

Also, I’m sorry for failing to provide appropriate feedback after the beta release, and to post upstream messages urging that all locales using a space as group separator be kept in sync.

I think the point about not splitting up all the data into locales is a very good one. There should be a common pool, so that all locales using Arabic script automatically have their group separator set to ARABIC THOUSANDS SEPARATOR (provided it actually fits all of them), and locales using a space only need to specify "space" to automatically get the correct character, i.e. NARROW NO-BREAK SPACE, as soon as Unicode is ready to give it currency in that role.

Also there is a display issue in the charts, where whitespace characters show up as what they are: blanks, regardless of whether they are wide or narrow, justifying or fixed-width. Non-breaking behavior may be inferred from context, but other correct behavior cannot, given that numbers were supposed to be grouped using a justifying space, so that it only works halfway, where justification is turned off (e.g. in Wikipedia).

I’m posting here with people who don’t monitor Trac in mind: https://unicode.org/cldr/trac/ticket/11423#comment:2

Regards,

Marcel

From cldr-users at unicode.org Fri Sep 21 19:34:27 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 21 Sep 2018 17:34:27 -0700
Subject: Locale bringup and barriers for entry
Message-ID:

Hello, and welcome to the new cldr-users members.

For discussion:

At the IUC conference last week, a few of us discussed around lunch some issues around getting new locales into CLDR, and barriers to entry.

Barriers:
- we discussed that it could be confusing or difficult to collect all of the data needed for a minimal locale: http://cldr.unicode.org/index/cldr-spec/minimaldata
  - especially pluralization data
- what about fonts?
  keyboards?
- what are the best ways to coordinate efforts between the language users and different technical experts?

Ideas:
- a web app to take in new locale data?
- a web app to debug/explore plurals?
- allowing some locales to 'get started' without plural rules?

Links for discussion:
- Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
- My "full stack" blog post: https://srl295.github.io/2017/06/06/full-stack-enablement/

From cldr-users at unicode.org Sat Sep 22 03:15:09 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 10:15:09 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>

Thank you Steven for sharing these useful resources, and for the effort you and others made in popularizing some insights about what CLDR is, what locale data is, and how to bring the two together.

To start the discussion, here are a few thoughts that come to mind from my experience of the past survey round:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
>
> Hello, and welcome to the new cldr-users members.

Thanks.

> For discussion:
>
> At the IUC conference last week, a few of us discussed around lunch some issues around getting new locales into CLDR, and barriers to entry.
> Barriers:
> - we discussed that it could be confusing or difficult to collect all of the data needed for a minimal locale:

Some main sources of confusion seem to me to be:

1. The English template may not be internally consistent, e.g. emoji category names may be singular or plural (plural throughout seems correct);
2. The English template may not be up to date, e.g. still including ASCII quotes in exemplar punctuation though these have been ruled out;
3.
The target data sets may not be comprehensively specified, e.g. the definition of exemplar punctuation includes an exclusion clause for math symbols only, while the clause about not including symbols with a programmatic usage, such as # @ _, is still missing;
4. The English template may not be kept in sync with the specifications, e.g. on emoji keywords, which are not to include the emoji name or name starter;
5. Numerous bugs affecting the markup of inherited values (but these have been reported and are about to be fixed in the SurveyTool code).

> http://cldr.unicode.org/index/cldr-spec/minimaldata - especially pluralization data

The scope of pluralization seems unclear and biased by the English paradigm of genderlessness, while in other languages grammatical gender is a determining parameter for pluralization, so that even extensions to the DTD seem to be required for providing out-of-the-box pluralization rules.

> - what about fonts?

Invisibles and confusables should be visualized and distinguished throughout, i.e. both in SurveyTool and in the Charts. While SurveyTool already shows U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, confusables like spaces and apostrophes are still hard or impossible to distinguish. That is in the nature of the characters involved: e.g. U+00A0 NO-BREAK SPACE is defined as being like U+0020 SPACE except for line-break behavior, and the preferred glyph of U+02BC MODIFIER LETTER APOSTROPHE is the same as that of U+2019 RIGHT SINGLE QUOTATION MARK.

> keyboards?

I actually see fonts and keyboards as the two missing components of the stack you designed: although input methods are part of locale data, they are also a precondition for efficiently submitting locale data. The full stack would thus expand to:

1. Encoding
2. Fonts
3. Input methods
4. Locale data

> - what are the best ways to coordinate efforts between the language users and different technical experts?
> Ideas:
> - a web app to take in new locale data?
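To make the point about invisibles and confusables concrete, here is a minimal Python sketch, assuming a review page that replaces every space separator (general category Zs), every format character (category Cf, such as U+200E LEFT-TO-RIGHT MARK) and a small hand-picked set of look-alike apostrophes with its code point and name. The helper `reveal` and the apostrophe set are illustrative assumptions, not existing SurveyTool or Charts features.

```python
import unicodedata

# Look-alike apostrophes (illustrative selection, not an official confusables list).
CONFUSABLE_APOSTROPHES = {"\u0027", "\u2019", "\u02BC"}


def reveal(text: str) -> str:
    """Return `text` with invisible or look-alike characters spelled out.

    Space separators (Zs, e.g. U+00A0 vs U+202F), format characters
    (Cf, e.g. U+200E) and the apostrophes above become "[U+XXXX NAME]";
    all other characters are kept unchanged.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Zs", "Cf") or ch in CONFUSABLE_APOSTROPHES:
            out.append(f"[U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}]")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Two group separators that look identical in most fonts:
    print(reveal("1\u00A0234"))  # 1[U+00A0 NO-BREAK SPACE]234
    print(reveal("1\u202F234"))  # 1[U+202F NARROW NO-BREAK SPACE]234
    # Right single quotation mark vs. modifier letter apostrophe:
    print(reveal("l\u2019eau ha\u02BCa"))
```

Ordinary U+0020 spaces are spelled out too. That is noisy, but it makes the contrast with U+00A0 and U+202F explicit.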
I think CLDR already has its web app, i.e. SurveyTool. A full-time engineer is currently redeveloping and debugging several or all parts of it.

> - a web app to debug/explore plurals?

Before including this functionality in SurveyTool, where it belongs, I think the spec should be redesigned and the documentation updated accordingly. That could eventually result in extended language support by CLDR/ICU, which would do no harm and would only raise the product's value.

> - allowing some locales to 'get started' without plural rules?

I think any locale may get started in CLDR by providing date and time formats, while correctly displaying a reminder of a shopping cart may be left for a later stage.

> Links for discussion:
> - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
> - My "full stack" blog post: https://srl295.github.io/2017/06/06/full-stack-enablement/

Thanks. I have read them, and discussed them above following the hints you provided.

Regards,

Marcel

_______________________________________________
CLDR-Users mailing list
CLDR-Users at unicode.org
http://unicode.org/mailman/listinfo/cldr-users

From cldr-users at unicode.org Sat Sep 22 06:07:29 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 13:07:29 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>

I didn’t aim at doing what I ended up doing, i.e. summing up a bunch of tickets already being processed, in a thread launched to welcome newcomers and to show ways of expanding CLDR support to all of the world’s locales. Indeed, caught up in the details, I forgot my first thoughts:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> - what are the best ways to coordinate efforts between the language users and different technical experts?

I can only encourage everyone to first make up their own minds by taking a close look at the latest Charts, especially
(when learning how to include *new* locales) at the By-Type overviews of the set of locales that have already made it into CLDR:

http://cldr.unicode.org/index/downloads
http://www.unicode.org/cldr/charts/latest/
https://www.unicode.org/cldr/charts/latest/by_type/index.html
https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.html
https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.main.html
https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.punctuation.html

and so on.

Another important step is to read through the Information Hub for Linguists, the main documentation resource:

http://cldr.unicode.org/translation

from where we can access the detailed pages that are also linked from the information pane in SurveyTool, e.g. about plurals:

http://cldr.unicode.org/translation/plurals

I happened to start uninformed discussions before noticing that the documentation already provided sufficient instructions, or before sorting out what was already covered and what clarifications I needed.

A good way to prepare, if not already done, is also to learn XML and more specifically LDML, the Unicode Locale Data Markup Language, in order to be able to read and submit data in that format:

http://cldr.unicode.org/index/cldr-spec

linking:

http://www.unicode.org/reports/tr35/

e.g. to understand how inheritance works:

http://www.unicode.org/reports/tr35/#Locale_Inheritance

That is key knowledge for understanding what happens when we work in SurveyTool, and for detecting possible inheritance display bugs (though these are unlikely to happen anymore).
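The basic fallback chain behind TR35 locale inheritance can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it only does truncation (fr_CA → fr → root) plus one explicit parent override, and ignores aliases, lateral inheritance and the other special cases TR35 defines. The locale data is invented to mirror the fr-CA group separator situation discussed on this list, and shows why a sublocale holding its own explicit value does not pick up a fix made in its parent.

```python
from typing import Dict, Optional, Tuple

# Explicit parent overrides. CLDR keeps these in its supplemental data
# (parentLocales); only one example is reproduced here.
PARENT_OVERRIDES: Dict[str, str] = {"en_IN": "en_001"}


def parent(locale_id: str) -> Optional[str]:
    """Return the parent locale: explicit override first, then truncation, then root."""
    if locale_id == "root":
        return None
    if locale_id in PARENT_OVERRIDES:
        return PARENT_OVERRIDES[locale_id]
    if "_" in locale_id:
        return locale_id.rsplit("_", 1)[0]
    return "root"


def lookup(locale_id: str, key: str, data: Dict[str, Dict[str, str]]) -> Tuple[str, str]:
    """Walk up the parent chain until `key` is found; return (value, supplying locale)."""
    loc: Optional[str] = locale_id
    while loc is not None:
        if key in data.get(loc, {}):
            return data[loc][key], loc
        loc = parent(loc)
    raise KeyError(key)


# Invented data: fr has been fixed to U+202F, but fr_CA still overrides with U+00A0.
DATA = {
    "root": {"decimal": "."},
    "fr": {"decimal": ",", "group": "\u202F"},
    "fr_CA": {"group": "\u00A0"},
}

if __name__ == "__main__":
    print(lookup("fr_CA", "decimal", DATA))                       # (',', 'fr')
    print(lookup("fr_CA", "group", DATA) == ("\u00A0", "fr_CA"))  # True
```

The second lookup never reaches fr: the explicit fr_CA value wins, so it has to be removed or corrected in fr_CA itself.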
Now we’re ready to take a look at the raw data, as downloaded or found in the online repository:

http://www.unicode.org/repos/cldr/tags/latest/
https://www.unicode.org/repos/cldr/tags/latest/common/
https://www.unicode.org/repos/cldr/tags/latest/common/main/

where we may wish to pick the locale that is closest to our new data, or the one we know best among the existing locales, or simply English for reference:

https://www.unicode.org/repos/cldr/tags/latest/common/main/en.xml

(Emoji-related data are in a separate directory: https://www.unicode.org/repos/cldr/tags/latest/common/annotations/en.xml )

I think it is best to download a whole data set as a zipped folder; the latest as of now are in:

http://www.unicode.org/Public/cldr/33.1/

and then to open the relevant files in a text editor with syntax highlighting and an XML syntax checker.

Here, finally, is my answer to the quoted question about how to coordinate efforts between users and experts: all interested people may communicate by any available means all year round, given that the SurveyTool forums have limited access and accept posts only during surveys, while being read-only for accredited people the rest of the time. Likewise, SurveyTool submission forms are read-only except during relatively short windows of opportunity, lasting 4 to 7 weeks, twice a year.

The results of discussions may then be committed to a file in LDML/XML format. The easiest way is to take the English files, cut out any unreviewed parts, and replace the English content with locale content. The resulting files may then be submitted individually by each coordinated vetter using the SurveyTool bulk data upload feature:

http://cldr.unicode.org/index/survey-tool
http://cldr.unicode.org/index/survey-tool/guide
http://cldr.unicode.org/index/survey-tool/guide#TOC-Advanced-Features
http://cldr.unicode.org/index/survey-tool/upload

I think we’ll consider trying this out for French / fr-FR when the next rush starts on December 1st.

Good luck!
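Before bulk-uploading a hand-edited LDML file, a quick well-formedness check catches broken markup early. What follows is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. The sample is a trimmed, hypothetical fr fragment; a real CLDR file also carries a DOCTYPE, identity version data and many more elements. `check_well_formed` and `group_separator` are illustrative helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Trimmed, hypothetical LDML fragment (real CLDR files are much larger).
SAMPLE = """<ldml>
  <identity>
    <language type="fr"/>
  </identity>
  <numbers>
    <symbols numberSystem="latn">
      <decimal>,</decimal>
      <group>\u202F</group>
    </symbols>
  </numbers>
</ldml>"""


def check_well_formed(xml_text: str) -> Optional[str]:
    """Return None if `xml_text` parses, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


def group_separator(xml_text: str) -> Optional[str]:
    """Return the group separator for Latin digits, or None if the file doesn't set it."""
    root = ET.fromstring(xml_text)
    elem = root.find("numbers/symbols[@numberSystem='latn']/group")
    return None if elem is None else elem.text


if __name__ == "__main__":
    print(check_well_formed(SAMPLE))                    # None
    print(check_well_formed("<ldml><numbers></ldml>"))  # the parser's "mismatched tag" message
    print(hex(ord(group_separator(SAMPLE))))            # 0x202f
```

A check like this is only about syntax. It says nothing about whether the values themselves are correct, which is still the vetters' job.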
Marcel

From cldr-users at unicode.org Sat Sep 22 10:17:24 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 22 Sep 2018 17:17:24 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>
References: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>
Message-ID:

On Sat, 22 Sep 2018 at 13:10, Marcel Schneider via CLDR-Users <cldr-users at unicode.org> wrote:

> A good way to prepare, if not already done, is also to learn XML and
> more specifically LDML, the
> Unicode Locale Data Markup Language, in order to be able to read and
> submit data in that format:
>
> http://cldr.unicode.org/index/cldr-spec

But the CLDR Survey does not allow us to participate directly by submitting LDML data (as submitted but unvetted provisional data) and then merging them into our votes. It would considerably speed up the data submission. (I also think that such submissions should allow us to include some custom "hashtags", usable in a search form in the CLDR Survey, so that we can group related items together: we could have multiple hashtags, one for each property we want to track in alternative groups, because these groups do not necessarily form a partition of the data space; they are not orthogonal.)

Adding hashtags would also be possible in LDML, for the whole LDML file or parts of it. Basically they would have the syntax of a space-separated list of keywords, themselves not translated but used symbolically, and preferably following a naming convention (CLDR admins could rename them in case of collision, but this would not change the naming; if hashtags come from a user submission, they could be automatically prefixed by an identifier of that user or organisation, such as "x-24-space" when user number 24 submitted data with a custom tag "space"; but the CLDR admin team could create tags more freely, without this prefix). Any data item in CLDR could have one or several tags.
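Philippe's proposed tag syntax can be sketched as follows. Nothing like this exists in SurveyTool. `normalize_tags` is a hypothetical helper, and the choice to drop duplicate tags while keeping their order is an assumption; only the "x-<user>-<tag>" prefix comes from the proposal.

```python
from typing import List, Optional


def normalize_tags(raw: str, user_id: Optional[int] = None) -> List[str]:
    """Split a space-separated tag list, dropping duplicates but keeping order.

    Tags from a user submission get the prefix "x-<user_id>-" (as in the
    proposal's "x-24-space" example); tags created by the CLDR admin team
    (user_id=None) are kept unprefixed.
    """
    tags: List[str] = []
    for word in raw.split():
        tag = word if user_id is None else f"x-{user_id}-{word}"
        if tag not in tags:
            tags.append(tag)
    return tags


if __name__ == "__main__":
    print(normalize_tags("space group space", user_id=24))  # ['x-24-space', 'x-24-group']
    print(normalize_tags("space"))                          # ['space']
```

Because the prefix is derived from the submitter, two users can both use "space" without colliding, and admins can later promote a tag by publishing an unprefixed copy.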
These tags would by default be visible only to the submitting user, separate from tags shared and exposed to others (so that we can separate custom groups created by users from groups to be reused by other people). This is much like the tags used in GitHub to help sort and search through a long list of bug reports or RFEs.

From cldr-users at unicode.org Sat Sep 22 12:52:23 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 19:52:23 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>

On 22/09/18 17:23 Philippe Verdy via CLDR-Users wrote:
[quote]
> But the CLDR Survey does not allow us to participate directly by submitting LDML data
> (as submitted but unvetted provisional data) and then merging them in our votes.
> It would considerably speedup the data submission

Steven R. Loomis hinted at this in response to a request I’d posted on Trac:

https://unicode.org/cldr/trac/ticket/11255#comment:2

> (I also think that such submissions should allow us to include some custom "hashtag",
> that are usable in a search form in CLDR survey, so that we can group related items together:
> we could have multiple hashtags, one for each property we want to track in alternative groups,
> because these groups do not necessarily form a partition of the data space, they are not orthogonal).

I don’t understand how SurveyTool would get these hashtags to work, but you may post this as a feature request. For newcomers, here is how to send feedback for processing by the CLDR Technical Committee:

1. Set your personal data in Preferences:
https://unicode.org/cldr/trac/prefs

2. Submit bug reports, new data for locales that don’t have a locale ID in CLDR yet, or feature requests:
https://unicode.org/cldr/trac/newticket

Don’t worry if you’re prompted to do some arithmetic.
With your personal data in a cookie, that is less likely to happen. Make sure, however, not to exceed the maximum of 5 external links per post. Internal links are unlimited, using the ticket:123456 syntax or alternatives as shown in:
https://unicode.org/cldr/trac/wiki/WikiFormatting

What we can already do is use XML comments in the files we’re working on, and there we may add hashtags. However, SurveyTool won’t import them; it only registers our votes. That alone already saves us a great deal of time.

Regards,

Marcel

From cldr-users at unicode.org Sat Sep 22 14:53:56 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 22 Sep 2018 21:53:56 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
References: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
Message-ID:

My intent is to have those tags (which are internally numbered with a stable ID but may be renamed by the proposing users, or by admins if tags are made global and unprefixed) also usable in Survey discussions, to attach a comment to all CLDR data entries that have been tagged with it, either by the user (using a prefixed user tag) or by a global tag (created by the CLDR tech admins). I.e. reproduce what we can easily do in GitHub to track various related bug reports, RFEs, or pending actions.

Ideally there should also be a graph of entries (these tags also work like tasks in task lists: they have a status coming from the posted comments, and can be closed once solved).

Maybe some integration with GitHub and community development tools or bug tracking tools would be useful, just like in many open-source development projects (of which CLDR is one). That integration comes with URNs/URLs linked to CLDR data paths.
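Until such tags exist in SurveyTool, the XML-comment approach mentioned above can at least be exploited locally, to group items before voting. This is a minimal Python sketch, assuming a hypothetical convention where a comment such as `<!-- #space #nnbsp -->` tags the element that immediately follows it. It needs Python 3.8+ for `TreeBuilder(insert_comments=True)`. The paths it builds are simplified element paths, not full CLDR XPaths with attributes.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

HASHTAG = re.compile(r"#([\w-]+)")


def tags_from_comments(xml_text: str) -> Dict[str, List[str]]:
    """Map each #hashtag found in a comment to the paths of the elements it precedes."""
    # Comments are dropped by default; keep them in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    index: Dict[str, List[str]] = {}

    def walk(elem: ET.Element, path: str) -> None:
        pending: List[str] = []
        for child in elem:
            if child.tag is ET.Comment:
                pending.extend(HASHTAG.findall(child.text or ""))
                continue
            child_path = f"{path}/{child.tag}"
            for tag in pending:
                index.setdefault(tag, []).append(child_path)
            pending = []
            walk(child, child_path)

    walk(root, root.tag)
    return index


SAMPLE = """<ldml>
  <numbers>
    <symbols numberSystem="latn">
      <!-- #space #nnbsp -->
      <group>\u202F</group>
    </symbols>
    <!-- #space -->
    <currencyFormats numberSystem="latn"/>
  </numbers>
</ldml>"""

if __name__ == "__main__":
    # {'space': ['ldml/numbers/symbols/group', 'ldml/numbers/currencyFormats'],
    #  'nnbsp': ['ldml/numbers/symbols/group']}
    print(tags_from_comments(SAMPLE))
```

One element can carry several tags and one tag can span unrelated branches, which is exactly the non-partitioned grouping described in the thread.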
For now CLDR submission is too hierarchical, and does not focus very well on related groups of items that must be fixed together (in the same locale, or across locales), so it is very difficult to isolate the inconsistencies and get reliable votes to fix them in the short time allowed for submission and vetting (notably because the tool is much too slow, uses far too many JavaScript/DOM resources in the browser, and is very unresponsive to user events, creating many unexpected actions or ignored clicks).

From cldr-users at unicode.org Sun Sep 23 13:29:25 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 23 Sep 2018 20:29:25 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To:
References: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
Message-ID: <1404261498.5171.1537727365472.JavaMail.www@wwinf2227>

On 22/09/18 21:54 Philippe Verdy wrote:
>
> My intent is to have those tags (that are internally numbered with a stable id but may be renamed by proposing users,
> or by admins if tags are made global and unprefixed) also usable in Survey discussions to attach a comment to all
> CLDR data entries using that have been tagged with it by the user (using a prefixed user tag) or by a global tag
> (created by the CLDR tech admins).

Now I understand, and I believe it’s very useful to prevent what we often observed or did: vetters come up with a whole bunch of items having the same issue, and cannot help posting one forum post per item, as there is no other way of getting the message to show up in the information pane when one of these items has focus in SurveyTool.
So we happened to copy one single message and paste it into the launch-new-thread form as many times as there were items to fix. Downstream, that of course triggers an avalanche of e-mail alerts from ST, which every vetter then has to open one by one, only to read the same message x times. Therefore, yes, we should really have a means of bundling items and discussing them together as a batch.

However, the issue lies in the process. As long as we vote on items one by one in ST instead of preparing our votes in LDML format, we will be stuck with whatever ST features may or may not be present. And ST is far less agile: even when a patch is available, it isn’t applied until the next ST overhaul before the next vetting round, so ST keeps interfering with people’s work instead of being fixed overnight and checked the next day. CLDR should adapt to contributors’ way of working rather than impose its own rhythm, because contributors may have other constraints and limited time.

It is a pity that the Trac tool is not used enough. Perhaps vetters are not allowed to spend time writing or commenting on bug reports, or are not allowed to post publicly, given that Trac has unrestricted public access, whereas the ST forums are closed and can only be checked by people with credentials, so that locale data production is opaque, the very opposite of current practice in open-source projects.

There are also technical issues with the interoperability of SurveyTool and Trac. While links in Trac work fine, Trac refuses to publish more than five external links at once, which heavily impacts usability, given that ST forums are AFAIK considered external by Trac’s spam bots.

What bothered me badly is that anchors on ST forum pages don’t work precisely. Instead of scrolling to where you copied an anchor link, the ST forum scrolls elsewhere, so I ended up always adding the datestamp for use with the browser’s search. That should make for another bug ticket, but I currently cannot file it.
I hope the TC members monitoring this list will wish to pick this up for fixing.

> I.e. reproduce what we can do easily in GitHub to track various related bugs reports and RFEs or pending actions.
> Ideally there should also be a graph of entries (these tags are working also like tasks in task lists, they have a status
> coming from the posted comments, which can be closed once solved)
> May be some integration with GitHub and community development tools or bug tracking tools would be useful,
> just like many opensource development projects (of which CLDR is one).
> That integration comes with URNs/URLs linked to CLDR data paths.
> For now CLDR submission is too much hierarchic, and does not focus very well on related groups of items
> that must be fixed together (in the same locale, or across locales), so it is very difficult to isolate the inconsistencies
> (and get reliable votes to fix them in the short time allowed for submission and vetting (notably because the tool is
> much too slow and uses really too much java-script/DOM resources in the browser, and is very unresponsive to user events,
> creating many unexpected actions, or ignored clicks).

Yes indeed. People from different locales cannot interoperate well, and even between sublocales and their parent locales there is unresponsiveness. See how one French sublocale did not update the group separator. That is symptomatic of missing dynamics inside the CLDR community.

We also do indeed lose time when slowed down, and an "Approve all" button is also missing, for use where most items are OK and only a few or none need their votes changed afterwards. This one has been posted:

https://unicode.org/cldr/trac/ticket/11250#comment:1

For now I can only suggest working offline and being ready to organize ourselves.
I think that if we are able to spread CLDR work sessions over the whole year, as the CLDR bulk upload feature suggests, we can prepare and make up our minds so that we’re ready for the short windows of opportunity, which we may then use to efficiently discuss and share LDML data for everybody to upload his or her votes.

Regards,

Marcel

From cldr-users at unicode.org Sun Sep 23 14:46:23 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 23 Sep 2018 21:46:23 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1200687843.5624.1537731983164.JavaMail.www@wwinf2227>

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> Ideas:
> - a web app to take in new locale data?

Do you mean an entirely new piece of software?

Another idea I just had is to code a standalone program in C(++) to edit LDML files by displaying editable charts.

And I agree with Philippe’s advice to set up a collaborative platform that is open non-stop. That will allow volunteers to be active at their own rhythm without being bound to CLDR-internal timing, while the TC may show up on a schedule and collect the data at fixed deadlines. There may still be scheduled rushes for organizations to assign staff, while avoiding the cost of full-time reviewers.

Hope that helps :)

Regards,

Marcel

From cldr-users at unicode.org Mon Sep 24 02:38:54 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 24 Sep 2018 02:38:54 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1889299939.6180.1537749534241.JavaMail.www@wwinf2227>

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> Ideas:
> - a web app to take in new locale data?

What one might wish to do is code up an Android app prompting volunteers to input various content targeting the patterns collected in CLDR, while programmatically converting the raw data into the data structures CLDR needs.
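As a rough illustration of the server-side conversion such an app would perform, here is a Python sketch for the simplest case, twelve wide month names. It rests on assumptions: `months_to_ldml` is a hypothetical name, and the output covers only the `months` element (in LDML it sits under `dates/calendars/calendar[@type="gregorian"]`). A real intake tool would also need the other widths, the stand-alone context, validation and draft status.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def months_to_ldml(names: Sequence[str], context: str = "format", width: str = "wide") -> str:
    """Serialize twelve month names, as typed by a volunteer, into an LDML <months> fragment."""
    if len(names) != 12:
        raise ValueError(f"expected 12 month names, got {len(names)}")
    months = ET.Element("months")
    ctx = ET.SubElement(months, "monthContext", type=context)
    wid = ET.SubElement(ctx, "monthWidth", type=width)
    for number, name in enumerate(names, start=1):
        ET.SubElement(wid, "month", type=str(number)).text = name.strip()
    return ET.tostring(months, encoding="unicode")


if __name__ == "__main__":
    french = ["janvier", "février", "mars", "avril", "mai", "juin", "juillet",
              "août", "septembre", "octobre", "novembre", "décembre"]
    print(months_to_ldml(french))
```

The volunteer only types plain words; the element names, attributes and numbering are supplied centrally, which is the division of labor described in the thread.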
For example, following the example in the presentation you shared, the app could prompt for the months of the year, a set of full dates, abbreviations, and so on, then send it all up to the server, where the data is processed for CLDR intake. I.e. the abstraction effort is centralized instead of being passed on to the volunteers, sparing them from working through the documentation.

> - a web app to debug/explore plurals?

That seems to be an example of what algorithms can do in that sense. But beware of stepping over into the realm of full-fledged automated translation, which is outside the scope of CLDR.

Regards,

Marcel

From cldr-users at unicode.org Mon Sep 24 07:00:26 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 24 Sep 2018 14:00:26 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <2053930414.4704.1537790426380.JavaMail.www@wwinf2227>

Now sadly this discussion seems to have died down without bringing up a wide diversity of points of view, suggestions, desiderata and advice. So I need to make clear that, despite appearances, we didn’t aim at just using another occasion to vent about our contributor experience with CLDR. What we intended through our posting is:

1. Inform new contributors so they won’t be surprised, and may wish to develop strategies beforehand to possibly mitigate adverse effects of the current state of affairs, though we expect that all or part of the cited problems will be fixed before the next survey round.
2. Motivate responsive people to fix all problems urgently, so that no contributors get discouraged when encountering any of those problems.
3. Contribute to the requested brainstorm.

We hope we’ve done our part to reach these goals, and we’d welcome any other effort to enrich the feedback harvest.

Thanks,

Marcel

From cldr-users at unicode.org Mon Sep 24 11:52:25 2018
From: cldr-users at unicode.org (Steven R.
Loomis via CLDR-Users)
Date: Mon, 24 Sep 2018 09:52:25 -0700
Subject: Locale bringup and barriers for entry
In-Reply-To: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>
References: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>
Message-ID:

Marcel and Philippe,

I see some interesting discussion, though some of it was (as noted in later emails) recapping existing bugs. However, please note how I began this discussion:

On Sat, Sep 22, 2018 at 1:15 AM Marcel Schneider via CLDR-Users <cldr-users at unicode.org> wrote:

> > At the IUC conference last week, a few of us discussed around lunch some
> > issues around getting new locales into CLDR, and barriers to entry.

The key word here is “new”: locales not currently in CLDR. For example, emoji category names are not part of the CLDR minimal data, and also, new locales will not face issues around inheritance.

> 1. The English template may not be internally consistent, eg emoji category
> names may be singular or plural (plural throughout seems correct);
> 2. The English template may not be up-to-date, eg. still including ASCII
> quotes in exemplar punctuation though these have been ruled out;
> 3. The target data sets may not be comprehensively specified, eg the
> define of exemplar punctuation does include an exclusion clause for math
> symbols only, while the clause about not including symbols on a
> programmatic usage basis such as # @ _ is still missing;
> 4. The English template may not be kept in synchrony with the
> specifications, eg emoji keywords not to include emoji name or name starter;

There are continuous improvements on the English data side. I don't think the above are necessarily barriers to initial entry.

> 5. Numerous bugs affecting markup of inherited values (but these have been
> reported and are about to be fixed in the SurveyTool code).

Right.
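Plural data is singled out above as a barrier for new locales, and the thread has also raised the idea of letting a locale get started without plural rules, as well as the role of grammatical gender. The following Python sketch illustrates all three with simplified rules: English cardinal `one: i = 1 and v = 0`, French cardinal `one: i = 0,1`, French ordinal `one: n = 1`, everything else `other`. Current CLDR releases may define additional categories, so check the plural-rules charts before relying on these. The gendered ordinal suffixes and the set of "plural-dependent" sections are illustrative assumptions, not CLDR data structures.

```python
from typing import Tuple


def operands(number: str) -> Tuple[int, int]:
    """Return the CLDR operands (i, v): integer digits and count of visible fraction digits."""
    integer, _, fraction = number.lstrip("-").partition(".")
    return int(integer), len(fraction)


def cardinal_en(number: str) -> str:
    i, v = operands(number)
    return "one" if i == 1 and v == 0 else "other"


def cardinal_fr(number: str) -> str:
    i, _ = operands(number)
    return "one" if i in (0, 1) else "other"


def ordinal_fr(n: int) -> str:
    return "one" if n == 1 else "other"


# Hypothetical: French ordinal abbreviations also depend on grammatical gender,
# which the plural category alone does not capture (1er vs 1re).
FR_ORDINAL_SUFFIX = {
    ("one", "masculine"): "er",
    ("one", "feminine"): "re",
    ("other", "masculine"): "e",
    ("other", "feminine"): "e",
}


def ordinal_abbrev_fr(n: int, gender: str) -> str:
    return f"{n}{FR_ORDINAL_SUFFIX[(ordinal_fr(n), gender)]}"


# Hypothetical locking: sections whose data depends on plural rules stay
# closed until the locale's plural rules have been entered.
PLURAL_DEPENDENT = {"currencies", "compact decimals", "units"}


def is_locked(section: str, has_plural_rules: bool) -> bool:
    return section in PLURAL_DEPENDENT and not has_plural_rules


if __name__ == "__main__":
    for n in ("0", "1", "1.0", "1.5", "2"):
        print(n, cardinal_en(n), cardinal_fr(n))
    print(ordinal_abbrev_fr(1, "masculine"), ordinal_abbrev_fr(1, "feminine"))  # 1er 1re
    print(is_locked("currencies", False), is_locked("dates", False))           # True False
```

Note that "1.0" is `other` in English ("1.0 days") but `one` in French, and that the same ordinal category `one` yields two different abbreviations once gender is taken into account.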
> > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
> > pluralization data
>
> The scope of pluralization seems unclear and biased by the English
> paradigm of genderlessness, while in other languages grammatical gender
> is a determining parameter for pluralization, so that even extensions to
> the DTD seem to be required for providing out-of-the-box pluralization
> rules.

I'm not sure what is meant by 'extensions to the DTD'. In any event, CLDR pluralization has proven to be largely successful in practice. Do you have any specific concern about CLDR plurals? Is there a bug filed?

> > - what about fonts?
>
> > keyboards?
>
> I see fonts and keyboards actually as the two missing components of the
> stack that you designed, because though being part of locale data, input
> methods are a precondition of efficient submission of locale data. The
> full stack would thus expand to:
>
> > - what are the best ways to coordinate efforts between the language
> > users and different technical experts?
> > Ideas:
> > - a web app to take in new locale data?
>
> I think CLDR has already its web app, ie SurveyTool. A full-time engineer
> is actually redeveloping and debugging several or all parts of it.

Again, the scope of this data is data for a completely new locale that is not currently in CLDR. The idea would be an application just for taking in the data listed at http://cldr.unicode.org/index/cldr-spec/minimaldata

> > - a web app to debug/explore plurals?
>
> Before including this functionality in SurveyTool, where it belongs in, I
> think that the spec should be redesigned, and the documentation updated
> accordingly. That could eventually result in extended language support by
> CLDR/ICU, which would do no harm but only raise the product value.

Redesigned how? Again, do you have any specific concern about CLDR plurals? Is there a bug filed?

> > - allowing some locales to 'get started' without plural rules?
> > I think that any locale may get started in CLDR when providing date and > time formats, while correctly displaying a reminder of a shopping cart > may be left over for a later stage. > That's the general idea. (And a good way to put it, as a 'shopping cart'.) Perhaps any data item that depends on plurals (currency category, compact decimal category, etc.) would be 'locked' until it is unlocked by the input of plural data. From cldr-users at unicode.org Mon Sep 24 14:51:35 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Mon, 24 Sep 2018 21:51:35 +0200 (CEST) Subject: Locale bringup and barriers for entry Message-ID: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> On 24/09/18 18:52 Steven R. Loomis wrote: [...] > I'm not sure what is meant by 'extensions to the DTD'. In any event, CLDR pluralization has proven to be largely successful in practice. > Do you have any specific concern about CLDR plurals? Is there a bug filed? I'd filed this bug about French plurals: https://unicode.org/cldr/trac/ticket/11302 Ordinal minimal pairs for French. Although, as noted there, most other locales are unaffected, I've extrapolated from this that some issues may be awaiting new locales, and that when facing such barriers, getting them out of the way may require the DTD to be extended, so submitters should be ready to file tickets, as we're often prompted to do by the SurveyTool information panel. [...] > > Before including this functionality in the SurveyTool, where it belongs, I think that the spec should be redesigned, and the documentation updated > > accordingly. That could eventually result in extended language support by CLDR/ICU, which would do no harm but only raise the product value. > > Redesigned how? Again - do you have any specific concern about CLDR plurals? Is there a bug filed?
My concern is that CLDR seems not to take gender into account when providing plural rules, but I was told that gender is out of scope. The fact is that nouns may inflect differently depending on whether they are feminine or masculine. > > > - allowing some locales to 'get started' without plural rules? > > > > I think that any locale may get started in CLDR when providing date and time formats, while correctly displaying a reminder of a shopping cart > > may be left over for a later stage. > > That's the general idea. (And a good way to put it, as a 'shopping cart'.) The idea isn't mine. Here is the documentation page where I got it from: http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Non-inflecting-Nouns-Pronouns > Perhaps any data item that depends on plurals (currency category, compact decimal category, etc.) > would be 'locked' until it is unlocked by the input of plural data. Provided that "locking" an item won't cause a blank or another sort of bug. When a user sees an item not pluralized where it is expected to be plural, then simply inferring that pluralization isn't ready might be straightforward. There will surely be some IF in the code to prevent the app from crashing. Glad that the discussion has restarted. Perhaps I was too impatient. Regards, Marcel From cldr-users at unicode.org Mon Sep 24 15:18:26 2018 From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users) Date: Mon, 24 Sep 2018 13:18:26 -0700 Subject: Locale bringup and barriers for entry In-Reply-To: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> Message-ID: Mark On Mon, Sep 24, 2018 at 12:52 PM Marcel Schneider via CLDR-Users < cldr-users at unicode.org> wrote: > On 24/09/18 18:52 Steven R. Loomis wrote: > [...] > > I'm not sure what is meant by 'extensions to the DTD'.
In any event, > CLDR pluralization has proven to be largely successful in practice. > > Do you have any specific concern about CLDR plurals? Is there a bug > filed? > > I'd filed this bug about French plurals: > > https://unicode.org/cldr/trac/ticket/11302 > Ordinal minimal pairs for French. > > Although, as noted there, most other locales are unaffected, > I've extrapolated from this that some issues may be awaiting new > locales, and that when facing such barriers, > getting them out of the way may require the DTD to be extended, so > submitters should be ready to file tickets, > as we're often prompted to do by the SurveyTool information panel. > > [...] > > > > > > Before including this functionality in the SurveyTool, where it belongs, > I think that the spec should be redesigned, and the documentation > updated > > > accordingly. That could eventually result in extended language support > by CLDR/ICU, which would do no harm but only raise the product value. > > > > Redesigned how? Again - do you have any specific concern about CLDR > plurals? Is there a bug filed? > > My concern is that CLDR seems not to take gender into account when > providing plural rules, but I was told that gender is out of scope. > The fact is that nouns may inflect differently depending on whether they > are feminine or masculine. > The focus for plurals in CLDR is "what would change if I change a number to another number in a placeholder". So if I have a message with a masculine noun, I have two versions: one: "{number} libro è selezionato" other: "{number} libri sono selezionati" vs also two versions with a feminine noun. one: "{number} nota è selezionata" other: "{number} note sono selezionate" Now, there are some languages (eg Russian) that only exhibit differences for one of the plural categories if a certain gender is involved. So the plural categories themselves need to be the maximal partition across the possible genders, cases, and other features.
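Mark's Italian example can be sketched in a few lines of Python. This is a minimal illustration, not ICU's MessageFormat API: the gender is supplied by the caller (the same division of labor as the ICU gender SELECT Mark describes next), and the plural rule is a hand-written simplification of Italian for non-negative integers only. The names `italian_cardinal_category`, `MESSAGES`, and `format_selection` are hypothetical.

```python
# Minimal sketch, not ICU's API: choose a sub-message by a caller-supplied
# gender and by the CLDR plural category of the number.

def italian_cardinal_category(n: int) -> str:
    """Simplified Italian cardinal rule for non-negative integers:
    'one' for exactly 1, 'other' for everything else (including 0)."""
    return "one" if n == 1 else "other"

# Hypothetical message table: 2 genders x 2 plural categories.
MESSAGES = {
    ("masculine", "one"): "{number} libro è selezionato",
    ("masculine", "other"): "{number} libri sono selezionati",
    ("feminine", "one"): "{number} nota è selezionata",
    ("feminine", "other"): "{number} note sono selezionate",
}

def format_selection(n: int, gender: str) -> str:
    """Look up the (gender, category) sub-message and fill in the number."""
    category = italian_cardinal_category(n)
    return MESSAGES[(gender, category)].format(number=n)

print(format_selection(1, "feminine"))   # 1 nota è selezionata
print(format_selection(3, "masculine"))  # 3 libri sono selezionati
```

Changing only the number changes only the plural category; changing the noun, and with it the gender, means choosing a different row, which is the part Mark places outside CLDR.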
What is NOT in scope for CLDR at this time is to change both gender and number. Typically that requires many other changes in the rest of the text. one: "{number} {thing} è selezionata" ... ICU has a mechanism for doing a SELECT using gender, but there the gender has to be supplied as a parameter, and a sub-message supplied for each of the (say) 3 genders x 4 plural categories. Actually detecting the gender of nouns and modifying sentences on that basis is out of scope (and a very tricky problem in general). > > > > - allowing some locales to 'get started' without plural rules? > > > > > > I think that any locale may get started in CLDR when providing date > and time formats, while correctly displaying a reminder of a shopping cart > > > may be left over for a later stage. > > > > That's the general idea. (And a good way to put it, as a 'shopping > cart'.) > > The idea isn't mine. Here is the documentation page where I got it from: > > > http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Non-inflecting-Nouns-Pronouns > > > Perhaps any data item that depends on plurals (currency category, > compact decimal category, etc.) > > would be 'locked' until it is unlocked by the input of plural data. > > Provided that "locking" an item won't cause a blank or another sort of > bug. > When a user sees an item not pluralized where it is expected to be plural, > then simply inferring that pluralization isn't ready might be > straightforward. > There will surely be some IF in the code to prevent the app from crashing. > What we have considered (there is a ticket for this somewhere) is disallowing any data/votes to be entered in a row with a "count" or "ordinal" attribute until the rules (plural or ordinal, respectively) are supplied. The row would either be grayed out or just omitted. So data could be entered in the locale for other fields, but the locale couldn't reach moderate or modern coverage without the rules.
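The row gating Mark describes could look roughly like the following. The sketch assumes SurveyTool rows are identified by CLDR-style XPath strings in which plural-dependent rows carry a `[@count="..."]` or `[@ordinal="..."]` attribute; `row_is_open` is a hypothetical name and this is not actual SurveyTool code.

```python
def row_is_open(xpath: str, has_plural_rules: bool, has_ordinal_rules: bool) -> bool:
    """Return True if data/votes may be entered for the row at `xpath`.

    Rows with a count attribute stay closed until plural rules exist;
    rows with an ordinal attribute stay closed until ordinal rules exist.
    """
    if "[@count=" in xpath and not has_plural_rules:
        return False
    if "[@ordinal=" in xpath and not has_ordinal_rules:
        return False
    return True

rows = [
    '//ldml/dates/calendars/calendar[@type="gregorian"]/dateFormats',
    '//ldml/numbers/currencies/currency[@type="EUR"]/displayName[@count="one"]',
]
# A brand-new locale with no rules yet: only the date row is open.
print([row_is_open(r, False, False) for r in rows])
```

Closed rows could then be grayed out or omitted in the UI, and a coverage check would simply report the locale as below moderate while any such row remains closed.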
So applications not requiring that coverage level could include the locale, but those requiring that coverage level would omit it. > > Glad that the discussion has restarted. Perhaps I was too impatient. > > Regards, > > > Marcel > > > _______________________________________________ > CLDR-Users mailing list > CLDR-Users at unicode.org > http://unicode.org/mailman/listinfo/cldr-users > From cldr-users at unicode.org Tue Sep 25 00:54:25 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Tue, 25 Sep 2018 07:54:25 +0200 (CEST) Subject: Locale bringup and barriers for entry In-Reply-To: References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> Message-ID: <329976032.379.1537854865588.JavaMail.www@wwinf2227> On 24/09/18 22:18 Mark Davis ☕️ wrote: [quote] > > The focus for plurals in CLDR is "what would change if I change a number to another number in a placeholder". > So if I have a message with a masculine noun, I have two versions: > > one: "{number} libro è selezionato" > other: "{number} libri sono selezionati" > > vs also two versions with a feminine noun. > > one: "{number} nota è selezionata" > other: "{number} note sono selezionate" I find myself unable to locate plural rules in the LDML tree, apart from some plural and ordinal minimal pairs. Also, the current DTD does not seem to contain what is found in the LDML spec at: https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules The DTD only has: That leads me to conjecture that plural support is still partly under construction, which probably explains the stress put upon it in Steven's posting. Likewise, at locale level, eg for Italian, common/main/it.xml only has: {0} giorno {0} giorni Prendi l'{0}° a destra. Prendi la {0}° a destra. That is what I meant when complaining about gender support.
Following your example data, we should have additional data, and I can see no structure to accommodate additional forms: {0} libro è selezionato {0} libri sono selezionati {0} nota è selezionata {0} note sono selezionate The apparent redundancy induced might be disambiguated by adding a gender attribute: {0} libro è selezionato {0} libri sono selezionati {0} nota è selezionata {0} note sono selezionate The case is also striking when considering ordinal minimal pairs. To start, I can find no clear definition of what "few" and "many" are to represent. Hence I'm unable to make sense of the following, although that may result from my incompetence in Italian, and not using Google Translate right now to enlighten me (although I heavily used it elsewhere): Prendi l'{0}° a destra. When making a case for gender here, taking something like "via" for feminine, and "camino" for masculine, and "prima"/"primo" for "one" vs "terza"/"terzo" for "other", the data above would IMO expand to: Prendi la {0}ª a destra. ... Prendi la {0}ª a destra. Prendi il {0}º a destra. ... Prendi il {0}º a destra. Assuming that "many" stands for "8" (which should be defined somewhere), and collapsing redundant definitions, the result would be akin to the original data (although with proper ordinal indicators): Prendi l'{0}ª a destra. Prendi la {0}ª a destra. Prendi l'{0}º a destra. Prendi il {0}º a destra. Perhaps ticket #11393 is related to this topic. > Now, there are some languages (eg Russian) that only exhibit differences > for one of the plural categories if a certain gender is involved. > So the plural categories themselves need to be the maximal partition > across the possible genders, cases, and other features. Perhaps I'm silly, but I'm still unable to figure out how "minimal pairs" can represent a "maximal partition". > What is NOT in scope for CLDR at this time is to change both gender and number. > Typically that requires many other changes in the rest of the text.
What I mean is not that CLDR should show how to transform content across genders. What I mean is that CLDR should provide support for both feminine/masculine and masculine/feminine patterns. Currently, gender support seems to be limited to what the English examples suggest as a translation, be it masculine when "day" translates to "giorno", or feminine when "street" translates to "via". That is what I think is insufficient. > one: "{number} {thing} è selezionata" > ... > > ICU has a mechanism for doing a SELECT using gender, but there the gender has to be supplied > as a parameter, and a sub-message supplied for each of the (say) 3 genders x 4 plural categories. > > Actually detecting the gender of nouns and modifying sentences on that basis is out of scope > (and a very tricky problem in general). That seems OK to me as long as CLDR actually helps developers with data for any case they may encounter when setting up the values. Otherwise they may prefer to just look up a dictionary and a grammar of the target locale to find out by themselves which cases they have to consider. [quote] > > > Perhaps any data item that depends on plurals (currency category, compact decimal category, etc.) > > > would be 'locked' until it is unlocked by the input of plural data. > > > > Provided that "locking" an item won't cause a blank or another sort of bug. > > When a user sees an item not pluralized where it is expected to be plural, > > then simply inferring that pluralization isn't ready might be straightforward. > > There will surely be some IF in the code to prevent the app from crashing. > > What we have considered (there is a ticket for this somewhere) is disallowing any data/votes > to be entered in a row with a "count" or "ordinal" attribute until the rules (plural or ordinal, respectively) > are supplied. The row would either be grayed out or just omitted.
> So data could be entered in the locale for other fields, but the locale couldn't reach moderate > or modern coverage without the rules. So applications not requiring that coverage level could > include the locale, but those requiring that coverage level would omit it. Sorry, I misunderstood the scope. Thanks for explaining. Perhaps the ticket is #11061. Indeed, that makes for clean data and ensures the reliability of CLDR. If so much plural rule data is missing that CLDR must make a special case for it, that may result from the difficulties that non-expert vetters like me are experiencing with the topic. Now that CLDR plural rules are reported to work well in practice, I'm wondering how all that fits together. Obviously some rules are working well, especially when matching some frequent use cases. But the point, as I see it, is whether CLDR covers *all* use cases, possibly except very rare ones. Thanks. Regards, Marcel From cldr-users at unicode.org Tue Sep 25 03:00:40 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Tue, 25 Sep 2018 10:00:40 +0200 Subject: Locale bringup and barriers for entry In-Reply-To: <329976032.379.1537854865588.JavaMail.www@wwinf2227> References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> Message-ID: The numeric cases tagged as "one", "few", "many", "other" are defined in CLDR in plural rules for each locale. When a message is not translated in a given language and another message is taken from a fallback, the plural rules defined for that fallback must then be used instead of the plural rules for the initial target locale. Plural rules are documented. These are defined as minimal data needed to start any new locale.
And note that the "other" rule is used as a fallback if a locale does not define any message for a specific plural form, so before looking for fallback languages, the messages are first looked up in the "other" plural rule of the target locale. Once a new locale is being set up, the CLDR survey will ask for translations for each plural form where needed (when a message to translate has a placeholder for a variable number), but note that a given message cannot be tagged like this if it contains several placeholders with different numeric values: if this happens, it will have to be split into several parts, and the parts will be assembled in another message containing placeholders for each part (this would also be needed if there were multiple genders or grammatical cases to handle in the same assembled message). On Tue, Sep 25, 2018 at 07:58, Marcel Schneider via CLDR-Users < cldr-users at unicode.org> wrote: > To start, I can find no clear definition of what "few" and "many" are to > represent. > Hence I'm unable to make sense of the following, although that may result > from my incompetence in > Italian, and not using Google Translate right now to enlighten me > (although I heavily used it elsewhere): > From cldr-users at unicode.org Tue Sep 25 04:32:30 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Tue, 25 Sep 2018 11:32:30 +0200 (CEST) Subject: Locale bringup and barriers for entry In-Reply-To: References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> Message-ID: <312813575.3078.1537867951491.JavaMail.www@wwinf2227> On 25/09/18 10:00 Philippe Verdy wrote: > > The numeric cases tagged as "one", "few", "many", "other" are defined in CLDR in plural rules for each locale. Italian happens to use "many", although it isn't defined in main/it.xml.
On the other hand, main/en.xml doesn't define it either, and doesn't use it, although English could use a case for "eight", as documented in: https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules But it is considered an edge case: "There is an edge case in English because of the behavior of "a/an". For example, in changing from 3 to 8: "a 3rd of a loaf" should result in "an 8th of a loaf", not "a 8th of a loaf" "a 3 foot stick" should result in "an 8 foot stick", not "a 8 foot stick" So numbers of the following forms could have a special plural category and special ordinal category: 8(X), 11(X), 18(X), 8x(X), where x is 0..9 and the optional X is 00, 000, 00000, and so on. On the other hand, the above constructions are relatively rare in messages constructed using numeric placeholders, so the disruption for implementations currently using CLDR plural categories wouldn't be worth the small gain." I don't agree with the conclusion, given that displaying messages like "Do you wish a 8 foot stick?" would reflect badly on the corporate image of a retailer using a poorly implemented user interface. > When a message is not translated in a given language and another message is taken from a fallback, > the plural rules defined for that fallback must then be used instead of the plural rules for the initial target locale. Agreed, but having untranslated values in a locale does not make that locale particularly well supported in CLDR. > Plural rules are documented. These are defined as minimal data needed to start any new locale. That seems to be one of those barriers that Steven is now questioning, or even the main barrier for entry. For me that would remain a barrier as long as I cannot get clear insight nor see straightforward structures to fill in.
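The "a/an" edge case quoted above can be approximated outside CLDR. The sketch below is a heuristic for illustration only, not a CLDR or ICU rule: it inspects just the leading thousands group of a non-negative integer, so 8, 80, 800, 8000, 11, 11000 and 18500 take "an" while 110 and 180 take "a". Readings such as "eleven hundred" for 1100 are deliberately not handled. `english_article` is a hypothetical name.

```python
def english_article(n: int) -> str:
    """Return "an" if the English reading of non-negative integer n
    starts with a vowel sound (eight..., eleven..., eighteen...), else "a".

    Only the leading thousands group is inspected, so 1100 is read as
    "one thousand one hundred" rather than "eleven hundred".
    """
    digits = str(n)
    lead = digits[: len(digits) % 3 or 3]  # leading group of 1-3 digits
    if lead.startswith("8") or lead in ("11", "18"):
        return "an"
    return "a"

for n in (3, 8, 11, 18, 80, 110, 11000):
    print(f"{english_article(n)} {n} foot stick")
```

Because the check looks only at the number, it could sit next to the plural category lookup without changing the set of CLDR categories, which is the disruption the quoted spec text wants to avoid.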
> And note that the "other" rule is used as a fallback if a locale does not define any message for a specific plural form, > so before looking for fallback languages, the messages are first looked up in the "other" plural rule of the target locale. In those cases, implementations may use a generic display such as "Your cart ({0})", where {0} is the number of items it contains, much like a mailbox shows the number of new messages in a folder. > Once a new locale is being set up, the CLDR survey will ask for translations for each plural form where needed > (when a message to translate has a placeholder for a variable number), but note that a given message cannot > be tagged like this if it contains several placeholders with different numeric values: if this happens, it will have > to be split into several parts, and the parts will be assembled in another message containing placeholders for each part > (this would also be needed if there were multiple genders or grammatical cases to handle in the same assembled message). Got it, thanks. That does not, however, resolve what I meant when complaining that CLDR does not provide comprehensive support for inflected forms. IMO it would be more useful to note that Italian nouns ending in -o must have that -o changed to -i when pluralized, and those ending in -a must have the -a replaced with -e. But that only encompasses regular inflection. I end up thinking that there is no point in CLDR providing inflected forms. Wouldn't it suffice to indicate which numbers require a plural, and which category? For support of abbreviated ordinals, CLDR could simply list all ways of constructing an ordinal abbreviation, and relate them to number and to gender. It isn't clear to me how a GPS message could make it into CLDR. I think that one should stick with the way things are done for date and time.
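The lookup order Philippe describes (the exact category first, then "other" in the target locale, then the fallback locale evaluated with its own rules) can be sketched as follows. The rule functions are simplified integer-only versions written for this illustration, the Polish and English message tables are hypothetical, and the sketch does not describe any particular library's fallback machinery.

```python
from typing import Callable, Dict, List, Optional, Tuple

PluralRule = Callable[[int], str]

def en_rule(n: int) -> str:
    """Simplified English cardinal rule for non-negative integers."""
    return "one" if n == 1 else "other"

def pl_rule(n: int) -> str:
    """Simplified Polish cardinal rule for non-negative integers."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"

def lookup(n: int,
           chain: List[Tuple[str, PluralRule, Dict[str, str]]]) -> Optional[str]:
    """chain lists (locale, rule, messages by category), target locale first."""
    for _locale, rule, messages in chain:
        category = rule(n)  # each locale's own rules, not the target's
        template = messages.get(category, messages.get("other"))
        if template is not None:
            return template.format(n)
    return None

chain = [
    ("pl", pl_rule, {"one": "{0} plik"}),  # partially translated target
    ("en", en_rule, {"one": "{0} file", "other": "{0} files"}),
]
print(lookup(1, chain))
print(lookup(5, chain))
```

For 5, Polish yields "many", which the partial table lacks, and it has no "other" either, so the English entry is chosen using the English rule ("other"), not the Polish category.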
Regards, Marcel From cldr-users at unicode.org Tue Sep 25 06:02:55 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Tue, 25 Sep 2018 13:02:55 +0200 Subject: Locale bringup and barriers for entry In-Reply-To: <312813575.3078.1537867951491.JavaMail.www@wwinf2227> References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227> Message-ID: On Tue, Sep 25, 2018 at 11:32, Marcel Schneider wrote: > On 25/09/18 10:00 Philippe Verdy wrote: > > Plural rules are documented. These are defined as minimal data needed to > start any new locale. > > That seems to be one of those barriers that Steven is now questioning, or > even the main barrier for entry. > For me that would remain a barrier as long as I cannot get clear insight > nor see straightforward structures to fill in. > > See the documentation: http://cldr.unicode.org/index/cldr-spec/plural-rules And the supplemental data, which gives a list per locale: http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html From cldr-users at unicode.org Tue Sep 25 06:20:55 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Tue, 25 Sep 2018 13:20:55 +0200 Subject: Locale bringup and barriers for entry In-Reply-To: References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227> Message-ID: Note that the supplemental data is OK for the "cardinal" and "range" types of categories, but largely fails almost everywhere for the "ordinal" type. E.g. in French: "Prenez la 1re à droite" (this assumes the feminine gender, which is OK for "rue"="street", "avenue", or "sortie"="exit", but wrong for "feu"="traffic light" or "stop", which are masculine, as in "Tournez au 1er feu à
droite", where "1er" and "1re" change depending on the gender of the explicit or implicit noun). Yes, ordinals (but also fractions) need derivation by gender (as well as grammatical case), including for abbreviated forms (e.g. in French, Italian, Spanish, but even in English with inflected leading articles like "a" vs. "an", which depend on the numeric value of the ordinal). And I see little use of these "ordinal" types except in strict isolation, assuming a nominal use (outside of real sentences where they will be inserted) without any relation to the noun (or nominal group) to which they refer (note: this noun or nominal group may be outside the current isolated "paragraph", such as a column heading, or other info such as resulting ranks in sports competitions for women, vs. the same table for men). Basically this means that CLDR just provides basic data that still needs to be tuned and localized again for specific applications, even if this tuning is generic. What CLDR can do, however, is to monitor whether there are stable applications wishing to interchange their localized data containing gender or case differences: if their localization data is large enough to cover enough locales for a significant part of the world and they want to interoperate, they will create a de facto standard that can be integrated (after being proposed to CLDR with enough exemplar data and open licensing). Such applications already exist (notably across wikis, even if this still requires much work to have them cooperate to stabilize some issues and agree on some common formats, and efficiently track the remaining translation problems and how to manage the remaining inconsistencies, as well as accepting some deviations for specific uses in more specific pages they don't want to break). On Tue, Sep 25, 2018 at 13:02, Philippe Verdy wrote: > > > On Tue, Sep 25, 2018 at 11:32, Marcel Schneider > wrote: > >> On 25/09/18 10:00 Philippe Verdy wrote: >> > Plural rules are documented.
These are defined as minimal data needed >> to start any new locale. >> >> That seems to be one of those barriers that Steven is now questioning, or >> even the main barrier for entry. >> For me that would remain a barrier as long as I cannot get clear insight >> nor see straightforward structures to fill in. >> >> See the documentation: > http://cldr.unicode.org/index/cldr-spec/plural-rules > > And the supplemental data, which gives a list per locale: > > http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html > > From cldr-users at unicode.org Tue Sep 25 14:11:49 2018 From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users) Date: Tue, 25 Sep 2018 21:11:49 +0200 (CEST) Subject: Locale bringup and barriers for entry In-Reply-To: References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227> Message-ID: <1873582906.6218.1537902709278.JavaMail.www@wwinf2227> Thanks for the links to documentation. The first page: http://cldr.unicode.org/index/cldr-spec/plural-rules contains new instructions stating that gender is irrelevant except if two nouns of different gender are needed to cover all plural categories. This results in replacing "Prenez la {0}re à droite; Prenez le {0}er à droite" with a sentence like the one you suggested: "Prenez au {0}er feu à droite puis la {0}re à droite". Still, I don't understand why information is to be packed into arbitrary phrases instead of being stored in a more formal way, using appropriate data structures that differentiate the values by transparent criteria, like what is already done for numbers with data stored in the supplemental/ directory: https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/plurals.xml https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/ordinals.xml which is what I was looking for.
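Ordinal rules in ordinals.xml that only compare the number against a list of values are easy to evaluate. The sketch below hand-copies two such rules as understood here (Italian: "many" for 8, 11, 80, 800; Gujarati: "one" for 1, "two" for 2 and 3, "few" for 4, "many" for 6), which should be checked against the current file; anything unmatched falls to "other". `ORDINAL_RULES` and `ordinal_category` are hypothetical names, and real CLDR rules also use operators such as `%` and ranges, which this sketch does not parse.

```python
# Hand-copied subset of CLDR ordinal rules of the form "n = a,b,c".
# Verify against common/supplemental/ordinals.xml before relying on it.
ORDINAL_RULES = {
    "it": [("many", {8, 11, 80, 800})],
    "gu": [("one", {1}), ("two", {2, 3}), ("few", {4}), ("many", {6})],
}

def ordinal_category(locale: str, n: int) -> str:
    """Return the first category whose value list contains n, else 'other'."""
    for category, values in ORDINAL_RULES[locale]:
        if n in values:
            return category
    return "other"

print([ordinal_category("it", n) for n in (1, 8, 11, 12)])
print([ordinal_category("gu", n) for n in range(1, 8)])
```

With these values, 8 and 11 land in Italian "many", which matches the elided article "l'" in the "many" minimal pair, while every other number takes "la" in the "other" pair.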
Values like "few" and "many" seem to be used as convenient labels to get more categories. Eg Gujarati has "two" for 2 and 3, "few" for 4, and "many" for 6, while 5 and 7 upwards are "other". Understandably, "many" is used in Italian to label the category dedicated to numbers whose names start with a vowel. The supplemental/ folder contains many things, among which I stumbled over attributeValueValidity.xml. The 2nd through 4th comments in this file contradict the very subject of this thread, so I suggest removing them PRIOR to the v34 release? Regards, Marcel On 25/09/18 13:21 Philippe Verdy wrote: > Note that the supplemental data is OK for the "cardinal" and "range" types of categories, but largely fails almost everywhere for the "ordinal" type. E.g. in French: "Prenez la 1re à droite" (this assumes the feminine gender, which is OK for "rue"="street", "avenue", or "sortie"="exit", but wrong for "feu"="traffic light" or "stop", which are masculine, as in "Tournez au 1er feu à droite", where "1er" and "1re" change depending on the gender of the explicit or implicit noun). > Yes, ordinals (but also fractions) need derivation by gender (as well as grammatical case), including for abbreviated forms (e.g. in French, Italian, Spanish, but even in English with inflected leading articles like "a" vs. "an", which depend on the numeric value of the ordinal). > And I see little use of these "ordinal" types except in strict isolation, assuming a nominal use (outside of real sentences where they will be inserted) without any relation to the noun (or nominal group) to which they refer (note: this noun or nominal group may be outside the current isolated "paragraph", such as a column heading, or other info such as resulting ranks in sports competitions for women, vs. the same table for men). > Basically this means that CLDR just provides basic data that still needs to be tuned and localized again for specific applications, even if this tuning is generic.
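Philippe's French example can be made concrete with a small helper that combines the French ordinal category (only 1 is special) with a gender supplied by the caller, producing 1er, 1re, then 2e, 3e, and so on. This illustrates the kind of structured data Marcel asks for; it is not something CLDR provides, and `fr_ordinal_abbrev` and `FR_SUFFIX` are hypothetical names.

```python
def fr_ordinal_category(n: int) -> str:
    """French ordinal rule: 'one' for 1, 'other' for every other integer."""
    return "one" if n == 1 else "other"

# Hypothetical table: abbreviation suffix by (ordinal category, gender).
FR_SUFFIX = {
    ("one", "masculine"): "er",
    ("one", "feminine"): "re",
    ("other", "masculine"): "e",
    ("other", "feminine"): "e",
}

def fr_ordinal_abbrev(n: int, gender: str) -> str:
    """Abbreviated French ordinal, e.g. 1er, 1re, 2e."""
    return f"{n}{FR_SUFFIX[(fr_ordinal_category(n), gender)]}"

print(f"Tournez au {fr_ordinal_abbrev(1, 'masculine')} feu à droite")
print(f"Prenez la {fr_ordinal_abbrev(1, 'feminine')} à droite")
print(f"Prenez la {fr_ordinal_abbrev(3, 'feminine')} à droite")
```

The gender still has to come from the caller, which is exactly the information a sentence template on its own does not carry.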
> What CLDR can do, however, is to monitor whether there are stable applications wishing to interchange their localized data containing gender or case differences: if their localization data is large enough to cover enough locales for a significant part of the world and they want to interoperate, they will create a de facto standard that can be integrated (after being proposed to CLDR with enough exemplar data and open licensing). > Such applications already exist (notably across wikis, even if this still requires much work to have them cooperate to stabilize some issues and agree on some common formats, and efficiently track the remaining translation problems and how to manage the remaining inconsistencies, as well as accepting some deviations for specific uses in more specific pages they don't want to break). > On Tue, Sep 25, 2018 at 13:02, Philippe Verdy wrote: > > > On Tue, Sep 25, 2018 at 11:32, Marcel Schneider wrote: > On 25/09/18 10:00 Philippe Verdy wrote: > > Plural rules are documented. These are defined as minimal data needed to start any new locale. > > That seems to be one of those barriers that Steven is now questioning, or even the main barrier for entry. > For me that would remain a barrier as long as I cannot get clear insight nor see straightforward structures to fill in. > > See the documentation: http://cldr.unicode.org/index/cldr-spec/plural-rules > And the supplemental data, which gives a list per locale: http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html > > From cldr-users at unicode.org Tue Sep 25 16:27:10 2018 From: cldr-users at unicode.org (Luke Dashjr via CLDR-Users) Date: Tue, 25 Sep 2018 21:27:10 +0000 Subject: Locale bringup and barriers for entry In-Reply-To: References: Message-ID: <201809252127.11342.luke@dashjr.org> It's been a while since I tried, but I didn't see any possible way to define a locale's number system (eg, octal or tonal instead of decimal).
On Saturday 22 September 2018 00:34:27 Steven R. Loomis via CLDR-Users wrote: > Hello, and welcome to the new cldr-users members. > > For discussion: > > At the IUC conference last week, a few of us discussed around lunch some > issues around getting new locales into CLDR, and barriers to entry. > > Barriers: > - we discussed that it could be confusing or difficult to collect all of > the data needed for a minimal locale: > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially > pluralization data > - what about fonts? keyboards? > - what are the best ways to coordinate efforts between the language users > and different technical experts? > > Ideas: > - a web app to take in new locale data? > - a web app to debug/explore plurals? > - allowing some locales to 'get started' without plural rules? > > Links for discussion: > - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw > - My "full stack" blog post: > https://srl295.github.io/2017/06/06/full-stack-enablement/ From cldr-users at unicode.org Tue Sep 25 16:48:33 2018 From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users) Date: Tue, 25 Sep 2018 14:48:33 -0700 Subject: Locale bringup and barriers for entry In-Reply-To: <201809252127.11342.luke@dashjr.org> References: <201809252127.11342.luke@dashjr.org> Message-ID: The numbering system is defined in TR 35 in https://unicode.org/reports/tr35/tr35-numbers.html#Numbering_Systems in terms of either '*numeric*' (decimal systems that just substitute different digits for "0123456789", such as "꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩" for the Vai language) or else *algorithmic* (more complex, rule-based systems). I suppose octal and tonal (hexadecimal?!) could be supported by the algorithmic approach. On Tue, Sep 25, 2018 at 2:27 PM Luke Dashjr wrote: > It's been a while since I tried, but I didn't see any possible way to > define a > locale's number system (eg, octal or tonal instead of decimal). > > On Saturday 22 September 2018 00:34:27 Steven R.
Loomis via CLDR-Users > wrote: > > Hello, and welcome to the new cldr-users members. > > > > For discussion: > > > > At the IUC conference last week, a few of us discussed around lunch some > > issues around getting new locales into CLDR, and barriers to entry. > > > > Barriers: > > - we discussed that it could be confusing or difficult to collect all of > > the data needed for a minimal locale: > > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially > > pluralization data > > - what about fonts? keyboards? > > - what are the best ways to coordinate efforts between the language users > > and different technical experts? > > > > Ideas: > > - a web app to take in new locale data? > > - a web app to debug/explore plurals? > > - allowing some locales to 'get started' without plural rules? > > > > Links for discussion: > > - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw > > - My "full stack" blog post: > > https://srl295.github.io/2017/06/06/full-stack-enablement/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cldr-users at unicode.org Tue Sep 25 17:38:25 2018 From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users) Date: Wed, 26 Sep 2018 00:38:25 +0200 Subject: Locale bringup and barriers for entry In-Reply-To: References: <201809252127.11342.luke@dashjr.org> Message-ID: octal and hexadecimal (as well as binary) are obviously numeric system using the same digits (or borrowing additional letters or adding other supplemental digits): the algorithm behind is the same as decimal, it's just using a different base (not necessarily wrriten each time but infered from the context), and that algorithm is equally simple, it's basic arithmetic expressed over a cyclic group. 
That numeric notation is contradicted by the way numbers are actually spelled out in natural languages, where the base is not just decimal but uses larger bases (most often 1000 in European traditions, but 100 or 10000 in parts of Asia, with various exceptions preserving traces of base 20). Historically, numbers carried mystical or religious traditions, and some old systems using base 12 remain (including old English and Celtic traditions).

Octal and hexadecimal are certainly modern inventions, made for technical reasons (or because of the limitations and implementation costs of older technology, when a pure binary notation was simply unusable for most purposes). Octal is now deprecated and largely replaced by hexadecimal, except in well-known programming languages and in old technical documentation for the oldest computing standards, which were never completely deprecated or are kept for compatibility. Its support is still mandatory there because it also affects how these programming languages are split into unbreakable lexical tokens: it would be impractical to change the basic tokenization algorithm on which the rest of the language is built. Conversely, this also limits the practical adoption of hexadecimal, which requires a more complex syntax even though it should be more compact.

Still today, the decimal system is the most widely used, but perhaps in some future hexadecimal will become popular and be adopted in natural languages to express numbers. Then it will be time to add actual characters with distinctive forms for the 6 additional digits, instead of borrowing Latin letters.
This could happen first in languages other than those currently written in Latin script (I think it may appear first in China, Japan or Korea, as part of the sinographic system or as extensions of kana and hangul, and be rapidly adopted in South Asia; once again European scripts would be the last to accept the change, just as they were very late in adopting the concept of zero, negative numbers, decimal fractions written with digits, and separators for grouping and decimals).

Yes, I don't see why no hexadecimal extension digits have been added yet, even if today most hexadecimal numbers are used only in technical programming languages that are standardized using only basic Latin/ASCII. The barrier is still their adoption in human languages for general use, as well as various legal restrictions (notably for pricing, billing, accounting, contracting and taxing). There are fewer restrictions in the old legal/judiciary traditions, where other systems were widely used (and still are!).

On Tue, 25 Sep 2018 at 23:55, Steven R. Loomis via CLDR-Users <cldr-users at unicode.org> wrote:
> The numbering system is defined in TR 35 […]
>
> On Tue, Sep 25, 2018 at 2:27 PM Luke Dashjr wrote:
>> It's been a while since I tried, […]
> _______________________________________________
> CLDR-Users mailing list
> CLDR-Users at unicode.org
> http://unicode.org/mailman/listinfo/cldr-users

From cldr-users at unicode.org Tue Sep 25 21:59:40 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 26 Sep 2018 04:59:40 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>

What locales are you referring to? If they are new to CLDR, and you had difficulties setting up their numbering system, then that is yet another barrier.

As far as I can see, the only locales I know of that use sexagesimal numbering are Sumerian and Babylonian. Octal and hexadecimal/tonal are discouraged as a locale's numbering system because they are counterintuitive: they allow people neither to count on their fingers in a straightforward way nor to communicate digits efficiently with hand gestures.
More generally, I don't believe it would be useful for a locale to focus on its numbering system in order to break away from widespread usage. Yes, we really do need to make changes, but the numbering system does not seem to me to be the right end to begin with. Sorry to put it bluntly, but I'd suggest focusing on getting all existing locales into CLDR, unlike what is suggested in the comments I pointed to in my previous message, and on fixing existing errors. If any existing living locale does use octal, tonal, sexagesimal, or any other non-decimal system besides purely notational conventions like Roman numerals, then indeed we need to dig deeper into the matter in order to get it into CLDR.

Having said that, as Steven pointed out, some locales already use algorithmic numbering, as seen in the data:

https://www.unicode.org/repos/cldr/tags/latest/common/bcp47/number.xml
https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/numberingSystems.xml

For reference, here is the specification, which is not very explicit about algorithmic systems:
http://www.unicode.org/reports/tr35/#Numbering%20System%20Data

Nevertheless, I don't think Nystrom was wrong to challenge the elites of his generation, given that the current approach has proved to be a slope into catastrophe, so that today we need to make 180° changes (or 8 tims, when expressed in tonal), like those suggested on:
http://sunsite.monsite-orange.fr/page-5b9e092880342.html

Regards,

Marcel

On 26/09/18 00:43 Philippe Verdy via CLDR-Users wrote:
> Octal and hexadecimal (as well as binary) are obviously numeric systems using the same digits […]

From cldr-users at unicode.org Wed Sep 26 09:38:01 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 26 Sep 2018 16:38:01 +0200
Subject: Locale bringup and barriers for entry
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227>

> Note that the supplemental data is OK for the "cardinal" and "range" types of categories, but largely failing almost everywhere for the "ordinal" type.

This is due to a misunderstanding of how ordinal works. It is just like cardinal (plural) in that the translator is responsible for the text *and* for accounting for gender. The examples given are thus irrelevant.

"Prenez la 1re à droite"

Would be:

one: "Prenez la {number}re à droite"
other: "Prenez la {number}e à droite"

or

one: "Tournez au {number}er feu à droite"
other: "Tournez au {number}e feu à droite"

To reiterate, the handling of grammatical inflections other than plurals/ordinals is outside the current scope of CLDR, but it is false to say that CLDR "fails" for ordinals. I would recommend that before you say "CLDR fails at X", you first ask, so that you can verify that your understanding of CLDR is correct.

Mark

On Tue, Sep 25, 2018 at 1:21 PM Philippe Verdy wrote:
> Note that the supplemental data is OK for the "cardinal" and "range" types
> of categories, but largely failing almost everywhere for the "ordinal" type.
> E.g. in French: "Prenez la 1re à droite" (this assumes the feminine
> gender, which is OK for "rue" = "street", "avenue", or "sortie" = "exit", but
> wrong for "feu" = "traffic light" or "stop", which are masculine, as in
> "Tournez au 1er feu à droite", where "1er" and "1re" change depending on
> the gender of the explicit or implicit noun).
>
> Yes, ordinals (but also fractions) need derivation by gender (as well as
> grammatical case), including for abbreviated forms (e.g. in French, Italian,
> Spanish, but even in English with inflected leading articles like "a" vs.
> "an", which depend on the numeric value of the ordinal).
>
> And I see little use for these "ordinal" types except in strict isolation,
> assuming a nominal use (outside of the real sentences where they will be
> inserted), without any relation to the noun (or nominal group) to which
> they refer (note: this noun or nominal group may be outside the current
> isolated "paragraph", such as a column heading, or other info such as
> resulting ranks in a sports competition for women vs. the same table for
> men).
>
> Basically this means that CLDR just provides basic data that still needs to
> be tuned and localized again for specific applications, even if this tuning
> is generic.
> […]

From cldr-users at unicode.org Wed Sep 26 09:43:22 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 26 Sep 2018 16:43:22 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>
References: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>

CLDR does not currently handle octal or hexadecimal formats because those are not in customary use by normal users. They are clearly used by programmers, but that is specialized usage that doesn't require special formatting across human languages.

I suggest that people focus on practical issues connected with CLDR and not ramble on about issues that are not particularly important to CLDR users.

Mark

On Wed, Sep 26, 2018 at 5:00 AM Marcel Schneider via CLDR-Users <cldr-users at unicode.org> wrote:
> What locales are you referring to? […]

From cldr-users at unicode.org Thu Sep 27 00:32:56 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 07:32:56 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1917168749.272.1538026376218.JavaMail.www@wwinf2209>

On 26/09/18 16:45 Mark Davis ☕️ via CLDR-Users wrote:
>
> CLDR does not currently handle octal or hexadecimal formats because those are not in customary use by normal users.
> They are clearly used by programmers, but that is specialized usage that doesn't require special formatting across human languages.
>
> I suggest that people focus on practical issues connected with CLDR and not ramble on about issues that are not particularly important to CLDR users.

That is my opinion too: this thread shouldn't be abused to discuss issues irrelevant to CLDR.
But after having sent many replies since the thread was launched, all of them intended to help newcomers get started with CLDR, I thought it would be unfair on my part not to respond to Luke, nor was I going to behave as if I had been scared into silence by the new turn of the discussion. The underlying message was: if people want to be disruptive, here is what I'd suggest they focus on first. But that was not all. I also stated:

>> […] I'd suggest focusing on getting all existing locales into CLDR,
>> unlike what is suggested in the comments I pointed to in my previous message,
>> and on fixing existing errors.

Sorry for getting off-topic apart from that.

Regards,

Marcel

From cldr-users at unicode.org Thu Sep 27 01:00:16 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 08:00:16 +0200 (CEST)
Subject: Locale bringup and barriers for entry
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
Message-ID: <186356847.391.1538028016410.JavaMail.www@wwinf2209>

On 26/09/18 16:38 Mark Davis ☕️ wrote:

[quote]
> This is due to a misunderstanding of how ordinal works. It is just like cardinal (plural)
> in that the translator is responsible for the text and for accounting for gender.
> The examples given are thus irrelevant.
[examples]
> To reiterate, the handling of grammatical inflections other than plurals/ordinals
> is outside the current scope of CLDR, […]

Thank you for this clarification. I'll take away that CLDR gives hints about which numbers require special handling when they are part of messages, but not about how messages are to be inflected depending on the current value of the number placeholder.

Is the label "Minimal Pairs" misleading? E.g. Dutch has a one-size-fits-all ordinal category, only "other", and a single minimal pair: "Neem de 15e afslag rechts." Besides, I wonder whether the -e should be superscript: "Neem de 15ᵉ afslag rechts."

Regards,

Marcel

From cldr-users at unicode.org Thu Sep 27 05:17:00 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 12:17:00 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <186356847.391.1538028016410.JavaMail.www@wwinf2209>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227> <329976032.379.1537854865588.JavaMail.www@wwinf2227> <312813575.3078.1537867951491.JavaMail.www@wwinf2227> <186356847.391.1538028016410.JavaMail.www@wwinf2209>
Message-ID: <827563207.3124.1538043420529.JavaMail.www@wwinf2209>

> Is the label "Minimal Pairs" misleading?

I now seem able to answer my own question: IMO the misconception about what CLDR is supposed to do for ordinals is fueled by the way the data is represented in the charts and in the LDML sources. While English has a comprehensive list of all existing ordinal inflections, French does not, and that seems to be what makes people believe that some data is missing and that "the supplemental data is […] failing."

Mark Davis wrote:
> the translator is responsible for the text, and accounting for gender. The examples given are thus irrelevant.

So the header should not be "Minimal Pairs" but just "Examples" again. As for the provided text, it could be stripped off and abstract rules put in its place. That could be even more useful, as demonstrated by the category "special2" in the French example below:

E.g. for French:
- Ordinal abbreviation is built by appending the default ordinal indicator to the digit.
- Ordinal 1 has a peculiar inflection.
- Ordinal 2 has a peculiar inflection when designating rank.

For Italian:
- Ordinal abbreviation is built by appending the default ordinal indicator to the digit.
- The vowel of the article may be elided if the long form of the number starts with a vowel, even if the number is written in short form.

For English:
- Ordinal abbreviation is built by appending the default ordinal indicator to the digit.
- Ordinal 1 has a peculiar inflection.
- Ordinal 2 has a peculiar inflection.
- Ordinal 3 has a peculiar inflection.

That's at least what the statements made so far appear to boil down to. But since some of these rules may be lengthy (e.g. for category "special1" in the Italian example), CLDR may be better off providing sample text. That's tricky, however, as parsing sample text while being aware of what it is meant to convey, and what it is not, may be non-obvious. That brings me back to what I tried to suggest when arguing that a system of rules is more straightforward than a collection of samples, especially when provided not for teaching humans but for informing processes. But given that what I'm suggesting is to reengineer that part of CLDR, I have little hope that anything will be changed. There is even no need for change if CLDR users are really happy with the current state of the art.

Regards,

Marcel
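The division of labor described in this thread (CLDR supplies only the rule mapping a number to an ordinal category; the translator supplies one message per category and handles gender) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions: the integer-only rule functions are hand-coded from the CLDR ordinal rules for English, French and Dutch as understood here (verify them against the CLDR supplemental plural/ordinal rules chart before relying on them), `ordinal_category_*` and `format_ordinal_message` are hypothetical helpers rather than ICU's PluralRules or MessageFormat API, and the English message strings are illustrative.

```python
# Hypothetical helpers for illustration only; not ICU's PluralRules/MessageFormat.
# Rules are hand-coded, integer-only versions of CLDR ordinal rules
# (assumption: n is a positive integer).
from typing import Callable, Dict


def ordinal_category_en(n: int) -> str:
    # one: 1, 21, 31...; two: 2, 22...; few: 3, 23...; 11, 12, 13 are "other"
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 == 2 and n % 100 != 12:
        return "two"
    if n % 10 == 3 and n % 100 != 13:
        return "few"
    return "other"


def ordinal_category_fr(n: int) -> str:
    return "one" if n == 1 else "other"  # 1er/1re vs 2e, 3e, ...


def ordinal_category_nl(n: int) -> str:
    return "other"  # Dutch: a single ordinal category


def format_ordinal_message(n: int, category_of: Callable[[int], str],
                           messages: Dict[str, str]) -> str:
    """Pick the translator's message for n's category ("other" is the fallback)."""
    template = messages.get(category_of(n), messages["other"])
    return template.format(number=n)


# Translator-supplied messages: gender is chosen here, not by the category rule.
FR_STREET = {"one": "Prenez la {number}re à droite",
             "other": "Prenez la {number}e à droite"}
FR_TRAFFIC_LIGHT = {"one": "Tournez au {number}er feu à droite",
                    "other": "Tournez au {number}e feu à droite"}
EN_EXIT = {"one": "Take the {number}st exit", "two": "Take the {number}nd exit",
           "few": "Take the {number}rd exit", "other": "Take the {number}th exit"}
NL_EXIT = {"other": "Neem de {number}e afslag rechts."}
```

For example, `format_ordinal_message(1, ordinal_category_fr, FR_TRAFFIC_LIGHT)` gives "Tournez au 1er feu à droite", while the same rule with `FR_STREET` gives "Prenez la 1re à droite". Keeping the category rule separate from the messages is what lets one rule set serve both genders.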