From cldr-users at unicode.org  Tue Sep  4 21:02:56 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 5 Sep 2018 04:02:56 +0200 (CEST)
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
Message-ID: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>

I’m taking this from the Unicode Public mailing list, as the topics belong here.
Though I already responded off-list and would prefer stepping out, I think
that at least the CLDR part could be really useful in fighting certain baseline
problems I encountered while being given the opportunity to participate in
surveying fr-FR locale data for the upcoming v34. Hence I feel committed
to respond “on the record” and to reopen the door for possible follow-up, in case
I seemed to have closed it.

Indeed, while there were many errors and flaws in the data, most co-vetters
ended up lacking time to completely review all the items, despite doing a
really great job and devoting many hours to these tasks. Not having
dug deeper to learn what the underlying issues are, I have
simply speculated on my own about what might have caused the problems
in reviewing data and ensuring quality.

The goal is to make CLDR data more reliable, and to suggest what vendors 
might wish to do for that purpose.

At the top of the message below I’ve cut a snippet unrelated to these topics and, further down,
a snippet for privacy. The slightly blunter off-list wording, by contrast, has not
been redacted.
I’ll advise Unicode Public that this thread has moved here.

On 04/09/18 20:10 I wrote:
To: "Janusz S. Bień", "James Kass"
Cc: "Philippe Verdy"
Subject: [OFF LIST] Re: CLDR
> 
> On 04/09/18 11:11 James Kass via Unicode wrote:
> > (This is the response from Janusz S. Bień which was sent to the public list.)
> 
> Thank you James for forwarding. I’m responding off-list as I’m afraid that our discussions
> might not be welcome on the List. […]
[Deleted for being off-topic.]
> > 
> > On Mon, Sep 03 2018 at 1:03 -0800, James Kass wrote:
> > 
> > > Janusz S. Bień wrote,
> > >
> […]
> > Thanks! Most data about Poland at
> > 
> > https://www.wikidata.org/wiki/Q36
> > 
> > seem to make sense, but I don't think anybody is using abbreviation like
> > "plpm" (for Pomorze/Pomerania).
> 
> We can see that some of those codes, for whatever items (regions, languages, scripts),
> are counter-intuitive, and I don’t know either who is using them in running text.
> 
> > 
> […]
> > I hope not all CLDR data are driven by Wikidata...
> 
> I was surprised to learn that even more data is imported without review, but Wikidata is
> clearly a more reliable source than ISO 639, which is used without its accuracy being assessed.
> 
> > 
> > On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:
> […]
> > > Then I’m sorry to be off-topic.
> > 
> > Let's say off the original topic. My primary concern is to preserve
> > somehow such comments as e.g. the one on the bottom of page 14 of
> > 
> > https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf
> 
> Normally this Medieval Latin semicolon abbreviation should be encoded in Unicode, which
> already contains many duplicates of punctuation marks, and we know that a punctuation mark can
> *never* represent a letter without running into issues.
> 
> > 
> […]
> > > I’m volunteering to personally welcome you to contribute to CLDR.
> > 
> > Thanks. The interesting question is who is/was already contributing from
> > Poland or about Polish language. I vaguely remember a post with this
> > information, but at that time I was not interested enough to take a
> > note.
> 
> I must confess that I wasn’t interested either, or rather, I wasn’t aware that I was expected to contribute,
> and perhaps was unable to do so. Normally the vendors, especially Apple and Google, should
> be well-funded enough to be able to appoint as many specialists as needed. But it turns out
> that when paying contractors, they are so greedy that the linguists are granted insufficient
> work time, e.g. a certain number of hours, without their managers assessing beforehand what
> the status of the data is and how much work is needed to fix it, and consequently are not able
> to renegotiate the service provider contract. That operating mode is completely unresponsive
> on the part of Apple and Google, and Microsoft alike (although they have less money to devote).
> 
> > 
> […]
> > > Polish has
> > > consistently with By-Type, these quotation marks:
> > > ' " ? ? ? ?
> > > Hence the set is incomplete.
> > 
> > You are right, thanks. But what is the practical importance of it?
> 
> The importance of CLDR data being accurate is that inaccurate data would
> reflect badly on the image of a country, making it appear incapable or careless.
> 
> On a general level, another impact of having accurate locale data in CLDR is that
> the repository gets a better reputation. As long as the data is unreliable, nobody
> may actually want to use it.
> 
> Yet another implication of the presence of a character in CLDR is that it is a good
> argument for having it on the keyboard layout. E.g. the Breton letter apostrophe is
> not yet on the Breton keyboard layout, despite the issue having been discussed
> at bug tracking / feature request level for XKB. So I informed […]
[Deleted for privacy.]
> 
> > I noticed that sometimes in Emacs "forward-word" behaves strangely on a
> > text with unusual characters, but had no motivation to investigate how
> > this is related to the current locale.
> 
> I’m sorry to be unable to check this, as I’m not yet using Emacs or Vim.
> 
> […]
> > 
> > The standard keyboard has a limited number of keys, so you have to make
> > compromises. It is generally accepted that Polish keyboard layouts
> > (there are primarily two of them) do not contain apostrophe or single
> > quotation marks. There is a proposal by Marcin Woliński
> > 
> > http://marcinwolinski.pl/keyboard/
> > 
> > which is available in most Linux distributions but it does not seem
> > popular.
> 
> It has even been ported to Windows. But I cannot find it on Ubuntu 16.04.
> It has various drawbacks, the worst of which is that the most common
> angle quotation marks «» are on Shift+AltGr level, while the single ones ‹›
> are on AltGr, and likewise for the curly quotes, of which Polish currently
> uses the double ones, whereas the single ones appear to be used only
> for nested quotations. This swapping of frequent and rare punctuation
> has been done only for consistency with the ASCII apostrophe being in the
> Base shift state, and the ASCII double quote in the Shift shift state, as on
> US-QWERTY.
> 
> That’s how mnemonics and a certain idea of logic are destroying usability.
> 
> Thanks for the link anyway.
> 
> Best regards,
> 
> Marcel


From cldr-users at unicode.org  Wed Sep  5 07:56:05 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 5 Sep 2018 14:56:05 +0200
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
In-Reply-To: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>
References: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>
Message-ID: <CAJ2xs_HXKepkoZX6imprh8fj4cDzZ67HV=T1FZWzw-dYoZvUHg@mail.gmail.com>

The email isn't on a single topic, so I just skimmed. Some quick remarks:

> I hope not all CLDR data are driven by Wikidata...
The Wikidata names are only used for subdivisions, and then only for ones
that are "new" (where there were no preexisting names). The names are
currently not visible via the Survey tool, and thus need modification via
tickets. The reason not to show them in the ST is that it would load the
tool down further and burden the vetters (tripling the number of fields).

> using abbreviation like "plpm"

That isn't an abbreviation, it is a code for a subdivision. Corresponds to
the ISO 3166-2 code PL-PM
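
As an illustration of that correspondence, here is a minimal Python sketch, assuming the usual convention that the CLDR subdivision code is the ISO 3166-2 code lowercased with the hyphen removed. The function name is hypothetical and not part of any CLDR tooling.

```python
def iso3166_2_to_cldr_subdivision(code: str) -> str:
    """Convert an ISO 3166-2 code such as "PL-PM" to the CLDR-style form "plpm"."""
    code = code.strip()
    region, sep, suffix = code.partition("-")
    # ISO 3166-2: two-letter country code, a hyphen, then up to three alphanumerics.
    valid = (
        sep == "-"
        and code.isascii()
        and len(region) == 2 and region.isalpha()
        and 1 <= len(suffix) <= 3 and suffix.isalnum()
    )
    if not valid:
        raise ValueError(f"not an ISO 3166-2 code: {code!r}")
    return (region + suffix).lower()


print(iso3166_2_to_cldr_subdivision("PL-PM"))  # plpm
```

The reverse direction is just as mechanical (uppercase, then re-insert the hyphen after the first two letters), which is why the code reads like an abbreviation without being one.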

> they are so greedy

Ad hominem or (ad societatem) remarks are rarely productive, and rarely an
accurate reflection of reality; one reason I seldom look at
unicode at unicode.org.

Mark



From cldr-users at unicode.org  Wed Sep  5 11:22:55 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 5 Sep 2018 18:22:55 +0200 (CEST)
Subject: CLDR survey / Polish keyboard (was: Re: CLDR)
Message-ID: <954616513.15222.1536164575599.JavaMail.www@wwinf1m11>

On 05/09/18 14:59 Mark Davis ☕️ via CLDR-Users wrote:
> 
[…]
> > they are so greedy
>
> Ad hominem or (ad societatem) remarks are rarely productive, and rarely an accurate reflection of reality;

That is the one phrase I’d have redacted first if I had removed the off-list-style shorthand topoi.
I was afraid that it could hurt when posted here, while I only wished to raise awareness of
the way management decisions may end up reflecting badly on a corporate image.

The idea is that CLDR data shouldn’t have to wait for a volunteer to come along and correct it.
Rather, the process should be set up so that it succeeds in, say, 2 years from scratch.

Now we need to determine whether the (human and financial) effort implied is considered
not worthwhile. That could be because end-users who get inaccurate data displayed are not expected
to pay attention; or because public language offices are expected to be the premium contributors,
and vendors only help out when those fail. [Here I’m censoring myself so as not to get ad
corpus again, nor ad hominem as I necessarily did off-list when giving details about a contact
with a language office.]

Perhaps the most useful thing would be to simply send e-mails to vendors asking them to devote
more resources to the CLDR survey, making them aware that the data isn’t meeting obvious quality standards.

Is it naive to believe that an e-mail to this or the other list may suffice for that purpose?

> one reason I seldom look at unicode at unicode.org.

I publicly apologize for any ad hominem comment I’ve ever posted on a list. I sincerely regret not
staying technical, as I have trouble depersonalizing human affairs. I’m always at risk of getting off
the road while trying to understand and to figure out how and by whom problems could be fixed.
Perhaps I shouldn’t focus on that. Probably I’d better just do the job as it comes.

E.g. when a correctly spelled name was suddenly misspelled despite a vetter hinting that the name
was correct, there would be no point in finding out how that could happen, only in correcting
the error (two years later). But the evidence is that such things can happen only because vetters
are not given enough time to assess a spelling as accurate. E.g. at that time it was already sufficient
to look up the proposed spelling in French Wikipédia to find, right in the first sentence,
an explanation of why that spelling does not apply.

Even now, a number of errors remain uncorrected because vetters did not have enough work time
while the survey was fully open. I myself ended up cutting down my CLDR survey time when corrections
didn’t get an echo and it was unclear whether they were useful, and this way left typos I’d made in ST.
When the vetting phase was on, everybody did a great job, but it was too late to correct the typos, given
that ST is partly read-only then, which from my point of view is not good. But no matter, I’d made and left
the typos.

Best regards,

Marcel


From cldr-users at unicode.org  Wed Sep  5 14:03:28 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 5 Sep 2018 21:03:28 +0200
Subject: CLDR survey / Polish keyboard
In-Reply-To: <86zhwwhvpm.fsf@mimuw.edu.pl>
References: <64792931.11196.1536112976805.JavaMail.www@wwinf1h11>
 <CAJ2xs_HXKepkoZX6imprh8fj4cDzZ67HV=T1FZWzw-dYoZvUHg@mail.gmail.com>
 <86zhwwhvpm.fsf@mimuw.edu.pl>
Message-ID: <CAJ2xs_GkvGtvQ9ED+_dM1bmM9jxRxWFC4zrWS3sQwNu6L2XW4Q@mail.gmail.com>

This one would be more useful if it were more complete, and the data were
managed better. But there are far less useful ISO standards!

Mark


On Wed, Sep 5, 2018 at 4:20 PM Janusz S. Bień <jsbien at mimuw.edu.pl> wrote:

> On Wed, Sep 05 2018 at 14:56 +0200, Mark Davis ☕️ wrote:
> > The email isn't on a single topic, so I just skimmed. Some quick remarks:
> >
> >> I hope not all CLDR data are driven by Wikidata...
> > The Wikidata names are only used for subdivisions, and then only for
> > ones that are "new" (where there were no preexisting names). The names
> > are currently not visible via the Survey tool, and thus need
> > modification via tickets.  The reason not to show them in the ST is
> > that it would load the tool down further and burden the vetters
> > (tripling the number of fields).
> >
> >> using abbreviation like "plpm"
> >
> > That isn't an abbreviation, it is a code for a
> > subdivision. Corresponds to the ISO 3166-2 code PL-PM
>
> Thanks for explanation. I found the probably full list of codes for
> Poland here:
>
> https://pl.wikipedia.org/wiki/ISO_3166-2:PL
>
> Still in doubt whether the codes are of any use, but we have to live
> with it. Some time ago, as a member of a technical committee of the
> Polish Committee for Standardization I tried to block an ISO standard of
> no practical use, and it appeared completely impossible...
>
> Best regards
>
> Janusz
>
> --
>              ,
> Janusz S. Bien
> emeryt (emeritus)
> https://sites.google.com/view/jsbien
>

From cldr-users at unicode.org  Fri Sep  7 19:49:26 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 8 Sep 2018 02:49:26 +0200 (CEST)
Subject: Shortcuts question (Re)
Message-ID: <1879781969.16428.1536367766562.JavaMail.www@wwinf1m21>

Hello,

There is a short thread about localizing keyboard shortcuts, on Unicode Public:

https://unicode.org/mail-arch/unicode-ml/y2018-m09/0018.html

https://unicode.org/mail-arch/unicode-ml/y2018-m09/0019.html

https://unicode.org/mail-arch/unicode-ml/y2018-m09/0021.html

On Fri, 7 Sep 2018 05:52:46 +0530 Shriramana Sharma via Unicode wrote:
[…]
> 1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for "tout" instead of Ctrl+A for "all"?

On Fri, 7 Sep 2018 05:27:08 +0200 I via Unicode wrote:
> No, Ctrl+A remains Ctrl+A on a French keyboard.

On Fri, 7 Sep 2018 15:03:46 +0200 Christoph Päper via Unicode wrote:
[…]
> Some are, many are not. For instance, some text editors use a modifier key with F and K
> instead of B and I for bold ("fett") and italic ("kursiv").

Indeed, in the French edition of Excel Starter, bold is Ctrl+G (for “gras”), while Word Starter (part of
the same Office Starter) has it as Ctrl+B.
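
To make the per-application, per-language variation concrete, here is a small Python sketch of a lookup table that falls back to the English default. The table layout, application keys and function name are hypothetical; the entries only restate the examples quoted above, and using Ctrl as the modifier for the German case is an assumption.

```python
# Default (English) shortcut letters per command.
DEFAULT_SHORTCUTS = {"bold": "B", "italic": "I", "select_all": "A"}

# Hypothetical overrides keyed by (application, language).
LOCALIZED_SHORTCUTS = {
    ("some-text-editor", "de"): {"bold": "F", "italic": "K"},  # "fett", "kursiv"
    ("excel-starter", "fr"): {"bold": "G"},                    # "gras"
}


def shortcut(app: str, lang: str, command: str) -> str:
    """Return the shortcut for a command, preferring a localized override."""
    overrides = LOCALIZED_SHORTCUTS.get((app, lang), {})
    letter = overrides.get(command, DEFAULT_SHORTCUTS[command])
    return f"Ctrl+{letter}"


print(shortcut("excel-starter", "fr", "bold"))        # Ctrl+G
print(shortcut("word-starter", "fr", "bold"))         # Ctrl+B
print(shortcut("excel-starter", "fr", "select_all"))  # Ctrl+A
```

Keying the overrides by application as well as language reflects the point that two programs from the same suite can disagree.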

For follow-up, here is OP’s full request:
On Fri, 7 Sep 2018 05:52:46 +0530 Shriramana Sharma via Unicode wrote:

Hello. This may be slightly OT for this list but I'm asking it here as it concerns computer usage with multiple scripts and i18n:

1) Are shortcuts like Ctrl+C changed as per locale? I mean Ctrl+T for "tout" instead of Ctrl+A for "all"?

2) How about when the shortcuts are the Alt+ combinations referring to underlined letters in actual user visible strings?

3) In a QWERTZ layout for Undo should one still press the (dislocated wrt the other XCV shortcuts) Z key or the Y key which is in the physical position of the QWERTY Z key (and close to the other XCV shortcuts)?

4) How are shortcuts handled in the case of non Latin keyboards like Cyrillic or Japanese?

4a) I mean how are they displayed on screen? 

4b) Like #1 above, are they changed per language?

4c) Like #2 above, how about for user visible shortcuts?

(In India since English is an associate official language, most computer users are at least conversant with basic English so we use the English/QWERTY shortcuts even if the keyboard physically shows an Indic script.)

Thanks!


From cldr-users at unicode.org  Thu Sep 13 13:57:39 2018
From: cldr-users at unicode.org (Peter Edberg via CLDR-Users)
Date: Thu, 13 Sep 2018 11:57:39 -0700
Subject: Unicode CLDR 34 alpha available for testing
Message-ID: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>

The alpha version of Unicode CLDR 34 <http://cldr.unicode.org/index/downloads/cldr-34> is available for testing. The alpha period lasts until the beta release on September 26, which will include updates to the LDML spec. The final release is expected on October 10.

CLDR 34 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems <http://cldr.unicode.org/index#TOC-Who-uses-CLDR> for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.

CLDR 34 included a full Survey Tool data collection phase. Other enhancements include several changes to prepare for the new Japanese calendar era starting 2018-05-01; updated emoji names, annotations, collation and grouping; and other specific fixes. The draft release page at <http://cldr.unicode.org/index/downloads/cldr-34> lists the major features, and has pointers to the newest data and charts. It will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on. Particularly useful for review are:

- Delta Charts <http://unicode.org/cldr/charts/34/delta/index.html> - the data that changed during the release
- By-Type Charts <http://unicode.org/cldr/charts/34/by_type/index.html> - a side-by-side comparison of data from different locales
- Annotation Charts <http://unicode.org/cldr/charts/34/annotations/index.html> - new emoji names and keywords

Please report any problems that you find using a CLDR ticket <http://unicode.org/cldr/trac/newticket>. We'd also appreciate it if programmatic users of CLDR data download the xml files and do a trial integration to see if any problems arise.

From cldr-users at unicode.org  Thu Sep 13 14:36:16 2018
From: cldr-users at unicode.org (Peter Edberg via CLDR-Users)
Date: Thu, 13 Sep 2018 12:36:16 -0700
Subject: Unicode CLDR 34 alpha available for testing
In-Reply-To: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
References: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
Message-ID: <9CC2B93E-8A6A-4AAB-8420-C94679CB5F59@unicode.org>

> On Sep 13, 2018, at 11:57 AM, Peter Edberg via Unicore <unicore at unicode.org> wrote:
> 
> The alpha version of Unicode CLDR 34 <http://cldr.unicode.org/index/downloads/cldr-34> is available for testing. The alpha period lasts until the beta release on September 26, which will include updates to the LDML spec. The final release is expected on October 10.
> 
> CLDR 34 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems <http://cldr.unicode.org/index#TOC-Who-uses-CLDR> for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.
> 
> CLDR 34 included a full Survey Tool data collection phase. Other enhancements include several changes to prepare for the new Japanese calendar era starting 2018-05-01;
> 
(that of course should have read 2019-05-01, sorry)
> updated emoji names, annotations, collation and grouping; and other specific fixes. The draft release page at <http://cldr.unicode.org/index/downloads/cldr-34> lists the major features, and has pointers to the newest data and charts. It will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on. Particularly useful for review are:
> 
> - Delta Charts <http://unicode.org/cldr/charts/34/delta/index.html> - the data that changed during the release
> - By-Type Charts <http://unicode.org/cldr/charts/34/by_type/index.html> - a side-by-side comparison of data from different locales
> - Annotation Charts <http://unicode.org/cldr/charts/34/annotations/index.html> - new emoji names and keywords
> 
> Please report any problems that you find using a CLDR ticket <http://unicode.org/cldr/trac/newticket>. We'd also appreciate it if programmatic users of CLDR data download the xml files and do a trial integration to see if any problems arise.
> 
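
For anyone doing a trial integration, the corrected start date can be sanity-checked with a small Python sketch. It assumes only two era boundaries: Heisei from 1989-01-08, and the new era from 2019-05-01 as stated in the correction above. The label "new era" is a placeholder because this thread does not give the era's name, and the table is illustrative rather than read from CLDR data.

```python
from bisect import bisect_right
from datetime import date

# (start date, label), sorted by start date. Illustrative values only.
ERA_STARTS = [
    (date(1989, 1, 8), "Heisei"),
    (date(2019, 5, 1), "new era"),  # placeholder label
]


def era_and_year(day: date) -> tuple[str, int]:
    """Return the era label and the year within that era for a Gregorian date."""
    starts = [start for start, _ in ERA_STARTS]
    index = bisect_right(starts, day) - 1
    if index < 0:
        raise ValueError(f"{day} is before the first era in the table")
    start, label = ERA_STARTS[index]
    # Era years count from 1 and advance with the Gregorian calendar year.
    return label, day.year - start.year + 1


print(era_and_year(date(2019, 4, 30)))  # ('Heisei', 31)
print(era_and_year(date(2019, 5, 1)))   # ('new era', 1)
```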


From cldr-users at unicode.org  Thu Sep 13 19:22:53 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Thu, 13 Sep 2018 17:22:53 -0700
Subject: Unicode CLDR 34 alpha available for testing
In-Reply-To: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
References: <193FF3F2-8F69-496D-AD2E-8A558B0731EA@unicode.org>
Message-ID: <CAJ2xs_GozWpcre_pLMPsJYMkzmuUdFXje=igENGg_Fgpru=D+Q@mail.gmail.com>

I touched up the release page, adding some of your wording. See how it
looks:

http://cldr.unicode.org/index/downloads/cldr-34

Mark



From cldr-users at unicode.org  Sun Sep 16 16:42:53 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 16 Sep 2018 23:42:53 +0200 (CEST)
Subject: Shortcuts question
Message-ID: <1968466874.9623.1537134173094.JavaMail.www@wwinf1m12>

On 16/09/18 15:28, Philippe Verdy wrote on Unicode Public Mail List:
[…]
> On PC keyboards, ShiftLock does not apply to the numeric pad, which has its separate NumLock, now largely redundant
> and that most users would like to disable completely whenever there's a numeric pad separated from the directional pad;
> on these extended keyboards, NumLock is just a nuisance, notably on the OS logon screen, when Windows turns it off by default
> unless the BIOS locks it at boot time (and a lot of BIOSes don't do that or don't have the option to set it permanently).

Legacy NumLock can be permanently disabled on a per-layout basis by hard-coding additional defines in the header file,
given that arrow keys have long been present throughout, while the numpad is either separated, or integrated, or missing,
and may be external. But a number of laptops with an integrated numpad (on alphanumeric keys on and beneath 7 8 9 0) use
NumLock as a combined legacy NumLock and Fn-Lock-on-Numpad. Here disabling the legacy part is particularly useful, as this
does not affect the Fn-Lock-on-Numpad functionality. The result is alternative access to integrated numpad digits either by holding
down Fn, or by activating the NumLock toggle. Subscribers interested in details may wish to follow up off-list with us.

Further, on 16/09/18 14:18, I wrote on Unicode Public:
[…]
> But again that is easier on Windows, where VKs are remapped separately, than on Linux, which
> appears to use graphics throughout to process application shortcuts, and where only modifiers can be "preserved" for
> further processing, there being no underlying letter map, which AFAIU does not exist on Linux.

I was wrong. Linux allows mapping Control modifier combinations to letters, eg on levels 7 and 8, while directing XKB
to preserve the modifiers. This enables Linux to have keyboard shortcuts that move around independently of the default resolution
(which uses the letter mapping on Latin layouts, while other scripts appear to benefit from an internal QWERTY mapping).
Again, people interested in how to code that are welcome to follow up off-list.

Regards,

Marcel


From cldr-users at unicode.org  Mon Sep 17 09:58:43 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 17 Sep 2018 16:58:43 +0200 (CEST)
Subject: Group separator migration from U+00A0 to U+202F
Message-ID: <1724680793.12587.1537196323824.JavaMail.www@wwinf1m12>

To be cost-effective, all locales using a space as the number group separator should migrate
at once from the wrong U+00A0 to the correct U+202F.

I didn’t aim at making French stand out, but at correcting an error in CLDR.
Having even the Canadian French sublocale stick with the wrong value makes no sense.
It is mainly due to opaque inheritance relationships and to severe constraints on vetters,
who apply for fr-FR and are subsequently reduced to looking on helplessly from the sidelines when
sublocales are not getting fixed.
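
As an illustration of what the migration means for formatted output, here is a minimal Python sketch. The helper names group_digits and migrate_group_separator are made up for this example and belong to no CLDR or ICU API. The second helper only touches a U+00A0 that sits between two digits, which is an assumption about how a cautious migration of existing strings might work.

```python
import re

NBSP = "\u00A0"   # NO-BREAK SPACE (the value being migrated away from)
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE (the value being migrated to)

def group_digits(n: int, sep: str = NNBSP) -> str:
    """Group the digits of an integer by three, joined with `sep`."""
    return f"{n:,}".replace(",", sep)

def migrate_group_separator(text: str) -> str:
    """Replace U+00A0 by U+202F only where it stands between two digits."""
    return re.sub(r"(?<=\d)\u00A0(?=\d)", NNBSP, text)

if __name__ == "__main__":
    old = group_digits(1234567, NBSP)
    new = migrate_group_separator(old)
    # Print code points, since both spaces look alike on screen.
    print([f"U+{ord(c):04X}" for c in new if not c.isdigit()])
```

A U+00A0 between a number and a unit (as in "567 km") is left alone by this sketch, since only digit-to-digit positions are group separators.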

http://cldr.unicode.org/index/downloads/cldr-34#TOC-Migration

https://unicode.org/cldr/trac/ticket/11423

Regards,

Marcel


From cldr-users at unicode.org  Tue Sep 18 00:17:50 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Tue, 18 Sep 2018 07:17:50 +0200 (CEST)
Subject: Group separator migration from U+00A0 to U+202F
Message-ID: <1696775949.245.1537247870482.JavaMail.www@wwinf2219>

> I didn’t aim at making French stand out, but at correcting an error in CLDR.

So I have to confess that I did focus on French and only applied for fr-FR, but
there was a lot of work (see http://cldr.unicode.org/index/downloads/cldr-34#TOC-Growth)
waiting for very few vetters. Nevertheless I also cared for English (see various tickets),
and also posted on CLDR-users in a belated P.S. that fr-CA hadn’t caught up with the group
separator correction yet:
https://unicode.org/pipermail/cldr-users/2018-August/000825.html

Also I’m sorry for failing to provide appropriate feedback after the beta release and to post
upstream messages urging that all locales using a space as group separator be
kept in synchrony.

I think the point about not splitting up all the data into locales is a very good one.

There should be a common pool so that all locales using Arabic script automatically have
their group separator set to ARABIC THOUSANDS SEPARATOR (provided it actually fits all),
and those locales using a space should only need to specify "space" to automatically get
the correct one, ie NARROW NO-BREAK SPACE, as soon as Unicode is ready to give it
currency in that role.
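
The common-pool idea could be sketched as follows in Python. Everything here is hypothetical: the table names, the symbolic keys "space" and "arabic", and the per-locale declarations are invented for illustration and do not reflect how CLDR actually stores number symbols.

```python
import unicodedata

# Hypothetical shared pool: one symbolic name per separator, defined once.
SHARED_SEPARATORS = {
    "space": "\u202F",   # NARROW NO-BREAK SPACE, as proposed in this thread
    "arabic": "\u066C",  # ARABIC THOUSANDS SEPARATOR
}

# Hypothetical per-locale declarations: locales name the symbol, not the code point.
LOCALE_GROUP = {
    "fr": "space",
    "fr_CA": "space",
    "ar": "arabic",
}

def group_separator(locale_id: str) -> str:
    """Resolve a locale's symbolic group separator through the shared pool."""
    return SHARED_SEPARATORS[LOCALE_GROUP[locale_id]]

if __name__ == "__main__":
    for loc in LOCALE_GROUP:
        ch = group_separator(loc)
        print(loc, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

With such a pool, correcting the value behind "space" once would update every locale that declares it, which is the synchrony the paragraph above asks for.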

Also there is a display issue in the charts, where whitespaces show up as what they are:
blanks, regardless of whether they are wide or narrow, justifying or fixed-width.
Non-breaking behavior may be inferred from context, but we see that other correct
behavior cannot be inferred from context, given that numbers were supposed to be grouped
using a justifying space, so that it only works halfway where justification is turned off (eg
in Wikipedia).

I’m posting here for people not monitoring Trac:
https://unicode.org/cldr/trac/ticket/11423#comment:2

Regards,

Marcel


From cldr-users at unicode.org  Fri Sep 21 19:34:27 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Fri, 21 Sep 2018 17:34:27 -0700
Subject: Locale bringup and barriers for entry
Message-ID: <CAFYQx+AkdwP_piqWDd4k4VAaOq7jneXDLpGVfu8scOELjedNyA@mail.gmail.com>

Hello, and welcome to the new cldr-users members.

For discussion:

At the IUC conference last week, a few of us discussed around lunch some
issues around getting new locales into CLDR, and barriers to entry.

Barriers:
- we discussed that it could be confusing or difficult to collect all of
the data needed for a minimal locale:
http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
pluralization data
- what about fonts? keyboards?
- what are the best ways to coordinate efforts between the language users
and different technical experts?

Ideas:
- a web app to take in new locale data?
- a web app to debug/explore plurals?
- allowing some locales to 'get started' without plural rules?

Links for discussion:
- Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
- My "full stack" blog post:
https://srl295.github.io/2017/06/06/full-stack-enablement/

From cldr-users at unicode.org  Sat Sep 22 03:15:09 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 10:15:09 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>

Thank you Steven for sharing these useful resources and for the effort you and others undertook
in popularizing some insights about what CLDR is, what locale data is, and how to bring these together.

To start the discussion, here are a few thoughts crossing my mind, based on my experience of the past survey round:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
> 
> Hello, and welcome to the new cldr-users members.

Thanks.

> For discussion:
>
> At the IUC conference last week, a few of us discussed around lunch some issues around getting new locales into CLDR, and barriers to entry.
> Barriers:
> - we discussed that it could be confusing or difficult to collect all of the data needed for a minimal locale:

Some main sources of confusion seem to me:
1. The English template may not be internally consistent, eg emoji category names may be singular or plural (plural throughout seems correct);
2. The English template may not be up-to-date, eg still including ASCII quotes in exemplar punctuation though these have been ruled out;
3. The target data sets may not be comprehensively specified, eg the definition of exemplar punctuation does include an exclusion clause for math
   symbols only, while the clause about not including symbols on a programmatic usage basis such as # @ _ is still missing;
4. The English template may not be kept in synchrony with the specifications, eg emoji keywords are not to include the emoji name or name starter;
5. Numerous bugs affect the markup of inherited values (but these have been reported and are about to be fixed in the SurveyTool code).

> http://cldr.unicode.org/index/cldr-spec/minimaldata - especially pluralization data

The scope of pluralization seems unclear and biased by the English paradigm of genderlessness, while in other languages grammatical gender
is a determining parameter for pluralization, so that even extensions to the DTD seem to be required for providing out-of-the-box pluralization rules.

> - what about fonts?

Invisibles and confusables should be visualized and distinguished throughout, ie both in SurveyTool and in Charts. While SurveyTool already shows
U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK, confusables like spaces and apostrophes are still hard or impossible to distinguish.
That’s in the nature of the related characters, eg U+00A0 NO-BREAK SPACE is defined as being like U+0020 SPACE except for line-break behavior,
and the preferred glyph of U+02BC MODIFIER LETTER APOSTROPHE is the same as that of U+2019 RIGHT SINGLE QUOTATION MARK.
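
One way such a visualization could work is sketched below in Python, using only the standard unicodedata module. The function name reveal and the watch list are assumptions made for this example; this is not how SurveyTool or the Charts are implemented.

```python
import unicodedata

# Characters discussed in this thread that are invisible or easily confused.
WATCH = {
    "\u00A0",  # NO-BREAK SPACE
    "\u202F",  # NARROW NO-BREAK SPACE
    "\u200E",  # LEFT-TO-RIGHT MARK
    "\u200F",  # RIGHT-TO-LEFT MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
}

def reveal(text: str) -> str:
    """Replace each watched character by a visible [U+XXXX NAME] tag."""
    out = []
    for ch in text:
        if ch in WATCH:
            out.append(f"[U+{ord(ch):04X} {unicodedata.name(ch)}]")
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(reveal("1\u00A0000"))
    print(reveal("1\u202F000"))
```

The two printed lines differ only in the tag, which is exactly the distinction that is lost when both spaces are rendered as plain blanks.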

> keyboards?

I see fonts and keyboards actually as the two missing components of the stack that you designed, because though being part of locale data, input
methods are a precondition of efficient submission of locale data. The full stack would thus expand to:

1. Encoding
2. Fonts
3. Input methods
4. Locale data

> - what are the best ways to coordinate efforts between the language users and different technical experts?
> Ideas:
> - a web app to take in new locale data?

I think CLDR has already its web app, ie SurveyTool. A full-time engineer is actually redeveloping and debugging several or all parts of it.

> - a web app to debug/explore plurals?

Before including this functionality in SurveyTool, where it belongs, I think that the spec should be redesigned, and the documentation updated
accordingly. That could eventually result in extended language support by CLDR/ICU, which would do no harm but only raise the product value.

> - allowing some locales to 'get started' without plural rules?

I think that any locale may get started in CLDR when providing date and time formats, while correctly displaying a reminder of a shopping cart 
may be left over for a later stage.

> Links for discussion:
> - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
> - My "full stack" blog post: https://srl295.github.io/2017/06/06/full-stack-enablement/

Thanks. Have read and discussed following the hints you provided.

Regards,

Marcel


From cldr-users at unicode.org  Sat Sep 22 06:07:29 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 13:07:29 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>

I didn’t aim at doing what I’ve ended up doing, ie summing up a bunch of tickets already under process
in a thread launched to welcome newcomers and showing ways of expanding CLDR support to all of
the world’s locales.

Indeed over the details I forgot my first thoughts:

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[?]
> - what are the best ways to coordinate efforts between the language users and different technical experts?

I can only encourage everyone to first make up our minds individually by taking a close look at the latest Charts,
especially (as for learning how to include *new* locales) at the By-Type overviews of the set of locales that
have already had the chance of making it into CLDR:

http://cldr.unicode.org/index/downloads

http://www.unicode.org/cldr/charts/latest/

https://www.unicode.org/cldr/charts/latest/by_type/index.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.main.html

https://www.unicode.org/cldr/charts/latest/by_type/core_data.alphabetic_information.punctuation.html

and so on.

Another important step is to read through the Information Hub for Linguists, the main documentation resource:

http://cldr.unicode.org/translation

from where we can access the detailed pages linked also from the information pane in SurveyTool.
Eg about plurals:

http://cldr.unicode.org/translation/plurals

I happened to start uninformed discussions prior to noticing that the documentation already provided
sufficient instructions, or prior to sorting out what was already covered or what clarifications I needed…

A good way to prepare, if not already done, is also to learn XML and more specifically LDML, the
Unicode Locale Data Markup Language, in order to be able to read and submit data in that format:

http://cldr.unicode.org/index/cldr-spec

linking:

http://www.unicode.org/reports/tr35/

Eg to understand how inheritance works:

http://www.unicode.org/reports/tr35/#Locale_Inheritance

That is key knowledge for understanding what happens to us when working in SurveyTool,
and for detecting possible inheritance display bugs (unlikely to happen anymore, though).
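
To give a rough idea of the mechanism, here is a simplified Python sketch of lookup by truncation. It is an illustration only: the LDML specification linked above defines the real algorithm, which also involves explicit parent locales and other refinements not modelled here. The function names and the sample data are hypothetical.

```python
def parent_chain(locale_id: str) -> list:
    """Return the lookup chain obtained by truncating subtags, ending at root."""
    if locale_id == "root":
        return ["root"]
    chain = [locale_id]
    while "_" in locale_id:
        locale_id = locale_id.rsplit("_", 1)[0]
        chain.append(locale_id)
    chain.append("root")
    return chain

def resolve(locale_id: str, key: str, data: dict):
    """Return the first value found for `key` while walking up the chain."""
    for loc in parent_chain(locale_id):
        if key in data.get(loc, {}):
            return data[loc][key]
    return None

# Hypothetical data: a sublocale with its own explicit value does not
# inherit a correction made in its parent.
SAMPLE = {
    "root": {"group": ","},
    "fr": {"group": "\u202F"},
    "fr_CA": {"group": "\u00A0"},
    "fr_BE": {},
}

if __name__ == "__main__":
    for loc in ("fr", "fr_BE", "fr_CA"):
        print(loc, parent_chain(loc), f"U+{ord(resolve(loc, 'group', SAMPLE)):04X}")
```

In this made-up data set, fr_BE follows fr automatically, while fr_CA keeps its explicit value until someone changes it there.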

Now we’re ready for a take on the raw data, as downloaded or found in the online repository:

http://www.unicode.org/repos/cldr/tags/latest/
https://www.unicode.org/repos/cldr/tags/latest/common/
https://www.unicode.org/repos/cldr/tags/latest/common/main/

where we may wish to pick the locale that is closest to our new data, or that we know best among 
the precursors, or simply English for reference:

https://www.unicode.org/repos/cldr/tags/latest/common/main/en.xml

(Emoji-related data are in a separate directory:
https://www.unicode.org/repos/cldr/tags/latest/common/annotations/en.xml
)

I think it is best to download a whole set of data in a zipped folder; the latest as of now are in:

http://www.unicode.org/Public/cldr/33.1/

and then open the relevant files in a text editor with syntax highlighting and an XML syntax checker.
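
Where no editor with a syntax checker is at hand, a well-formedness check can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The function name is_well_formed is made up for this example. Note that this only checks that the XML parses; it does not validate the file against the LDML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str):
    """Return (True, "") if the text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

if __name__ == "__main__":
    good = '<ldml><identity><language type="fr"/></identity></ldml>'
    bad = '<ldml><identity></ldml>'  # <identity> is never closed
    print(is_well_formed(good))
    print(is_well_formed(bad))
```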
Here’s finally my answer to the quoted question about how to coordinate efforts between users and experts:
All interested people may communicate by any available means all year round, given that SurveyTool fora have
limited access and accept posts only during surveys, while being read-only for accredited people the rest of
the time. Likewise, SurveyTool submission forms are read-only except during relatively short windows of
opportunity extending over 4 to 7 weeks twice a year.

Results of discussions may then be committed to a file in LDML/XML format. The easiest way is to take
the English files, cut off any parts not yet reviewed, and replace the English content with locale content.

The resulting files may then be submitted individually by each coordinated vetter using 
the SurveyTool bulk data upload feature:

http://cldr.unicode.org/index/survey-tool
http://cldr.unicode.org/index/survey-tool/guide
http://cldr.unicode.org/index/survey-tool/guide#TOC-Advanced-Features
http://cldr.unicode.org/index/survey-tool/upload

I think we’ll see whether we’ll try this out for French / fr-FR when the next rush starts on December 1st.

Good luck!

Marcel


From cldr-users at unicode.org  Sat Sep 22 10:17:24 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 22 Sep 2018 17:17:24 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>
References: <1129835762.2183.1537614449027.JavaMail.www@wwinf2227>
Message-ID: <CAGa7JC0QtFBntP-j=+cg9K38zJKEsW6hsKBArYPtOCpamwRdjQ@mail.gmail.com>

On Sat, 22 Sep 2018 at 13:10, Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> A good way to prepare, if not already done, is also to learn XML and
> more specifically LDML, the
> Unicode Locale Data Markup Language, in order to be able to read and
> submit data in that format:
>
> http://cldr.unicode.org/index/cldr-spec


But the CLDR Survey does not allow us to participate directly by submitting
LDML data (as submitted but unvetted provisional data) and then merging
them into our votes. That would considerably speed up data submission. I
also think that such submissions should allow us to include some custom
"hashtags" that are usable in a search form in the CLDR Survey, so that we can
group related items together. We could have multiple hashtags, one for each
property we want to track in alternative groups, because these groups do
not necessarily form a partition of the data space: they are not
orthogonal.

Adding hashtags would also be possible in LDML, for the whole LDML file or
for parts of it. Basically they would have the syntax of a space-separated
list of keywords, themselves not translated but used symbolically, and
preferably following a naming convention. CLDR admins could rename them in case
of collision, but this would not change the naming; if hashtags come from
user submission, they could be automatically prefixed by an identifier of
that user or organisation, such as "x-24-space" when user number 24
submitted data with a custom tag "space"; but the CLDR admin team could
create tags more freely without this prefix.

Any data item in CLDR could have one or several tags. These tags by default
would be visible only to that submitting user, separated from tags shared
and exposed to others (so that we can separate custom groups created by
users from groups to be reused by other people). This is much like tags
used in GitHub to help sort and search through a long list of bug reports
or RFEs.
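
The naming convention proposed above could be sketched as follows in Python. This is purely hypothetical: no such tagging feature exists in LDML or in the Survey Tool, and the function names are invented for the example. Only the "x-<user id>-<tag>" pattern comes from the proposal itself.

```python
from typing import List, Optional

def qualify(tag: str, user_id: Optional[int] = None) -> str:
    """Prefix a user-submitted tag as 'x-<user id>-<tag>'; admin tags stay bare."""
    return tag if user_id is None else f"x-{user_id}-{tag}"

def parse_tags(value: str, user_id: Optional[int] = None) -> List[str]:
    """Split a space-separated tag list and qualify each tag."""
    return [qualify(t, user_id) for t in value.split()]

if __name__ == "__main__":
    print(parse_tags("space apostrophe", user_id=24))  # tags from user number 24
    print(parse_tags("space"))                         # global tag set by admins
```

Keeping the prefix out of the submitted value means two users can both tag items "space" without colliding, while a global "space" tag remains available to the admin team.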

From cldr-users at unicode.org  Sat Sep 22 12:52:23 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sat, 22 Sep 2018 19:52:23 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>

On 22/09/18 17:23 Philippe Verdy via CLDR-Users wrote:
[quote]
> But the CLDR Survey does not allow us to participate directly by submitting LDML data
> (as submitted but unvetted provisional data) and then merging them in our votes.
> It would considerably speedup the data submission 

Steven R. Loomis hinted at this in response to a request I’d posted on Trac:

https://unicode.org/cldr/trac/ticket/11255#comment:2

> (I also think that such submissions should allow us to include some custom "hashtag",
> that are usable in a search form in CLDR survey, so that we can group related items together:
> we could have multiple hashtags, one for each property we want to track in alternative groups,
> because these groups do not necessarily form a partition of the data space, they are not orthogonal).

I don’t understand how SurveyTool would get these hashtags to work, but you may post this as a feature
request. For newcomers, here is how to send feedback for processing by the CLDR Technical Committee:

1. Set your personal data in Preferences:
https://unicode.org/cldr/trac/prefs

2. Submit any report, new data not having their locale ID in CLDR yet, feature requests:
https://unicode.org/cldr/trac/newticket

Don’t worry if you’re prompted to do some arithmetic. With your personal data in a cookie, that is less likely
to happen. Make sure however not to exceed the maximum number of 5 external links per post. Internal links
are unlimited, using the ticket:123456 syntax or alternatives as shown in:
https://unicode.org/cldr/trac/wiki/WikiFormatting

What we can already do is to use XML comments in the files we’re working on, and there we may add hashtags.
However SurveyTool won’t import them; it will only register our votes, which is already a big deal, saving us much time.
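
For working offline with such commented files, here is a small Python sketch that collects hashtags from XML comments. The function name and the sample fragment (including its tags) are made up for illustration. A regular expression is used because comments are not kept by default when parsing with xml.etree.ElementTree.

```python
import re

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
HASHTAG = re.compile(r"#([\w-]+)")

def hashtags_in_comments(xml_text: str) -> list:
    """Return the hashtags found inside XML comments, in document order."""
    tags = []
    for body in COMMENT.findall(xml_text):
        tags.extend(HASHTAG.findall(body))
    return tags

# Hypothetical working file fragment with a private note for the vetter.
SAMPLE = """<numbers>
  <symbols numberSystem="latn">
    <!-- #space #x-24-group : check narrow no-break space -->
    <group>&#x202F;</group>
  </symbols>
</numbers>"""

if __name__ == "__main__":
    print(hashtags_in_comments(SAMPLE))
```

The character reference &#x202F; in the data is not picked up, since only text inside comments is searched.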

Regards,

Marcel


From cldr-users at unicode.org  Sat Sep 22 14:53:56 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Sat, 22 Sep 2018 21:53:56 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
References: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
Message-ID: <CAGa7JC1jUd0hh4XrQ_ERpOWnmBc0wjsOzd5DPMS5QEPJOP6kiQ@mail.gmail.com>

My intent is to have those tags (which are internally numbered with a stable
id but may be renamed by proposing users, or by admins if tags are made
global and unprefixed) also usable in Survey discussions, to attach a
comment to all CLDR data entries that have been tagged by the
user (using a prefixed user tag) or with a global tag (created by the CLDR
tech admins).
I.e. reproduce what we can do easily in GitHub to track various related
bug reports and RFEs or pending actions. Ideally there should also be a
graph of entries (these tags also work like tasks in task lists:
they have a status coming from the posted comments, and can be closed
once solved).
Maybe some integration with GitHub and community development tools or bug
tracking tools would be useful, just like in many open-source development
projects (of which CLDR is one). That integration comes with URNs/URLs
linked to CLDR data paths.
For now CLDR submission is too hierarchical, and does not focus very
well on related groups of items that must be fixed together (in the same
locale, or across locales), so it is very difficult to isolate the
inconsistencies and get reliable votes to fix them in the short time
allowed for submission and vetting, notably because the tool is much too
slow, uses far too many JavaScript/DOM resources in the browser, and
is very unresponsive to user events, creating many unexpected actions or
ignored clicks.


From cldr-users at unicode.org  Sun Sep 23 13:29:25 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 23 Sep 2018 20:29:25 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAGa7JC1jUd0hh4XrQ_ERpOWnmBc0wjsOzd5DPMS5QEPJOP6kiQ@mail.gmail.com>
References: <1498279448.5059.1537638744010.JavaMail.www@wwinf2227>
 <CAGa7JC1jUd0hh4XrQ_ERpOWnmBc0wjsOzd5DPMS5QEPJOP6kiQ@mail.gmail.com>
Message-ID: <1404261498.5171.1537727365472.JavaMail.www@wwinf2227>

On 22/09/18 21:54 Philippe Verdy wrote:
>
> My intent is to have those tags (which are internally numbered with a stable id but may be renamed by proposing users,
> or by admins if tags are made global and unprefixed) also usable in Survey discussions, to attach a comment to all
> CLDR data entries that have been tagged by the user (using a prefixed user tag) or with a global tag
> (created by the CLDR tech admins).

Now I understand, and I believe it’s very useful for preventing what we often observed or did: vetters come up with a
whole bunch of items having the same issue, and cannot help posting one forum post per item, as there is no other way
of getting the stuff to show up in the information pane when one of these items has focus in SurveyTool. So we happened
to copy one single message and paste it as many times in the launch-new-thread form as there were items to fix.
Downstream that of course triggers an avalanche of e-mail alerts from ST, which every vetter would then have to open
one by one, only to read the same message x times. Therefore yes, we should really have a means of bundling items and
discussing them together as a batch.

However the issue lies in the process. As long as we vote on items one by one in ST instead of preparing our votes in
LDML format, we will be stuck with ST features that may or may not be present. And ST is not agile enough, since even
when a patch is available, it isn’t applied until the next ST overhaul prior to the next vetting round, so that ST keeps
tampering with people’s work instead of being fixed overnight, to see how it looks the next day. CLDR should adapt to
contributors’ way of working, not impose its own rhythm, because contributors may have other constraints and limited
time.

It is a pity that the Trac tool is not used enough. Perhaps vetters are not allowed to spend time writing or commenting on
bug reports, or are not allowed to post publicly, given that Trac has unrestricted public access, whereas ST fora are
closed and can only be checked by people having credentials, so that locale data production is opaque and the
opposite of what is current practice in open source projects.
There are also technical issues with the interoperability of SurveyTool and Trac. While links in Trac work fine, Trac
refuses to publish more than five external links at once, which heavily impacts usability, given that ST fora are
AFAIK considered external by Trac spam bots.
What bothered me badly is that anchors on ST fora pages don’t work precisely. Instead of scrolling to where you copied
an anchor link, the ST forum scrolls elsewhere, so that I ended up always adding the datestamp for use in browser search.
Well, that should make for another bug ticket, but I currently cannot file one. I hope the TC members monitoring this list will wish
to pick this up for fixing.

> I.e. reproduce what we can do easily in GitHub to track various related bug reports and RFEs or pending actions.
> Ideally there should also be a graph of entries (these tags also work like tasks in task lists: they have a status
> coming from the posted comments, and can be closed once solved).
> Maybe some integration with GitHub and community development tools or bug tracking tools would be useful,
> just like in many open-source development projects (of which CLDR is one).
> That integration comes with URNs/URLs linked to CLDR data paths.
> For now CLDR submission is too hierarchical, and does not focus very well on related groups of items
> that must be fixed together (in the same locale, or across locales), so it is very difficult to isolate the inconsistencies
> and get reliable votes to fix them in the short time allowed for submission and vetting, notably because the tool is
> much too slow, uses far too many JavaScript/DOM resources in the browser, and is very unresponsive to user events,
> creating many unexpected actions or ignored clicks.

Yes indeed. People of different locales cannot interoperate well, and even between sublocales and root locales there
is unresponsiveness. See how one French sublocale did not update the group separator. That is symptomatic of missing
dynamics inside the CLDR community. Then also we do indeed lose time when slowed down, and an "Approve all" button
is also missing, for use where most items are OK and only a few votes, or none, need changing afterwards. This one has been posted:
https://unicode.org/cldr/trac/ticket/11250#comment:1

For now I can only suggest working offline and being ready to organize ourselves. I think that by
interspersing CLDR work sequences all over the year, as suggested by the bulk upload feature of CLDR, we can prepare
and make up our minds so that we’re ready for the short windows of opportunity, which we may then use to efficiently discuss
and share LDML data for everybody to upload his or her votes.


Regards,

Marcel


From cldr-users at unicode.org  Sun Sep 23 14:46:23 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Sun, 23 Sep 2018 21:46:23 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1200687843.5624.1537731983164.JavaMail.www@wwinf2227>

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> Ideas:
> - a web app to take in new locale data?

Do you refer to a piece of entirely new software?

Another idea I just got is to code a standalone program in C(++) to edit LDML files
by displaying editable charts.

And I adhere to Philippe’s advice to set up a collaborative platform open non-stop.
That will allow volunteers to be active at their own rhythm without being bound to 
CLDR-internal timing, while TC may show up in a scheduled way and grab data 
at fixed deadlines. There may still be programmed rushes for organizations to 
appoint workforce while getting around the cost of full-time reviewers.

Hope that helps :)

Regards,

Marcel


From cldr-users at unicode.org  Sun Sep 23 19:38:54 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 24 Sep 2018 02:38:54 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1889299939.6180.1537749534241.JavaMail.www@wwinf2227>

On 22/09/18 02:37 Steven R. Loomis via CLDR-Users wrote:
[…]
> Ideas:
> - a web app to take in new locale data?

What one might wish to do is to code up an app for Android prompting
volunteers to input various content targeting those patterns that are
collected in CLDR, while programmatically converting raw data into
the data structures that are needed for CLDR. Eg following the example
in the presentation you have shared, the app could prompt to type in
the months of the year, a set of full dates, abbreviations, and so on,
and send it all up to the server, where the data is then processed for
CLDR intake. Ie the abstraction effort is centralized instead
of being passed on to the volunteers, sparing them from working
through the documentation.
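
The server-side conversion step could look roughly like the Python sketch below, which turns twelve typed-in month names into an LDML fragment. The function name months_to_ldml is invented for this example, and the output is a simplified fragment: a real locale file also needs an identity element and much more data.

```python
import xml.etree.ElementTree as ET

def months_to_ldml(names, width="wide"):
    """Wrap twelve month names in the LDML element structure for Gregorian months."""
    if len(names) != 12:
        raise ValueError("expected exactly 12 month names")
    ldml = ET.Element("ldml")
    dates = ET.SubElement(ldml, "dates")
    calendars = ET.SubElement(dates, "calendars")
    calendar = ET.SubElement(calendars, "calendar", type="gregorian")
    months = ET.SubElement(calendar, "months")
    context = ET.SubElement(months, "monthContext", type="format")
    month_width = ET.SubElement(context, "monthWidth", type=width)
    for number, name in enumerate(names, start=1):
        # Strip stray whitespace from the volunteer's raw input.
        ET.SubElement(month_width, "month", type=str(number)).text = name.strip()
    return ET.tostring(ldml, encoding="unicode")

if __name__ == "__main__":
    raw_input_from_app = [
        "janvier", "février", "mars", "avril", "mai", "juin",
        "juillet", "août", "septembre", "octobre", "novembre", "décembre",
    ]
    print(months_to_ldml(raw_input_from_app))
```

The volunteer only types the twelve names; the nesting of calendar, context and width elements is produced by the program.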

> - a web app to debug/explore plurals?

That seems to be an example of what algorithms can do in that sense.
But beware of stepping over into the realm of full-fledged automated
translation, which is outside the scope of CLDR.

Regards,

Marcel


From cldr-users at unicode.org  Mon Sep 24 07:00:26 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 24 Sep 2018 14:00:26 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <2053930414.4704.1537790426380.JavaMail.www@wwinf2227>

Now sadly this discussion seems to have died down without having brought up a large diversity
of points of view, suggestions, desiderata and advice. So I need to conclude by stating that, despite
appearances, we didn’t aim at using just another occasion to vent about our contributor
experience with CLDR. What we intended through our posting is:

1. Inform new contributors so they won’t be surprised and may wish to develop strategies
   beforehand to possibly mitigate adverse effects of the state of the art, though we’re
   expecting that all or part of the cited problems will be fixed prior to the next survey round.

2. Motivate responsive people to fix all problems urgently so that no contributors get
   discouraged when encountering any of those problems.

3. Contribute to the requested brainstorm.

We hope we?ve done our part to reach these goals, and 
we?d welcome any other effort to enrich the feedback harvest.

Thanks,

Marcel


From cldr-users at unicode.org  Mon Sep 24 11:52:25 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Mon, 24 Sep 2018 09:52:25 -0700
Subject: Locale bringup and barriers for entry
In-Reply-To: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>
References: <695051747.1133.1537604109774.JavaMail.www@wwinf2227>
Message-ID: <CAFYQx+CAGB73o3_GkyEpWx8KjbTDQzF=CcsP86Bppi45D9te7Q@mail.gmail.com>

Marcel and Philippe,
 I see some interesting discussion, though some of it was (as noted in
later emails) recapping existing bugs.
 However, please note how I began this discussion:

On Sat, Sep 22, 2018 at 1:15 AM Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> > At the IUC conference last week, a few of us discussed around lunch some
> issues around getting new locales into CLDR, and barriers to entry.
>

The key word here is "new": locales not currently in CLDR.  For example,
Emoji category names are not a part of CLDR minimal data, and also, new
locales will not face issues around inheritance.

> 1. The English template may not be internally consistent, eg emoji category
> names may be singular or plural (plural throughout seems correct);
> 2. The English template may not be up-to-date, eg. still including ASCII
> quotes in exemplar punctuation though these have been ruled out;
> 3. The target data sets may not be comprehensively specified, eg the
> definition of exemplar punctuation does include an exclusion clause for math
> symbols only, while the clause about not including symbols on a
> programmatic usage basis such as # @ _ is still missing;
> 4. The English template may not be kept in sync with the
> specifications, eg emoji keywords not to include emoji name or name starter;
>

There are continuous improvements on the English side data. I don't think
the above are necessarily barriers to initial entry.


> 5. Numerous bugs affecting markup of inherited values (but these have been
> reported and are about to be fixed in the SurveyTool code).
>

Right.


> > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
> pluralization data
>
> The scope of pluralization seems unclear and biased by the English
> paradigm of genderlessness, while in other languages grammatical gender
> is a determining parameter for pluralization, so that even extensions to
> the DTD seem to be required for providing out-of-the-box pluralization
> rules.
>

I'm not sure what is meant by 'extensions to the DTD'.  In any event, CLDR
pluralization has proven to be largely successful in practice.
Do you have any specific concern about CLDR plurals? Is there a bug filed?


> > - what about fonts?
>


> > keyboards?
>
> I see fonts and keyboards actually as the two missing components of the
> stack that you designed, because though being part of locale data, input
> methods are a precondition of efficient submission of locale data. The
> full stack would thus expand to:
>


> > - what are the best ways to coordinate efforts between the language
> users and different technical experts?
> > Ideas:
> > - a web app to take in new locale data?
>
> I think CLDR has already its web app, ie SurveyTool. A full-time engineer
> is actually redeveloping and debugging several or all parts of it.
>

Again, the scope of this data is data for a completely new locale that is
not currently in CLDR. The idea would be an application just for taking in
data listed at http://cldr.unicode.org/index/cldr-spec/minimaldata


> > - a web app to debug/explore plurals?
>
> Before including this functionality in SurveyTool, where it belongs in, I
> think that the spec should be redesigned, and the documentation updated
> accordingly. That could eventually result in extended language support by
> CLDR/ICU, which would do no harm but only raise the product value.
>

Redesigned how? Again - do you have any specific concern about CLDR
plurals? Is there a bug filed?


> > - allowing some locales to 'get started' without plural rules?
>
> I think that any locale may get started in CLDR when providing date and
> time formats, while correctly displaying a reminder of a shopping cart
> may be left over for a later stage.
>

That's the general idea. (And a good way to put it, as a 'shopping cart'.)
Perhaps any data item that depends on plurals ( currency category, compact
decimal category, etc. ) would be 'locked' until it is unlocked by the
input of plural data.

From cldr-users at unicode.org  Mon Sep 24 14:51:35 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Mon, 24 Sep 2018 21:51:35 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>

On 24/09/18 18:52 Steven R. Loomis wrote:
[…]
> I'm not sure what is meant by 'extensions to the DTD'. In any event, CLDR pluralization has proven to be largely successful in practice.
> Do you have any specific concern about CLDR plurals? Is there a bug filed?

I'd filed this bug about French plurals:

https://unicode.org/cldr/trac/ticket/11302
Ordinal minimal pairs for French

Although as noted there, most other locales are unaffected.
I've just extrapolated from this that some issues may be awaiting new locales, and that when facing barriers,
getting them out of the way may require the DTD to be extended, so submitters should be ready to file tickets,
as we're often prompted to do by the SurveyTool information panel.

[…]
> >
> > Before including this functionality in SurveyTool, where it belongs in, I think that the spec should be redesigned, and the documentation updated
> > accordingly. That could eventually result in extended language support by CLDR/ICU, which would do no harm but only raise the product value.
>
> Redesigned how? Again - do you have any specific concern about CLDR plurals? Is there a bug filed?

My concern is that CLDR seems not to take gender into account when providing plural rules, but I was told that gender is not inside the scope.
The fact is that nouns may inflect differently depending on whether they are feminine or masculine.

> > >  - allowing some locales to 'get started' without plural rules?
> >
> > I think that any locale may get started in CLDR when providing date and time formats, while correctly displaying a reminder of a shopping cart
> > may be left over for a later stage.
>
> That's the general idea. (And a good way to put it, as a 'shopping cart'.)

The idea isn't mine. Here is the documentation locus where I got it from:

http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Non-inflecting-Nouns-Pronouns

> Perhaps any data item that depends on plurals ( currency category, compact decimal category, etc. )
> would be 'locked' until it is unlocked by the input of plural data.

Provided that "locking" an item won't cause a blank or another sort of bug.
When a user sees an item not pluralized where it is expected to be plural,
then simply inferring that pluralization isn't ready might be straightforward.
There will surely be some IF in the code to prevent the app from crashing.

Glad that the discussion has restarted. Perhaps I was too impatient.

Regards,

Marcel


From cldr-users at unicode.org  Mon Sep 24 15:18:26 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Mon, 24 Sep 2018 13:18:26 -0700
Subject: Locale bringup and barriers for entry
In-Reply-To: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
Message-ID: <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>

Mark


On Mon, Sep 24, 2018 at 12:52 PM Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> On 24/09/18 18:52 Steven R. Loomis wrote:
> […]
> > I'm not sure what is meant by 'extensions to the DTD'.  In any event,
> > CLDR pluralization has proven to be largely successful in practice.
> > Do you have any specific concern about CLDR plurals? Is there a bug
> > filed?
>
> I'd filed this bug about French plurals:
>
> https://unicode.org/cldr/trac/ticket/11302
> Ordinal minimal pairs for French
>
> Although as noted there, most other locales are unaffected.
> I've just extrapolated from this that some issues may be awaiting new
> locales, and that when facing barriers,
> getting them out of the way may require the DTD to be extended, so
> submitters should be ready to file tickets,
> as we're often prompted to do by the SurveyTool information panel.
>
> […]
> > >
> > > Before including this functionality in SurveyTool, where it belongs
> > > in, I think that the spec should be redesigned, and the documentation
> > > updated accordingly. That could eventually result in extended language
> > > support by CLDR/ICU, which would do no harm but only raise the product
> > > value.
> >
> > Redesigned how? Again - do you have any specific concern about CLDR
> > plurals? Is there a bug filed?
>
> My concern is that CLDR seems not to take gender into account when
> providing plural rules, but I was told that gender is not inside the scope.
> The fact is that nouns may inflect differently depending on whether they
> are feminine or masculine.
>

The focus for plurals in CLDR is "what would change if I change a number to
another number in a placeholder". So if I have a message with a masculine
noun, I have two versions:

one: "{number} libro è selezionato"
other: "{number} libri sono selezionati"

vs also 2 versions with a feminine noun.

one: "{number} nota è selezionata"
other: "{number} note sono selezionate"

Now, there are some languages (eg Russian) that only exhibit differences
for one of the plural categories if there is a certain gender involved. So
the plural categories themselves need to be the maximal partition across
the possible genders, cases, and other features.

What is NOT in scope for CLDR at this time is to both change gender and
number. Typically that requires many other changes in the rest of the text.

one: "{number} {thing} è selezionata"
...

ICU has a mechanism for doing a SELECT using gender, but there the gender
has to be supplied as a parameter, and a sub-message supplied for each of
the (say) 3 genders x 4 plural-categories.

Actually detecting the gender of nouns and modifying sentences on that
basis is out of scope (and a very tricky problem in general).
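
To make the "gender supplied as a parameter" point concrete, here is a minimal sketch in plain Python rather than ICU MessageFormat. The function and table names are hypothetical; the only CLDR fact it relies on is that Italian cardinals have the categories "one" (for the integer 1) and "other". The caller must say which gender applies, nothing is detected from the noun.

```python
def italian_plural_category(n):
    """CLDR cardinal categories for Italian, restricted to integers."""
    return "one" if n == 1 else "other"


# One sub-message per (gender, plural category) pair, written by the translator.
MESSAGES = {
    ("masculine", "one"): "{number} libro è selezionato",
    ("masculine", "other"): "{number} libri sono selezionati",
    ("feminine", "one"): "{number} nota è selezionata",
    ("feminine", "other"): "{number} note sono selezionate",
}


def format_selected(number, gender):
    """Pick the sub-message by the supplied gender and the computed plural category."""
    category = italian_plural_category(number)
    return MESSAGES[(gender, category)].format(number=number)
```

With 3 genders and 4 plural categories the table would simply grow to 12 entries.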


> > > >  - allowing some locales to 'get started' without plural rules?
> > >
> > > I think that any locale may get started in CLDR when providing date
> > > and time formats, while correctly displaying a reminder of a shopping
> > > cart may be left over for a later stage.
> >
> > That's the general idea. (And a good way to put it, as a 'shopping
> > cart'.)
>
> The idea isn't mine. Here is the documentation locus where I got it from:
>
>
> http://cldr.unicode.org/index/cldr-spec/plural-rules#TOC-Non-inflecting-Nouns-Pronouns
>
> > Perhaps any data item that depends on plurals ( currency category,
> > compact decimal category, etc. )
> > would be 'locked' until it is unlocked by the input of plural data.
>
> Provided that "locking" an item won't cause a blank or another sort of
> bug.
> When a user sees an item not pluralized where it is expected to be plural,
> then simply inferring that pluralization isn't ready might be
> straightforward.
> There will surely be some IF in the code to prevent the app from crashing.
>
>

What we have considered (there is a ticket for this somewhere) is
disallowing any data/votes to be entered in a row with a "count" or
"ordinal" attribute until the rules (resp. plural or ordinal) are supplied.
The row would either be grayed out or just omitted.

So data could be entered in the locale for other fields, but the locale
couldn't reach moderate or modern coverage without the rules. So
applications not requiring that coverage level could include the locale,
but those requiring that coverage level would omit it.
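
The check itself could be as small as the following sketch. This is not SurveyTool code; the function name and the way a row's attributes are represented are assumptions made for illustration only.

```python
def is_row_locked(row_attributes, has_plural_rules, has_ordinal_rules):
    """Return True if a row must not accept data or votes yet.

    row_attributes: dict of the attributes on the row's element
    (hypothetical representation, e.g. {"count": "one"}).
    """
    if "count" in row_attributes and not has_plural_rules:
        return True
    if "ordinal" in row_attributes and not has_ordinal_rules:
        return True
    return False
```

Rows without either attribute stay open, so date and time formats can be entered before any plural rules exist.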

>
> Glad that the discussion has restarted. Perhaps I was too impatient.
>
> Regards,
>
>
> Marcel

From cldr-users at unicode.org  Tue Sep 25 00:54:25 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Tue, 25 Sep 2018 07:54:25 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
Message-ID: <329976032.379.1537854865588.JavaMail.www@wwinf2227>

On 24/09/18 22:18 Mark Davis ☕️ wrote:
[quote]
>
> The focus for plurals in CLDR is "what would change if I change a number to another number in a placeholder".
> So if I have a message with a masculine noun, I have two versions:
>
> one: "{number} libro è selezionato"
> other: "{number} libri sono selezionati"
>
> vs also 2 versions with a feminine noun.
>
> one: "{number} nota è selezionata"
> other: "{number} note sono selezionate"

I turn out to be unable to find plural rules in the LDML tree, apart from some plural and ordinal minimal pairs.

Also, the current DTD does not seem to contain what is found in the LDML spec at:

https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules

The DTD only has:

<!ELEMENT minimalPairs ( alias | ( pluralMinimalPairs*, ordinalMinimalPairs*, special* ) ) >
<!ATTLIST minimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST minimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

<!ELEMENT pluralMinimalPairs ( #PCDATA ) >
<!ATTLIST pluralMinimalPairs count NMTOKEN #IMPLIED >
<!ATTLIST pluralMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST pluralMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >

That suggests that plural support is still partly under construction,
hence probably the stress put on it in Steven's posting.

Consistently, at locale level, eg for Italian, common/main/it.xml only has:

<minimalPairs>
<pluralMinimalPairs count="one">{0} giorno</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} giorni</pluralMinimalPairs>
<ordinalMinimalPairs ordinal="many">Prendi l’{0}° a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs ordinal="other">Prendi la {0}° a destra.</ordinalMinimalPairs>
</minimalPairs>

That is what I meant when complaining about gender support.
Following your exemplar data, we should have additional data, and I can see no structure 
to accommodate additional forms:

<pluralMinimalPairs count="one">{0} libro è selezionato</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} libri sono selezionati</pluralMinimalPairs>
<pluralMinimalPairs count="one">{0} nota è selezionata</pluralMinimalPairs>
<pluralMinimalPairs count="other">{0} note sono selezionate</pluralMinimalPairs>

The apparent redundancy induced might be disambiguated by adding a gender attribute:

<pluralMinimalPairs gender="masculine" count="one">{0} libro è selezionato</pluralMinimalPairs>
<pluralMinimalPairs gender="masculine" count="other">{0} libri sono selezionati</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="one">{0} nota è selezionata</pluralMinimalPairs>
<pluralMinimalPairs gender="feminine" count="other">{0} note sono selezionate</pluralMinimalPairs>

The case is also striking when considering ordinal minimal pairs. 
To start, I can find no clear definition of what "few" and "many" are to represent.
Hence I'm unable to make sense of the following, although that may result from my incompetence in 
Italian, and not using Google Translate right now to enlighten me (although I heavily used it elsewhere):

<ordinalMinimalPairs ordinal="many">Prendi l’{0}° a destra.</ordinalMinimalPairs>

When making a case for gender here, taking something like "via" for feminine, and "cammino" for 
masculine, and "prima"/"primo" for "one" vs "terza"/"terzo" for "other", the data above would 
IMO expand to:

<ordinalMinimalPairs gender="feminine" ordinal="one">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="many">???</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="other">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="one">Prendi il {0}º a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="many">???</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="other">Prendi il {0}º a destra.</ordinalMinimalPairs>

Assuming that "many" stands for "8" (which should be defined somewhere) and collapsing redundant 
definitions, the result would be akin to the original data (although with proper ordinal indicators):

<ordinalMinimalPairs gender="feminine" ordinal="many">Prendi l’{0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="feminine" ordinal="other">Prendi la {0}ª a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="many">Prendi l’{0}º a destra.</ordinalMinimalPairs>
<ordinalMinimalPairs gender="masculine" ordinal="other">Prendi il {0}º a destra.</ordinalMinimalPairs>

Perhaps ticket #11393 is related to this topic.


> Now, there are some languages (eg Russian) that only exhibit differences
> for one of the plural categories if there is certain gender involved.
> So the plural categories themselves need to be the maximal partition
> across the possible genders, cases, and other features.

Perhaps I'm silly, still I'm unable to figure out how "minimal pairs" can represent "maximal partition".

> What is NOT in scope for CLDR at this time is to both change gender and number.
> Typically that requires many other changes in the rest of the text.

What I mean is not that CLDR should show the way of transforming content across gender.
What I mean is that CLDR should provide support for both feminine/masculine and masculine/feminine 
patterns. Currently, gender support seems to be limited to what English examples suggest as a translation,
be it masculine when "day" translates to "giorno", or feminine when "street" translates to "via".
That is what I think is insufficient.


> one: "{number} {thing} è selezionata"
> ...
>
> ICU has a mechanism for doing a SELECT using gender, but there the gender has to be supplied
> as a parameter, and a sub-message supplied for each of the (say) 3 genders x 4 plural-categories.
>
> Actually detecting the gender of nouns and modifying sentences on that basis is out of scope
> (and a very tricky problem in general).

That seems OK to me as long as CLDR actually helps developers with data for any case they may 
encounter when setting up the values. Else they may wish to just look up a dictionary and a grammar 
of the target locale to find out by themselves what are the cases they have to consider.

[quote]
> > > Perhaps any data item that depends on plurals ( currency category, compact decimal category, etc. )
> > > would be 'locked' until it is unlocked by the input of plural data.
> >
> > Provided that "locking" an item won't cause a blank or another sort of bug.
> > When a user sees an item not pluralized where it is expected to be plural,
> > then simply inferring that pluralization isn't ready might be straightforward.
> > There will surely be some IF in the code to prevent the app from crashing.
>
> What we have considered (there is a ticket for this somewhere) is disallowing any data/votes
> to be entered in a row with a "count" or "ordinal" attribute until the rules (resp. plural or ordinal)
> are supplied. The row would either be grayed out or just omitted.
> So data could be entered in the locale for other fields, but the locale couldn't reach moderate
> or modern coverage without the rules. So applications not requiring that coverage level could
> include the locale, but those requiring that coverage level would omit it.

Sorry, I misunderstood the scope. Thanks for explaining.

Perhaps the ticket may be #11061


Indeed that makes for clean data and ensures reliability of CLDR.
If so many plural rule data are missing that CLDR must make a special case for it, 
that may result from the difficulties that non-expert vetters like me are experiencing 
with the topic. Now that CLDR plural rules are reported to work well in practice, I'm 
wondering about how all that interconnects. Eg obviously some rules are working well, 
especially when matching some frequent use cases. But the point as I can see it is 
whether CLDR is covering *all* use cases, possibly except very rare ones.


Thanks.

Regards,

Marcel


From cldr-users at unicode.org  Tue Sep 25 03:00:40 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Tue, 25 Sep 2018 10:00:40 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <329976032.379.1537854865588.JavaMail.www@wwinf2227>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
Message-ID: <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>

The numeric cases tagged as "one", "few", "many", "other" are defined in
CLDR in plural rules for each locale. When a message is not translated in a
given language and another message is taken from a fallback, the plural
rules defined for that fallback must then be used instead of the plural
rules for the initial target locale.

Plural rules are documented. These are defined as minimal data needed to
start any new locale. Note that the "other" rule is used as a fallback
if a locale does not define any message for a specific plural form, so
before looking for fallback languages, the message lookup first looks
for a translation in the "other" plural rule in the target locale.

Once a new locale is being set up, the CLDR survey will ask for translations
for each plural form where needed (when a message to translate has a
placeholder for a variable number), but note that a given message cannot be
tagged like this if it contains several placeholders with different numeric
values: if this happens, it will have to be split into several parts and
the parts will be assembled in another message containing placeholders for
each part (this would also be needed if there were multiple genders or
grammatical cases to handle in the same assembled message).
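
The lookup order described above can be sketched as follows. All names and the tiny message catalog are hypothetical; the two rule functions only encode the CLDR cardinal rules for French and English integers (French treats 0 and 1 as "one", English only 1).

```python
def plural_category_fr(n):
    # CLDR cardinal rule for French integers: 0 and 1 are "one".
    return "one" if n in (0, 1) else "other"


def plural_category_en(n):
    # CLDR cardinal rule for English integers: only 1 is "one".
    return "one" if n == 1 else "other"


PLURAL_RULES = {"fr": plural_category_fr, "en": plural_category_en}


def pick_message(messages, locale_chain, n):
    """Walk the fallback chain; in each locale try the exact category, then "other"."""
    for locale in locale_chain:
        forms = messages.get(locale, {})
        if not forms:
            continue  # nothing translated here, try the next fallback locale
        # Use the plural rules of the locale the message actually comes from.
        category = PLURAL_RULES[locale](n)
        template = forms.get(category, forms.get("other"))
        if template is not None:
            return locale, template.format(n)
    raise LookupError("no translation found in any locale of the chain")


# Hypothetical catalog: the French translation is missing entirely.
MESSAGES = {"en": {"one": "{0} item", "other": "{0} items"}}
```

For a French user and the number 0, the French rules would say "one", but since the message comes from English, the English rules apply and the result is "0 items" rather than "0 item".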

On Tue, 25 Sep 2018 at 07:58, Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> To start, I can find no clear definition of what "few" and "many" are to
> represent.
> Hence I'm unable to make sense of the following, although that may result
> from my incompetence in
> Italian, and not using Google Translate right now to enlighten me
> (although I heavily used it elsewhere):
>

From cldr-users at unicode.org  Tue Sep 25 04:32:30 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Tue, 25 Sep 2018 11:32:30 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
Message-ID: <312813575.3078.1537867951491.JavaMail.www@wwinf2227>

On 25/09/18 10:00 Philippe Verdy wrote:
> 
> The numeric cases tagged as "one", "few", "many", "other" are defined in CLDR in plural rules for each locale.

Italian happens to use it while it isn't defined in main/it.xml. On the other hand, main/en.xml doesn't define it either,
and doesn't use it, although English could use a case for "eight" as documented in:

https://www.unicode.org/reports/tr35/tr35-numbers.html#Language_Plural_Rules

But it is considered an edge case:

«There is an edge case in English because of the behavior of "a/an".
For example, in changing from 3 to 8:
"a 3rd of a loaf" should result in "an 8th of a loaf", not "a 8th of a loaf"
"a 3 foot stick" should result in "an 8 foot stick", not "a 8 foot stick"
So numbers of the following forms could have a special plural category and special ordinal category: 8(X), 11(X), 18(X), 8x(X), where x is 0..9 and the optional X is 00, 000, 00000, and so on.
On the other hand, the above constructions are relatively rare in messages constructed using numeric placeholders, so the disruption for implementations currently using CLDR plural categories wouldn't be worth the small gain.»

I don't agree with the conclusion, given that displaying messages like "Do you wish a 8 foot stick?" would 
reflect badly on the corporate image of the retailer using a poorly implemented user interface.
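
Short of a new plural category, an application can work around this on its own. The following is only a heuristic sketch, not anything CLDR or ICU provides: it assumes non-negative integers read out in the usual way, is deliberately broader than the forms listed in the spec (it also covers numbers like 8,500), and ignores readings such as "eleven hundred" for 1100. The function name is hypothetical.

```python
def indefinite_article(n):
    """Guess "a" or "an" before a non-negative integer, based on how it is spoken."""
    digits = str(n)
    # "eight", "eighty", "eight hundred", "eight thousand" ... all start with a vowel sound.
    if digits[0] == "8":
        return "an"
    # "eleven" / "eighteen" lead the spoken form only when they fill a two-digit
    # leading group: 11, 18, 11,000, 18,000,000, and so on.
    if digits[:2] in ("11", "18") and len(digits) % 3 == 2:
        return "an"
    return "a"


def stick_message(feet):
    return f"Do you wish {indefinite_article(feet)} {feet} foot stick?"
```

The retailer's message then comes out as "Do you wish an 8 foot stick?" while 3 still yields "a 3 foot stick".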

> When a message is not translated in a given language and another message is taken from a fallback,
> the plural rules defined for that fallback must then be used instead of the plural rules for the initial target locale.

Agreed, but having untranslated values in a locale is not making that locale particularly well supported in CLDR.

> Plural rules are documented. These are defined as minimal data needed to start any new locale.

That seems to be one of those barriers that Steven is now questioning, or even the main barrier for entry.
For me that would remain a barrier as long as I cannot get clear insight nor see straightforward structures to fill in.

> and note that the "other" rule is used as a fallback if a locale does not define any message for a specific plural form,
> so before looking for fallback languages, the messages are first looking for a translation in the "other" plural rule in the target locale.

In those cases, implementations may use generic display such as "Your cart ({0})" where {0} is the number of items it contains, much like 
in a mailbox the number of new messages in a folder.

> Once a new locale is being set up, the CLDR survey will ask for translations for each plural form where needed
> (when a message to translate has a placeholder for a variable number), but note that a given message cannot
> be tagged like this if it contains several placeholders with different numeric values: if this happens, it will have
> to be split into several parts and the parts will be assembled in another message containing placeholders for each part
> (this would also be needed if there were multiple genders or grammatical cases to handle in the same assembled message).

Got it, thanks. However, that doesn't resolve what I meant when complaining that CLDR does not provide comprehensive 
support for inflected forms. IMO it would be more useful to note that Italian nouns ending in -o must have that -o changed 
to -i when pluralized, and those ending in -a must have the -a replaced with -e. But that only encompasses regular inflection. 
I end up thinking that there is no point for CLDR in providing inflected forms. Wouldn't it suffice to indicate which numbers 
require plural and which category? 
For support of abbreviated ordinals, CLDR could simply list all ways of constructing an ordinal abbreviation, and relate them 
to number and to gender. It isn't clear to me how a GPS message could make it into CLDR. I think that one should stick with
the way things are done for date and time.

Regards,

Marcel


From cldr-users at unicode.org  Tue Sep 25 06:02:55 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Tue, 25 Sep 2018 13:02:55 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
Message-ID: <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>

On Tue, 25 Sep 2018 at 11:32, Marcel Schneider <charupdate at orange.fr>
wrote:

> On 25/09/18 10:00 Philippe Verdy wrote:
> > Plural rules are documented. These are defined as minimal data needed to
> start any new locale.
>
> That seems to be one of those barriers that Steven is now questioning, or
> even the main barrier for entry.
> For me that would remain a barrier as long as I cannot get clear insight
> nor see straightforward structures to fill in.
>
See the documentation:
http://cldr.unicode.org/index/cldr-spec/plural-rules

And the supplemental data which gives a list per locale:
http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html

From cldr-users at unicode.org  Tue Sep 25 06:20:55 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Tue, 25 Sep 2018 13:20:55 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
 <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
Message-ID: <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>

Note that the supplemental data is OK for the "cardinal" and "range" type
of categories, but largely failing almost everywhere for the "ordinal" type.
E.g. in French: "Prenez la 1re à droite" (this assumes the feminine gender,
which is ok for "rue"="street", "avenue", or "sortie"="exit", but wrong for
"feu"="traffic light" or "stop" which are masculine, as in "Tournez au 1er
feu à droite", where "1er" and "1re" change depending on the gender of the
explicit or implicit noun).
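
A minimal sketch of what the application then has to do by itself: the gender of the implied noun is passed in by the caller, since nothing in the plural/ordinal data carries it. The function name is hypothetical, and only the common abbreviations are covered (1er/1re, then 2e, 3e, ...), not variants such as 2d/2de.

```python
def french_ordinal_abbreviation(n, gender):
    """Abbreviated French ordinal for n; gender is "masculine" or "feminine"."""
    if n == 1:
        # Only "premier"/"première" differ by gender in the abbreviated form.
        return "1er" if gender == "masculine" else "1re"
    return f"{n}e"


# The gender comes from the noun the caller has in mind (rue: feminine, feu: masculine).
turn_street = f"Prenez la {french_ordinal_abbreviation(1, 'feminine')} à droite"
turn_light = f"Tournez au {french_ordinal_abbreviation(1, 'masculine')} feu à droite"
```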

Yes ordinals (but also fractions) need derivation by gender (as well as
grammatical case) including for abbreviated forms (e.g. in French, Italian,
Spanish, but even in English with inflected leading articles like "a" vs.
"an", which depends on the numeric value of the ordinal).

And I see little use of these "ordinal" types except in strict isolation
assuming a nominal use (outside of real sentences where they will be
inserted) without any relation with the noun (or nominal group) to which
they refer (note: this noun or nominal group may be outside the curent
isolated "paragraph", such as a column heading, or other info such as
resulting ranks in sportive competition for women, vs. the same table for
men.

Basically this means that CLDR just provides basic data that still needs to
be tuned and localized again for specific applications, even if this tuning
is generic. What CLDR can do, however, is to monitor whether there are stable
applications wishing to interchange their localized data containing gender
or case differences: if their localisation data is large enough to cover
enough locales for a significant part of the world and they want to
interoperate, they will create a de facto standard that can be integrated
(after being proposed to CLDR with enough exemplar data and open licensing).

Such applications already exist (notably across wikis, even if much work is
still required to have them cooperate to stabilize some issues, agree on
some common formats, efficiently track the remaining translation problems
and decide how to manage the remaining inconsistencies, as well as to
accept some deviations for specific uses in more specific pages they
don't want to break).


On Tue, 25 Sep 2018 at 13:02, Philippe Verdy <verdy_p at wanadoo.fr> wrote:

>
>
> On Tue, 25 Sep 2018 at 11:32, Marcel Schneider <charupdate at orange.fr>
> wrote:
>
>> On 25/09/18 10:00 Philippe Verdy wrote:
>> > Plural rules are documented. These are defined as minimal data needed
>> to start any new locale.
>>
>> That seems to be one of those barriers that Steven is now questioning, or
>> even the main barrier for entry.
>> For me that would remain a barrier as long as I cannot get clear insight
>> nor see straightforward structures to fill in.
>>
>> See the documentation:
> http://cldr.unicode.org/index/cldr-spec/plural-rules
>
> And the supplemental data which gives a list per locale:
>
> http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html
>
>

From cldr-users at unicode.org  Tue Sep 25 14:11:49 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Tue, 25 Sep 2018 21:11:49 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
 <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
 <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
Message-ID: <1873582906.6218.1537902709278.JavaMail.www@wwinf2227>

Thanks for the links to documentation. The first page:

http://cldr.unicode.org/index/cldr-spec/plural-rules

contains new instructions stating that gender is irrelevant except if 
two nouns of different gender are needed to cover all plural categories.

This results in replacing “Prenez la {0}re à droite; Prenez le {0}er à droite”
with a sentence like you suggested: “Prenez au {0}er feu à droite puis la {0}re à droite”

Still I don’t understand why information is to be packed into arbitrary phrases instead 
of being stored in a more formal way, using appropriate data structures differentiating 
the values by transparent criteria, like what is already done for number with data 
stored in the supplemental/ directory:

https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/plurals.xml
https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/ordinals.xml

which is what I looked for.

Values like "few" and "many" seem to be used as convenient labels to get more categories.
E.g. Gujarati has "two" for 2 and 3, "few" for 4, and "many" for 6, while 5 and 7 upwards
are "other". Understandably, "many" is used for Italian to label the category dedicated 
to numbers starting with a vowel.
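
For illustration only, the ordinal categories described above can be written
out as a small lookup. This is a minimal Python sketch, assuming the rules
are as they read in ordinals.xml (Gujarati: one = 1, two = 2 and 3, few = 4,
many = 6; Italian: many = 8, 11, 80, 800). The function name
ordinal_category is hypothetical, and the rules are transcribed by hand
rather than parsed from the CLDR data.

```python
def ordinal_category(locale: str, n: int) -> str:
    """Return the ordinal plural category of a non-negative integer n.

    Hand-transcribed from CLDR ordinals.xml for two locales only;
    anything not matched by a rule falls into "other".
    """
    if locale == "gu":  # Gujarati
        if n == 1:
            return "one"
        if n in (2, 3):
            return "two"
        if n == 4:
            return "few"
        if n == 6:
            return "many"
        return "other"
    if locale == "it":  # Italian: ordinals whose spoken form starts with a vowel
        return "many" if n in (8, 11, 80, 800) else "other"
    raise ValueError(f"no hand-transcribed rules for locale {locale!r}")


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, ordinal_category("gu", n), ordinal_category("it", n))
```

The category names carry no meaning of their own here: "many" for Italian
covers only four values, which is why they read as labels rather than
quantities.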


The supplemental/ folder contains many things, among which I stumbled over 
attributeValueValidity.xml. The 2ⁿᵈ through 4ᵗʰ comments in this file 
contradict the very subject of this thread, so I suggest removing these
PRIOR to the v34 release?

Regards,

Marcel



From cldr-users at unicode.org  Tue Sep 25 16:27:10 2018
From: cldr-users at unicode.org (Luke Dashjr via CLDR-Users)
Date: Tue, 25 Sep 2018 21:27:10 +0000
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAFYQx+AkdwP_piqWDd4k4VAaOq7jneXDLpGVfu8scOELjedNyA@mail.gmail.com>
References: <CAFYQx+AkdwP_piqWDd4k4VAaOq7jneXDLpGVfu8scOELjedNyA@mail.gmail.com>
Message-ID: <201809252127.11342.luke@dashjr.org>

It's been a while since I tried, but I didn't see any possible way to define a 
locale's number system (eg, octal or tonal instead of decimal).

On Saturday 22 September 2018 00:34:27 Steven R. Loomis via CLDR-Users wrote:
> Hello, and welcome to the new cldr-users members.
>
> For discussion:
>
> At the IUC conference last week, a few of us discussed around lunch some
> issues around getting new locales into CLDR, and barriers to entry.
>
> Barriers:
> - we discussed that it could be confusing or difficult to collect all of
> the data needed for a minimal locale:
> http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
> pluralization data
> - what about fonts? keyboards?
> - what are the best ways to coordinate efforts between the language users
> and different technical experts?
>
> Ideas:
> - a web app to take in new locale data?
> - a web app to debug/explore plurals?
> - allowing some locales to 'get started' without plural rules?
>
> Links for discussion:
> - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
> - My "full stack" blog post:
> https://srl295.github.io/2017/06/06/full-stack-enablement/


From cldr-users at unicode.org  Tue Sep 25 16:48:33 2018
From: cldr-users at unicode.org (Steven R. Loomis via CLDR-Users)
Date: Tue, 25 Sep 2018 14:48:33 -0700
Subject: Locale bringup and barriers for entry
In-Reply-To: <201809252127.11342.luke@dashjr.org>
References: <CAFYQx+AkdwP_piqWDd4k4VAaOq7jneXDLpGVfu8scOELjedNyA@mail.gmail.com>
 <201809252127.11342.luke@dashjr.org>
Message-ID: <CAFYQx+A2aW8DKOhfrDgJ5vM_+6rsW7k38gDCmrBcL5bviDN0ug@mail.gmail.com>

The numbering system is defined in TR 35 in
https://unicode.org/reports/tr35/tr35-numbers.html#Numbering_Systems in
terms of either *numeric* (which are decimal systems, just substituting
different digits for "0123456789", such as ꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩ for the Vai
language), or else *algorithmic*, which are more complex and rule-based. I
suppose octal and tonal (hexadecimal?!) could be supported by the
algorithmic approach.
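
To make the *numeric* case concrete, here is a minimal Python sketch of the
digit substitution described above. It assumes the Vai digits occupy
U+A620..U+A629; the names VAI_DIGITS and format_with_digits are made up for
this example and are not part of CLDR or ICU.

```python
# Vai digits zero through nine, assumed to be the ten code points from U+A620.
VAI_DIGITS = "".join(chr(0xA620 + i) for i in range(10))


def format_with_digits(n: int, digits: str) -> str:
    """Render n in decimal, replacing each ASCII digit by digits[value]."""
    if len(digits) != 10:
        raise ValueError("a numeric numbering system needs exactly 10 digits")
    table = str.maketrans("0123456789", digits)
    return str(n).translate(table)


if __name__ == "__main__":
    print(format_with_digits(2018, VAI_DIGITS))
```

Because only the digit glyphs change and the base stays 10, a table of ten
characters is all the data such a system needs; anything with a different
base does not fit this shape.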



From cldr-users at unicode.org  Tue Sep 25 17:38:25 2018
From: cldr-users at unicode.org (Philippe Verdy via CLDR-Users)
Date: Wed, 26 Sep 2018 00:38:25 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAFYQx+A2aW8DKOhfrDgJ5vM_+6rsW7k38gDCmrBcL5bviDN0ug@mail.gmail.com>
References: <CAFYQx+AkdwP_piqWDd4k4VAaOq7jneXDLpGVfu8scOELjedNyA@mail.gmail.com>
 <201809252127.11342.luke@dashjr.org>
 <CAFYQx+A2aW8DKOhfrDgJ5vM_+6rsW7k38gDCmrBcL5bviDN0ug@mail.gmail.com>
Message-ID: <CAGa7JC3gu2Du8f-eb05ORC4xpuafbby2y6bu3H5GRK9HXmwhhQ@mail.gmail.com>

Octal and hexadecimal (as well as binary) are obviously numeric systems
using the same digits (or borrowing additional letters or adding other
supplemental digits): the underlying algorithm is the same as for decimal,
just with a different base (not necessarily written each time but inferred
from the context), and that algorithm is equally simple: it is basic
arithmetic expressed over a cyclic group. That numeric notation is
contradicted by the way numbers are actually spelled out in actual
languages, where the base is obviously not just decimal but uses larger
bases (most often 1000 in European traditions, but 100 or 10000 in parts of
Asia, with various exceptions retaining traces of base 20). Historically,
numbers had mystic or religious traditions, and some old systems using
base 12 remain (including the old English and Celtic traditions).
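
The claim that positional notation uses one algorithm regardless of base can
be sketched in a few lines of Python. This is an illustration under stated
assumptions, not anything CLDR provides: the function name to_base and the
digit alphabet (ASCII digits followed by the borrowed letters A to F) are
choices made for this example.

```python
DIGITS = "0123456789ABCDEF"  # borrowed Latin letters stand in for 10..15


def to_base(n: int, base: int) -> str:
    """Write integer n in positional notation for any base from 2 to 16."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        return "-" + to_base(-n, base)
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, remainder = divmod(n, base)  # peel off the least significant digit
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


if __name__ == "__main__":
    for base in (2, 8, 10, 16):
        print(base, to_base(2018, base))
```

Only the base parameter and the digit alphabet differ between octal,
decimal and hexadecimal; the loop is identical.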

Octal and hexadecimal are certainly modern inventions, made for technical
reasons (or because of the limitations of an older state of the art and the
implementation costs at a time when a pure binary system was simply
unusable for most purposes). Octal is now deprecated and largely replaced
by hexadecimal, except in well-known programming languages and in old
technical documentation for the oldest computing standards, which were
never deprecated completely, whether because they never really fell out of
use or because of compatibility issues. Octal support remains mandatory
there because it also affects how these programming languages are parsed
into unbreakable lexical tokens: it would be impractical to change this
basic tokenisation algorithm, on which the rest of the language is built.
Conversely, this also limits the practical adoption of hexadecimal, which
requires a more complex syntax even though it should be more compact.

Still today, the decimal system is the most widely used, but maybe in
some future hexadecimal will become popular and be translated into actual
languages to express numbers. Then it will be time to have actual
characters added with distinctive forms for the 6 additional digits,
instead of borrowing Latin letters. This could come first from languages
other than those currently using Latin (I think it may appear first in
China, Japan or Korea, as part of the sinographic system or as extensions
of kana and hangul, and be rapidly adopted in South Asia, and once again
European scripts will be the last to accept the change, just as they were
very late in adopting the concept of zero, negative numbers and fractional
decimals using digits, and separators for grouping/decimals).

Yes, I don't see why there are still no hexadecimal extension digits added,
even if today most hexadecimal numbers are used only in technical
programming languages that are standardized using only basic Latin/ASCII.
The barrier is still adoption in human languages for general use,
as well as various legal restrictions (notably for
pricing/billing/accounting/contracting/taxing). There are fewer
restrictions in the old legal/judiciary traditions where other systems were
largely in use (and still are!)



From cldr-users at unicode.org  Tue Sep 25 21:59:40 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Wed, 26 Sep 2018 04:59:40 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>

What locales are you referring to? If they are new to CLDR, and you experienced difficulties in setting up their numbering system, then there is yet another barrier.

As far as I can see, I only know of Sumerian and Babylonian locales using sexagesimal numbering. Octal and hexadecimal/tonal as a locale’s numbering system are discouraged as counterintuitive, as they allow people neither to count on their fingers in a straightforward way, nor to efficiently communicate digits using hand gestures. More generally, I don’t believe that it could be useful for a locale to focus on its numbering system in order to get away from widespread usage. Yes, we really do need to make changes, but the numbering system does not appear to me to be the right end to begin with. Sorry to put it bluntly, but I’d suggest focusing on getting all existing locales into CLDR, unlike what is suggested in the comments I pointed to in my previous message, and on fixing existing errors. If any existing living locale does use octal, tonal, sexagesimal, or any other non-decimal system besides purely notational conventions like Roman, then indeed we need to dig deeper into the matter in order to get them into CLDR.

Having said that, as Steven pointed out, there are already some locales using algorithmic numbering, as seen in the data:

https://www.unicode.org/repos/cldr/tags/latest/common/bcp47/number.xml
https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/numberingSystems.xml

For reference, here is the specification, which is not very explicit about algorithmic systems:
http://www.unicode.org/reports/tr35/#Numbering%20System%20Data

Nevertheless I don’t think that Nystrom was wrong in challenging the elites of his generation, given that the current approach has proved to be a slope into catastrophe, so that today we need to make changes at 180°, or 8 tims when expressing it in tonal, like those suggested on:
http://sunsite.monsite-orange.fr/page-5b9e092880342.html

Regards,

Marcel

From cldr-users at unicode.org  Wed Sep 26 09:38:01 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 26 Sep 2018 16:38:01 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
 <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
 <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
Message-ID: <CAJ2xs_G48zqT05SLDD=PzGaemCZE-9S3dsDni+UGEgWdRWcqRA@mail.gmail.com>

> Note that the supplemental data is OK for the "cardinal" and "range" type
of categories, but largely failing almost everywhere for the "ordinal" type.

This is due to a misunderstanding of how ordinal works. It is just like
cardinal (plural) in that the translator is responsible for the text, *and*
accounting for gender. The examples given are thus irrelevant.

"Prenez la 1re à droite"

Would be:

one: "Prenez la {number}re à droite"
other: "Prenez la {number}e à droite"

or

one: "Tournez au {number}er feu à droite"
other: "Tournez au {number}e feu à droite"

To reiterate, the handling of grammatical inflections other than
plurals/ordinals is outside the current scope of CLDR, but it is false to
say that CLDR "fails" for ordinals.
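
As a rough illustration of the division of labour described above, the
following Python sketch selects a translator-supplied message by ordinal
category. It assumes the French ordinal rule is "one" for 1 and "other" for
everything else; the TEMPLATES table, its message keys and the function
names are hypothetical and do not correspond to any CLDR or ICU API.

```python
# Translator-supplied messages: one full sentence per ordinal category.
# Gender is handled by the translator inside each sentence, not by the rule.
TEMPLATES = {
    "turn_right_street": {  # implicit feminine noun ("rue", "sortie")
        "one": "Prenez la {number}re à droite",
        "other": "Prenez la {number}e à droite",
    },
    "turn_right_light": {  # explicit masculine noun ("feu")
        "one": "Tournez au {number}er feu à droite",
        "other": "Tournez au {number}e feu à droite",
    },
}


def french_ordinal_category(n: int) -> str:
    """Assumed French ordinal rule: 'one' for n == 1, otherwise 'other'."""
    return "one" if n == 1 else "other"


def format_message(message_key: str, n: int) -> str:
    """Pick the template for n's ordinal category and fill in the number."""
    template = TEMPLATES[message_key][french_ordinal_category(n)]
    return template.format(number=n)


if __name__ == "__main__":
    print(format_message("turn_right_street", 1))
    print(format_message("turn_right_light", 2))
```

The category rule only decides which of the translator's sentences is used;
the choice between "1re" and "1er" lives entirely in the message text.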

I would recommend that before you say "CLDR fails at X", you first ask so
that you can verify that your understanding of CLDR is correct.

Mark



From cldr-users at unicode.org  Wed Sep 26 09:43:22 2018
From: cldr-users at unicode.org (Mark Davis ☕️ via CLDR-Users)
Date: Wed, 26 Sep 2018 16:43:22 +0200
Subject: Locale bringup and barriers for entry
In-Reply-To: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>
References: <1707078068.37.1537930780251.JavaMail.www@wwinf2209>
Message-ID: <CAJ2xs_HRndM_BvurwsgoR82rWGVBuyH+fwusuYXGMbNNvHwXuw@mail.gmail.com>

CLDR does not currently handle octal or hexadecimal formats because those
are not in customary use by normal users. They are clearly used by
programmers, but that is specialized usage that doesn't require special
formatting across human languages.

I suggest that people focus on practical issues connected with CLDR and not
ramble on about issues that are not particularly important to CLDR users.

Mark


On Wed, Sep 26, 2018 at 5:00 AM Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> What locales are you referring to? If they are new to CLDR, and you
> experienced difficulties in setting up their numbering system, then there
> is yet another barrier.
>
>
>
> As far as I can see, I only know of Sumerian and Babylonian locales using
> sexagesimal numbering. Octal and hexadecimal/tonal as a locale’s numbering
> system are discouraged as counterintuitive, as they allow people neither to
> count on their fingers in a straightforward way, nor to efficiently
> communicate digits using hand gestures. More generally, I don’t believe
> that it could be useful for a locale to focus on its numbering system in
> order to get away from widespread usage. Yes, we really do need to make
> changes, but the numbering system does not appear to me to be the right
> end to begin with. Sorry to put it bluntly, but I’d suggest focusing
> on getting all existing locales into CLDR, unlike what is suggested in the
> comments I pointed to in my previous message, and on fixing existing errors.
> If any existing living locale does use octal, tonal, sexagesimal, or
> any other non-decimal system besides purely notational conventions like
> Roman, then indeed we need to dig deeper into the matter in order to get
> them into CLDR.
>
>
>
> Having said that, as Steven pointed out, there are already some locales
> using algorithmic numbering, as seen in the data:
>
>
>
> https://www.unicode.org/repos/cldr/tags/latest/common/bcp47/number.xml
>
>
> https://www.unicode.org/repos/cldr/tags/latest/common/supplemental/numberingSystems.xml
>
>
>
> For reference, here is the specification, not very explicit about
> algorithmic:
>
> http://www.unicode.org/reports/tr35/#Numbering%20System%20Data
>
>
>
>
>
> Nevertheless I don’t think that Nystrom was wrong in challenging the
> elites of his generation, given that the current approach has proved to be
> a slope into catastrophe, so that today we need to make changes at 180°,
> or 8 tims when expressing it in tonal, like those suggested on:
>
> http://sunsite.monsite-orange.fr/page-5b9e092880342.html
>
>
>
> Regards,
>
>
>
> Marcel
>
>
>
> On 26/09/18 00:43 Philippe Verdy via CLDR-Users wrote:
>
> > Octal and hexadecimal (as well as binary) are obviously numeric systems
> > using the same digits (or borrowing additional letters or adding other
> > supplemental digits): the algorithm behind them is the same as for
> > decimal, just using a different base (not necessarily written each time
> > but inferred from the context), and that algorithm is equally simple: it's
> > basic arithmetic expressed over a cyclic group. That numeric notation is
> > contradicted by the way numbers are actually spelled in actual languages,
> > where the base is obviously not just decimal but larger (most often 1000
> > in European traditions, but 100 or 10000 in parts of Asia, with various
> > exceptions showing remaining traces of base 20). Historically, numbers had
> > mystic or religious traditions, and there remain some old systems using
> > base 12 (including the old English and Celtic traditions).
>
> > Octal and hexadecimal are certainly modern inventions made for technical
> > reasons (or because of the limitations of an older state of the art and
> > the cost of implementations, when a pure binary system was simply unusable
> > for most usages). Usage of octal is now deprecated and largely replaced by
> > hexadecimal, except in well-known programming languages and in old
> > technical documentation for the oldest computing standards, which were
> > never deprecated completely, whether because they never really fell out
> > of use or because of compatibility issues. Its support is still mandatory,
> > as it also impacts how these programming languages are parsed into
> > unbreakable lexical tokens: it would be impractical to change this basic
> > tokenisation algorithm on which the rest of the language is built.
> > Conversely, this also limits the practical adoption of hexadecimal, which
> > requires a more complex syntax even though it should be more compact.
>
> > Still today, the decimal system is the most widely used, but maybe in
> > some future hexadecimal will become popular and be translated into actual
> > languages to express numbers. Then it will be time to have actual
> > characters added with distinctive forms for the 6 additional digits,
> > instead of borrowing Latin letters. This could come first from languages
> > other than those currently using Latin (I think it may appear first in
> > China, Japan or Korea, as part of the sinographic system or as extensions
> > of kana and hangul, and be rapidly adopted in South Asia, and once again
> > European scripts will be the last to accept the change, just as they were
> > very late in adopting the concept of zero, negative numbers and fractional
> > decimals using digits, and separators for grouping/decimals).
>
> > Yes, I don't see why there are still no hexadecimal extension digits
> > added, even if today most hexadecimal numbers are used only in technical
> > programming languages that are standardized using only basic Latin/ASCII.
> > The barrier is still adoption in human languages for general use, as well
> > as various legal restrictions (notably for
> > pricing/billing/accounting/contracting/taxing). There are fewer
> > restrictions in the old legal/judiciary traditions, where other systems
> > were largely in use (and still are!).
>
> >
>
> >
>
> >
> > On Tue, Sep 25, 2018 at 23:55, Steven R. Loomis via CLDR-Users <
> > cldr-users at unicode.org> wrote:
> >
>
>> The numbering system is defined in TR 35 in
>> https://unicode.org/reports/tr35/tr35-numbers.html#Numbering_Systems in
>> terms of either '*numeric*' (which are decimal systems, just
>> substituting different digits for "0123456789", such as ꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩ for
>> the Vai language), or else '*algorithmic*', which are more complex and
>> rule-based. I suppose octal and tonal (hexadecimal?!) could be supported
>> by the algorithmic approach.
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>>
>> >
>> On Tue, Sep 25, 2018 at 2:27 PM Luke Dashjr <luke at dashjr.org> wrote:
>> >
>>
>>> It's been a while since I tried, but I didn't see any possible way to
>>> define a locale's number system (eg, octal or tonal instead of decimal).
>>> >
>>> > On Saturday 22 September 2018 00:34:27 Steven R. Loomis via CLDR-Users wrote:
>>> > > Hello, and welcome to the new cldr-users members.
>>> > >
>>> > > For discussion:
>>> > >
>>> > > At the IUC conference last week, a few of us discussed around lunch some
>>> > > issues around getting new locales into CLDR, and barriers to entry.
>>> > >
>>> > > Barriers:
>>> > > - we discussed that it could be confusing or difficult to collect all of
>>> > > the data needed for a minimal locale:
>>> > > http://cldr.unicode.org/index/cldr-spec/minimaldata - especially
>>> > > pluralization data
>>> > > - what about fonts? keyboards?
>>> > > - what are the best ways to coordinate efforts between the language users
>>> > > and different technical experts?
>>> > >
>>> > > Ideas:
>>> > > - a web app to take in new locale data?
>>> > > - a web app to debug/explore plurals?
>>> > > - allowing some locales to 'get started' without plural rules?
>>> > >
>>> > > Links for discussion:
>>> > > - Elnaz and Steven's prez from (last) Monday: https://goo.gl/sN7biw
>>> > > - My "full stack" blog post:
>>> > > https://srl295.github.io/2017/06/06/full-stack-enablement/
>>> >
>>
>> _______________________________________________
>> > CLDR-Users mailing list
>> > CLDR-Users at unicode.org
>> > http://unicode.org/mailman/listinfo/cldr-users
>> >
>
>
>
>

From cldr-users at unicode.org  Thu Sep 27 00:32:56 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 07:32:56 +0200 (CEST)
Subject: Locale bringup and barriers for entry
Message-ID: <1917168749.272.1538026376218.JavaMail.www@wwinf2209>

On 26/09/18 16:45 Mark Davis via CLDR-Users wrote:
>
> CLDR does not currently handle octal or hexadecimal formats because those are not in customary use by normal users.
> They are clearly used by programmers, but that is specialized usage that doesn't require special formatting across human languages.
>
> I suggest that people focus on practical issues connected with CLDR and not ramble on about issues that are not particularly important to CLDR users.

That is my opinion too: this thread shouldn’t be abused to discuss issues irrelevant to CLDR.
But having sent many replies since the thread was launched, all of them intended to help newcomers get started with CLDR,
I thought it would be unfair of me not to respond to Luke, nor was I going to behave as if I had been scared into silence by the new turn of the discussion.

The underlying message was: If people want to be disruptive, here’s what I’d suggest focusing on first.
But that was not all. I also stated:

> > […] I’d suggest focusing on getting all existing locales into CLDR,
> > unlike what is suggested in the comments I pointed to in my previous message,
> > and on fixing existing errors.

Sorry for getting off-topic apart from that.

Regards,

Marcel


From cldr-users at unicode.org  Thu Sep 27 01:00:16 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 08:00:16 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <CAJ2xs_G48zqT05SLDD=PzGaemCZE-9S3dsDni+UGEgWdRWcqRA@mail.gmail.com>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
 <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
 <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
 <CAJ2xs_G48zqT05SLDD=PzGaemCZE-9S3dsDni+UGEgWdRWcqRA@mail.gmail.com>
Message-ID: <186356847.391.1538028016410.JavaMail.www@wwinf2209>

On 26/09/18 16:38 Mark Davis wrote:

[quote]

> This is due to a misunderstanding of how ordinal works. It is just like cardinal (plural)
> in that the translator is responsible for the text, and accounting for gender.
> The examples given are thus irrelevant.

[examples]

> To reiterate, the handling of grammatical inflections other than plurals/ordinals
> is outside the current scope of CLDR, […]

Thank you for this clarification.

I’ll take away that CLDR gives hints about which numbers require special handling when
they are part of messages, but not about how messages are to be inflected depending on
the current value of the number placeholder.
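
To make that division of labour concrete, here is a minimal Python sketch. It is
not CLDR or ICU code: the function names and the message table are hypothetical,
the conditions are the English ordinal rules as I understand them from the CLDR
plural rules charts (non-negative integers only), and the messages are merely
modelled on the "Take the ... right." samples. The rules only pick a category;
the text for each category is supplied by the translator.

```python
def english_ordinal_category(n: int) -> str:
    """Return the ordinal category of a non-negative integer for English.

    Assumption: these conditions mirror the English ordinal rules
    shown in the CLDR plural rules charts.
    """
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1st, 21st, 101st ...
    if n % 10 == 2 and n % 100 != 12:
        return "two"    # 2nd, 22nd, 102nd ...
    if n % 10 == 3 and n % 100 != 13:
        return "few"    # 3rd, 23rd, 103rd ...
    return "other"      # 4th, 11th, 12th, 13th ...


# Hypothetical translator-supplied messages, one per category.
MESSAGES = {
    "one": "Take the {n}st right.",
    "two": "Take the {n}nd right.",
    "few": "Take the {n}rd right.",
    "other": "Take the {n}th right.",
}


def format_ordinal_message(n: int) -> str:
    """Pick the message for n's category and fill in the number."""
    return MESSAGES[english_ordinal_category(n)].format(n=n)
```

For Dutch, the same table would hold a single "other" entry, which is why only
one sample sentence is listed for that locale.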

Is the label “Minimal Pairs” misleading?

E.g. Dutch has one-size-fits-all ordinals, with only "other" and a single minimal pair: “Neem de 15e afslag rechts.”

Besides, I wonder whether the -e should be superscript: “Neem de 15ᵉ afslag rechts.”


Regards,

Marcel


From cldr-users at unicode.org  Thu Sep 27 05:17:00 2018
From: cldr-users at unicode.org (Marcel Schneider via CLDR-Users)
Date: Thu, 27 Sep 2018 12:17:00 +0200 (CEST)
Subject: Locale bringup and barriers for entry
In-Reply-To: <186356847.391.1538028016410.JavaMail.www@wwinf2209>
References: <474792420.10994.1537818695522.JavaMail.www@wwinf2227>
 <CAJ2xs_F0yH9hdU9yiMb9LOic1jd2qDQF2aAhGvXyZjWUihaF0A@mail.gmail.com>
 <329976032.379.1537854865588.JavaMail.www@wwinf2227>
 <CAGa7JC2voEr0L3nZRLppD03cfGKGrJdHPGej-cwnc_3meFsA8A@mail.gmail.com>
 <312813575.3078.1537867951491.JavaMail.www@wwinf2227>
 <CAGa7JC02W9wV3NJV75fOJ8qAWS=z4kRm6EnXFf-8kvTQmROKkA@mail.gmail.com>
 <CAGa7JC2FMNeQq3Jr3Wzths3PxbGUMEwHK-m8QN8YdvQkLwTtgg@mail.gmail.com>
 <CAJ2xs_G48zqT05SLDD=PzGaemCZE-9S3dsDni+UGEgWdRWcqRA@mail.gmail.com>
 <186356847.391.1538028016410.JavaMail.www@wwinf2209>
Message-ID: <827563207.3124.1538043420529.JavaMail.www@wwinf2209>

> Is the label “Minimal Pairs” misleading?

I now seem able to answer my own question:
IMO the misconception about what CLDR is supposed to do for ordinals is fueled by the way the data
is represented in the charts and in the LDML sources. While English has a comprehensive list of all
existing ordinal inflections, French does not, and that seems to be what makes people believe that
some data is missing, and that “the supplemental data is […] failing.”

Mark Davis wrote:

> the translator is responsible for the text, and accounting for gender. The examples given are thus irrelevant. 

So the header should not be “Minimal Pairs” but just “Examples” again.
As for the provided text, it could be stripped out and abstract rules put in its place.
That could be even more useful, as demonstrated by the category "special2" in the French example below:

Eg for French:
<ordinal category="default">Ordinal abbreviation is built by appending default ordinal indicator to the digit.</ordinal>
<ordinal category="special1">Ordinal 1 has peculiar inflection.</ordinal>
<ordinal category="special2">Ordinal 2 has peculiar inflection when designating rank.</ordinal>

For Italian:
<ordinal category="default">Ordinal abbreviation is built by appending default ordinal indicator to the digit.</ordinal>
<ordinal category="special1">Vowel of article may be elided if number long form starts with a vowel, even if number is short form.</ordinal>

For English:
<ordinal category="default">Ordinal abbreviation is built by appending default ordinal indicator to the digit.</ordinal>
<ordinal category="special1">Ordinal 1 has peculiar inflection.</ordinal>
<ordinal category="special2">Ordinal 2 has peculiar inflection.</ordinal>
<ordinal category="special3">Ordinal 3 has peculiar inflection.</ordinal>


That’s at least what the statements made so far appear to boil down to.

But given that some of these rules may be lengthy (e.g. for category "special1" in the Italian example),
CLDR may be better off providing sample text. That’s tricky however, as parsing sample text
while being aware of what it is meant to convey, and what it is not, may be non-obvious.

That brings me back to what I tried to suggest when arguing that
a system of rules is more straightforward than a collection of samples,
especially when provided not for teaching humans, but for informing processes.
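
As a rough illustration of what "rules for informing processes" could look like,
here is a small Python sketch. Everything in it is hypothetical: the rule table
format, the names, and the category labels "special1"/"default" (borrowed from
the examples above) are not part of CLDR. It covers only the masculine French
forms and leaves out gender and the "special2" case for rank.

```python
from typing import Callable, List, Tuple

# (category label, predicate on the number, ordinal indicator to append)
Rule = Tuple[str, Callable[[int], bool], str]

# Hypothetical rule table for French, masculine forms only.
# Rules are tried in order; the last one is the catch-all default.
FRENCH_ORDINAL_RULES: List[Rule] = [
    ("special1", lambda n: n == 1, "er"),  # 1er (premier)
    ("default", lambda n: True, "e"),      # 2e, 3e, 15e ...
]


def abbreviate_ordinal(n: int, rules: List[Rule]) -> Tuple[str, str]:
    """Return (category, abbreviated ordinal) from the first matching rule."""
    for category, applies, indicator in rules:
        if applies(n):
            return category, f"{n}{indicator}"
    raise ValueError("rule table has no default rule")
```

A process can read such a table directly, whereas a sample sentence first has to
be interpreted to find out which part of it carries the information.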

But given that what I’m suggesting is to reengineer that part of CLDR, I’ve
little hope that anything will be changed.

There’s not even a need for change if CLDR users really are happy with the current state of the art.


Regards,

Marcel