CLDR

Marcel Schneider via Unicode unicode at unicode.org
Mon Sep 3 05:28:18 CDT 2018


On 03/09/18 09:53 Janusz S. Bień via Unicode wrote:
> 
> On Fri, Aug 31 2018 at 10:27 +0200, Manuel Strehl via Unicode wrote:
> > The XML files in these folders:
> >
> > https://unicode.org/repos/cldr/tags/latest/common/
> 
> Thanks for the link.
> 
> In the meantime I rediscovered Locale Explorer
> 
> http://demo.icu-project.org/icu-bin/locexp
> 
> which I used some time ago.

Nice. It is currently based on CLDR v31.0.1.

> 
> On Fri, Aug 31 2018 at 12:17 +0200, Marcel Schneider via Unicode wrote:
> > On 31/08/18 07:27 Janusz S. Bień via Unicode wrote:
> > […]
> >> > Given NamesList.txt / Code Charts comments are kept minimal by design, 
> >> > one couldn’t simply pop them into XML or whatever, as the result would be 
> >> > disappointing and call for completion in the aftermath. Yet another task 
> >> > competing with CLDR survey.
> >> 
> >> Please elaborate. It's not clear for me what do you mean.
> >
> > These comments are designed for the Code Charts and as such must not be
> > disproportionate in exhaustivity. Eg we have lists of related languages ending 
> > in an ellipsis.
> 
> Looks like we have different comments in mind.

Then I’m sorry to be off-topic.

[…]
> >> > and we really 
> >> > need to go through the data and correct the many many errors, please.
> 
> But who is the right person or institution to do it?

Software vendors are committed to maintaining the data, and may delegate the survey 
to service providers specialized in localization. I also think that public language 
offices should be among the reviewers. Beyond that, and especially where the
latter are absent, anybody is welcome to contribute as a guest. (Each guest vote has
a weight of 1, and guest votes do not add up.) That is consistent with the fact that
Unicode relies on volunteers, too.

I’m volunteering to personally welcome you to contribute to CLDR.

[…]
> > Further you will see that while Polish is using apostrophe
> > https://slowodnia.tumblr.com/post/136492530255/the-use-of-apostrophe-in-polish
> > CLDR does not have the correct apostrophe for Polish, as opposed eg to French.
> 
> I understand that by "the correct apostrophe" you mean U+2019 RIGHT
> SINGLE QUOTATION MARK.

Yes.

> 
> > You may wish to note that from now on, both U+0027 APOSTROPHE and 
> > U+0022 QUOTATION MARK are ruled out in almost all locales, given the 
> > preferred characters in publishing are U+2019 and, for Polish, the U+201E and 
> > U+201D that are already found in CLDR pl.
> 
> The situation seems more complicated because the chart
> 
> https://www.unicode.org/cldr/charts/34/by_type/core_data.alphabetic_information.punctuation.html
> 
> contains different list of punctuation characters than
> 
> https://www.unicode.org/cldr/charts/34/summary/pl.html.
> 
> I guess the latter is the primary one, and it contains U+2019 RIGHT
> SINGLE QUOTATION MARK (and U+2018 LEFT SINGLE QUOTATION MARK, too).

It’s a bit confusing because there is a column for English and a column for Polish.
The characters you retrieved are actually in the English column, while Polish has, 
consistently with the By-Type chart, these quotation marks:
' " ” „ « » 
Hence the set is incomplete.
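
To make the list above easier to check, here is a minimal Python sketch that prints the code point and Unicode name of each character. The list is copied by hand from this message, not read from the CLDR data, and the names POLISH_QUOTES_IN_CLDR and describe are only illustrative.

```python
import unicodedata

# Quotation marks listed for Polish in this message (copied by hand,
# not extracted from the CLDR XML files).
POLISH_QUOTES_IN_CLDR = ["'", '"', "\u201D", "\u201E", "\u00AB", "\u00BB"]


def describe(chars):
    """Return (code point label, Unicode name) pairs for each character."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in chars]


for label, name in describe(POLISH_QUOTES_IN_CLDR):
    print(label, name)

# U+2019 RIGHT SINGLE QUOTATION MARK, the apostrophe discussed earlier,
# is not part of the list.
has_apostrophe = "\u2019" in POLISH_QUOTES_IN_CLDR
print("U+2019 present:", has_apostrophe)
```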

> 
> >
> > Note however that according to the information provided by English Wikipedia:
> > https://en.wikipedia.org/wiki/Quotation_mark#Polish
> > Polish also uses single quotes, that by contrast are still missing in CLDR.
> 
> You are right, but who cares? Looks like this has no practical
> importance. Nobody complains about the wrong use of quotation marks in
> Polish by Word or OpenOffice, so looks like the software doesn't use
> this information. So this is rather a matter of aesthetics...

I’ve come to the position that letting a word processor “use” quotation marks
misses the point. Quotation marks are used by the user typing
his or her text, and are expected to be on the keyboard layout he or she
is using. So-called smart quotes, guessed algorithmically from the ASCII single 
and double quotes, are but a hazardous workaround for not installing the 
appropriate keyboard layout. At least that is my position :)
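
To illustrate why such guessing is hazardous, here is a minimal Python sketch of a naive heuristic. It is a hypothetical rule written for this message (English-style marks, decided only from the preceding character), not the algorithm of Word, OpenOffice, or any other product.

```python
def naive_smart_quotes(text: str) -> str:
    """Guess curly quotes from ASCII ' and " using only the preceding character.

    Hypothetical heuristic: a quote after whitespace, an opening bracket,
    or at the start of the text is treated as opening; any other is closing.
    """
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "
        opening = prev.isspace() or prev in "([{"
        if ch == '"':
            out.append("\u201C" if opening else "\u201D")
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    return "".join(out)


# A word-internal apostrophe is guessed correctly as U+2019.
print(naive_smart_quotes("don't"))
# A leading apostrophe is wrongly turned into U+2018, an opening quote.
print(naive_smart_quotes("'90s"))
```

The second call shows the hazard: the heuristic cannot tell a leading apostrophe from an opening single quotation mark, whereas a user typing U+2019 directly from the keyboard layout has no such ambiguity.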

Best regards,

Marcel


