CLDR survey / Polish keyboard (was: Re: CLDR)

Wed Sep 5 07:56:05 CDT 2018

The email isn't on a single topic, so I just skimmed. Some quick remarks:

> I hope not all CLDR data are driven by Wikidata...
The Wikidata names are only used for subdivisions, and then only for ones
that are "new" (where there were no preexisting names). The names are
currently not visible via the Survey Tool (ST), and thus need modification
via tickets. The reason not to show them in the ST is that it would load the
tool down further and burden the vetters (tripling the number of fields).

> using abbreviation like "plpm"

That isn't an abbreviation; it is a code for a subdivision. It corresponds
to the ISO 3166-2 code PL-PM.
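
To illustrate the correspondence, here is a minimal Python sketch. It
assumes the subdivision code is simply the ISO 3166-2 code in lowercase
with the hyphen removed, and that the country prefix is two letters; the
function name is hypothetical and not part of any CLDR tooling.

```python
def subdivision_to_iso3166_2(code: str) -> str:
    """Convert a subdivision code such as 'plpm' to the form 'PL-PM'.

    Assumption: the first two letters are the country code and the rest
    is the subdivision suffix.
    """
    code = code.strip()
    if len(code) < 3 or not code.isalnum() or not code[:2].isalpha():
        raise ValueError(f"not a subdivision code: {code!r}")
    # Reinsert the hyphen after the country prefix and uppercase the result.
    return f"{code[:2]}-{code[2:]}".upper()


print(subdivision_to_iso3166_2("plpm"))  # PL-PM
```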

> they are so greedy

Ad hominem (or ad societatem) remarks are rarely productive, and rarely an
accurate reflection of reality; that is one reason I seldom look at
unicode at unicode.org.

Mark

On Wed, Sep 5, 2018 at 4:03 AM Marcel Schneider via CLDR-Users <
cldr-users at unicode.org> wrote:

> I’m taking this from the Unicode Public mailing list, as the topics belong
> here.
> Though I already responded off-list and would prefer stepping out, I’m
> afraid that at least the CLDR part could be really useful in fighting
> certain baseline problems I encountered while being given the opportunity
> to participate in surveying fr-FR locale data for the upcoming v34. Hence
> I feel committed to respond “on the record” and reopen the door for
> eventual follow-up, if ever I could have seemed to close it.
>
> Indeed, while there were many errors and flaws in the data, most co-vetters
> ended up lacking time to completely review all the items, despite doing a
> really great job while devoting many hours to these tasks. Not having tried
> to dig deeper to learn what the underlying issues are, I have now simply
> speculated on my own about what might have triggered the problems in
> reviewing data and ensuring quality.
>
> The goal is to make CLDR data more reliable, and to suggest what vendors
> might wish to do for that purpose.
>
> At the top of the text below I’ve cut off a snippet unrelated to these
> topics, and further, a snippet for privacy. The slightly blunter off-list
> wording by contrast has not been redacted.
> I’ll advise Unicode Public that this thread is moved here.
>
> On 04/09/18 20:10 I wrote:
> To: "Janusz S. Bień", "James Kass"
> Cc: "Philippe Verdy"
> Subject: [OFF LIST] Re: CLDR
> >
> > On 04/09/18 11:11 James Kass via Unicode wrote:
> > > (This is the response from Janusz S. Bień which was sent to the public
> > > list.)
> >
> > Thank you James for forwarding. I’m responding off-list as I’m afraid that
> > our discussions might not be welcome on the List. […]
> [Deleted for being off-topic.]
> > >
> > > On Mon, Sep 03 2018 at 1:03 -0800, James Kass wrote:
> > >
> > > > Janusz S. Bień wrote,
> > > >
> > […]
> > > Thanks! Most data about Poland at
> > >
> > > https://www.wikidata.org/wiki/Q36
> > >
> > > seem to make sense, but I don't think anybody is using an abbreviation
> > > like "plpm" (for Pomorze/Pomerania).
> >
> > We can see that part of those codes, for whatever items (regions,
> > languages, scripts), are counter-intuitive, and I don’t know either who
> > is using them in running text.
> >
> > >
> > […]
> > > I hope not all CLDR data are driven by Wikidata...
> >
> > I was surprised to learn that even more data is imported without review,
> > but Wikidata is clearly a more reliable source than ISO 639, which is used
> > without assessing its accuracy.
> >
> > >
> > > On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:
> > […]
> > > > Then I’m sorry to be off-topic.
> > >
> > > Let's say off the original topic. My primary concern is to preserve
> > > somehow such comments as e.g. the one on the bottom of page 14 of
> > >
> > > https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf
> >
> > Normally this Medieval Latin semicolon abbreviation should be encoded in
> > Unicode, which already contains many duplicates of punctuation marks, and
> > we know that a punctuation mark can *never* represent a letter without
> > running into issues.
> >
> > >
> > […]
> > > > I’m volunteering to personally welcome you to contribute to CLDR.
> > >
> > > Thanks. The interesting question is who is/was already contributing from
> > > Poland or about Polish language. I vaguely remember a post with this
> > > information, but at that time I was not interested enough to take a
> > > note.
> >
> > I must confess that I wasn’t interested either, or better, I wasn’t aware
> > that I was to contribute, and perhaps was unable to do so. Normally the
> > vendors, especially Apple and Google, should be well-funded enough to be
> > able to appoint as many specialists as needed. But it turns out that when
> > paying contractors, they are so greedy that the linguists are granted
> > insufficient worktime, e.g. a certain number of hours, without their
> > managers assessing beforehand what the status of the data is and how much
> > work is needed to fix it, and consequently not able to renegotiate the
> > service provider contract. That operating mode is completely unresponsive
> > on the part of Apple and Google, and Microsoft alike (although they have
> > less money to devote).
> >
> > >
> > […]
> > > > Polish has, consistently with By-Type, these quotation marks:
> > > > ' " ” „ « »
> > > > Hence the set is incomplete.
> > >
> > > You are right, thanks. But what is the practical importance of it?
> >
> > The importance of CLDR data being accurate is that having them otherwise
> > would reflect badly on the image of a country as being unable or careless.
> >
> > On a general level, another impact of having accurate locale data in CLDR
> > is that the repository gets a better reputation. As long as the data is
> > unreliable, nobody might actually use it.
> >
> > Yet another implication of the presence of a character in CLDR is being a
> > good argument for having it on the keyboard layout. E.g. the Breton letter
> > apostrophe is not yet on the Breton keyboard layout, despite the issue
> > having been discussed on bug tracking / feature request level for XKB.
> > So I informed […]
> [Deleted for privacy.]
> >
> > > I noticed that sometimes in Emacs 'forward-word' behaves strangely on a
> > > text with unusual characters, but had no motivation to investigate how
> > > this is related to the current locale.
> >
> > I’m sorry to be unable to check this, as I’m not yet using Emacs, nor Vim.
> >
> > […]
> > >
> > > The standard keyboard has a limited number of keys, so you have to make
> > > compromises. It is generally accepted that Polish keyboard layouts
> > > (there are primarily two of them) do not contain apostrophe or single
> > > quotation marks. There is a proposal by Marcin Woliński
> > >
> > > http://marcinwolinski.pl/keyboard/
> > >
> > > which is available in most Linux distributions but it does not seem
> > > popular.
> >
> > It has even been ported to Windows. But I cannot find it on Ubuntu 16.04.
> > It has various drawbacks, the worst of which is that the most common
> > angle quotation marks «» are on Shift+AltGr level, while the single ones
> > ‹› are on AltGr, and likewise for the curly quotes, of which Polish
> > currently uses the double ones, whereas the single ones appear to be used
> > only for nested quotations. This swapping of frequent punctuation and rare
> > punctuation has been done only for consistency with the ASCII apostrophe
> > being in the Base shift state, and the ASCII double quote in the Shift
> > shift state as on US-QWERTY.
> >
> > That’s how mnemonics and a certain idea of logic are destroying usability.
> >
> > Thanks for the link anyway.
> >
> > Best regards,
> >
> > Marcel
>