CLDR survey / Polish keyboard (was: Re: CLDR)

Tue Sep 4 21:02:56 CDT 2018

I’m moving this thread from the Unicode Public mailing list, as the topics belong here.
Though I have already responded off-list and would prefer to step out, I think 
that at least the CLDR part could be really useful in tackling certain baseline 
problems I encountered when I was given the opportunity to take part in 
the survey of fr-FR locale data for the upcoming v34. Hence I feel bound 
to respond “on the record” and to reopen the door to possible follow-up, in case
I seemed to have closed it.

Indeed, while there were many errors and flaws in the data, most co-vetters 
ended up without enough time to review all the items completely, despite doing a 
really great job and devoting many hours to these tasks. Since I did not try 
to dig deeper to learn what the underlying issues are, what follows is merely 
my own speculation about what might have caused the problems in reviewing 
the data and ensuring its quality.

The goal is to make CLDR data more reliable, and to suggest what vendors 
might wish to do for that purpose.

In the message below, I’ve cut a snippet near the top as unrelated to these topics, 
and another one further down for privacy. The slightly blunter off-list wording, by 
contrast, has not been toned down.
I’ll let Unicode Public know that this thread has moved here.

On 04/09/18 20:10 I wrote:
To: "Janusz S. Bień", "James Kass"
Cc: "Philippe Verdy"
Subject: [OFF LIST] Re: CLDR
> 
> On 04/09/18 11:11 James Kass via Unicode wrote:
> > (This is the response from Janusz S. Bień which was sent to the public list.)
> 
> Thank you, James, for forwarding. I’m responding off-list as I’m afraid that our discussions 
> might not be welcome on the list. […]
[Deleted for being off-topic.]
> > 
> > On Mon, Sep 03 2018 at 1:03 -0800, James Kass wrote:
> > 
> > > Janusz S. Bień wrote,
> > >
> […]
> > Thanks! Most data about Poland at
> > 
> > https://www.wikidata.org/wiki/Q36
> > 
> > seem to make sense, but I don't think anybody is using abbreviations like
> > "plpm" (for Pomorze/Pomerania).
> 
> We can see that some of those codes, for whatever items (regions, languages, scripts),
> are counter-intuitive, and I don’t know either who is using them in running text.
> 
> > 
> […]
> > I hope not all CLDR data are driven by Wikidata...
> 
> I was surprised to learn that even more data is imported without review, but Wikidata is
> clearly a more reliable source than ISO 639, which is used without its accuracy being assessed.
> 
> > 
> > On Mon, Sep 03 2018 at 12:28 +0200, Marcel Schneider wrote:
> […]
> > > Then I’m sorry to be off-topic.
> > 
> > Let's say off the original topic. My primary concern is to preserve
> > somehow such comments as e.g. the one on the bottom of page 14 of
> > 
> > https://folk.uib.no/hnooh/mufi/specs/MUFI-CodeChart-4-0.pdf
> 
> Normally this Medieval Latin semicolon abbreviation should be encoded in Unicode, which 
> already contains many duplicates of punctuation marks, and we know that a punctuation mark 
> can *never* represent a letter without running into issues.
> 
> > 
> […]
> > > I’m volunteering to personally welcome you to contribute to CLDR.
> > 
> > Thanks. The interesting question is who is/was already contributing from
> > Poland or about Polish language. I vaguely remember a post with this
> > information, but at that time I was not interested enough to take a
> > note.
> 
> I must confess that I wasn’t interested either, or rather, I wasn’t aware that I could contribute,
> and perhaps was unable to do so. Normally the vendors, especially Apple and Google, should 
> be well-funded enough to appoint as many specialists as needed. But it turns out 
> that when paying contractors, they are so greedy that the linguists are granted insufficient 
> working time, e.g. a fixed number of hours, without their managers assessing beforehand 
> the status of the data and how much work is needed to fix it, and consequently without being 
> able to renegotiate the service provider contract. That operating mode is completely irresponsible 
> on the part of Apple and Google, and Microsoft alike (although Microsoft has less money to devote).
> 
> > 
> […]
> > > Polish has,
> > > consistently with By-Type, these quotation marks:
> > > ' " ” „ « »
> > > Hence the set is incomplete.
> > 
> > You are right, thanks. But what is the practical importance of it?
> 
> Accurate CLDR data matters because inaccurate data would reflect badly 
> on a country’s image, making it look incompetent or careless.
> 
> More generally, accurate locale data gives the CLDR repository a better 
> reputation. As long as the data is unreliable, nobody may actually want 
> to use it.
> 
> Yet another implication is that the presence of a character in CLDR is a good 
> argument for having it on the keyboard layout. E.g. the Breton letter apostrophe is
> not yet on the Breton keyboard layout, despite the issue having been discussed 
> in the XKB bug tracker as a feature request. So I informed […]
[Deleted for privacy.]
> 
> > I noticed that sometimes in Emacs 'forward-word' behaves strangely on
> > text with unusual characters, but had no motivation to investigate how
> > this is related to the current locale.
> 
> I’m sorry I can’t check this, as I’m not yet using either Emacs or Vim.
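To illustrate why a punctuation character standing in for a letter can confuse word-based commands, here is a minimal Python sketch. It assumes a simplified, hypothetical word rule based only on Unicode general categories (letters, marks and decimal digits count as word characters); this is not how Emacs decides what forward-word skips, which depends on its syntax tables. With U+02BC MODIFIER LETTER APOSTROPHE (category Lm), the Breton word c'hoant stays one word; with U+2019 RIGHT SINGLE QUOTATION MARK (category Pf), it splits in two.

```python
import unicodedata

# Two candidate apostrophes for the Breton trigraph c'h.
LETTER_APOSTROPHE = "\u02BC"  # MODIFIER LETTER APOSTROPHE, category Lm
PUNCT_APOSTROPHE = "\u2019"   # RIGHT SINGLE QUOTATION MARK, category Pf


def is_word_char(ch: str) -> bool:
    """Hypothetical rule: letters (L*), marks (M*) and digits (Nd) are word characters.

    A simplification for illustration only, not Emacs's syntax-table logic.
    """
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M") or cat == "Nd"


def split_words(text: str) -> list[str]:
    """Return the maximal runs of word characters in text."""
    words, current = [], []
    for ch in text:
        if is_word_char(ch):
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


for apostrophe in (LETTER_APOSTROPHE, PUNCT_APOSTROPHE):
    word = "c" + apostrophe + "hoant"
    # U+02BC yields one word; U+2019 yields "c" and "hoant".
    print(f"U+{ord(apostrophe):04X}", unicodedata.category(apostrophe), split_words(word))
```

Under this rule, any tool that treats the apostrophe as punctuation will stop in the middle of the trigraph, which is the kind of issue a letter apostrophe avoids.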
> 
> […]
> > 
> > The standard keyboard has a limited number of keys, so you have to make
> > compromises. It is generally accepted that Polish keyboard layouts
> > (there are primarily two of them) do not contain an apostrophe or single
> > quotation marks. There is a proposal by Marcin Woliński
> > 
> > http://marcinwolinski.pl/keyboard/
> > 
> > which is available in most Linux distributions but it does not seem
> > popular.
> 
> It has even been ported to Windows, but I cannot find it on Ubuntu 16.04.
> It has various drawbacks, the worst of which is that the more common double 
> angle quotation marks «» are on the Shift+AltGr level, while the single ones ‹› 
> are on AltGr. The same holds for the curly quotes: Polish currently 
> uses the double ones, whereas the single ones appear to be used only 
> for nested quotations. This swapping of frequent and rare punctuation
> has been done only for consistency with the ASCII apostrophe being in the 
> Base shift state and the ASCII double quote in the Shift shift state, as on 
> US-QWERTY. 
> 
> That’s how mnemonics and a certain idea of logic are destroying usability.
> 
> Thanks for the link anyway.
> 
> Best regards,
> 
> Marcel