Unicode 11 Georgian uppercase vs. fonts
Peter Constable via Unicode
unicode at unicode.org
Fri Jul 27 20:45:53 CDT 2018
Just an observation on these issues: when the Mtavruli proposal was first presented to the UTC, several UTC members voiced strong reservations because of the kind of issues mentioned for case mapping, in particular for database indexing and querying. Several months later, various UTC members participated in a teleconference with representatives of Georgian institutions, including IT staff from Bank of Georgia and TBC Bank. During that meeting, the representatives of the Georgian enterprises (i) demonstrated an understanding of those issues and their implications, (ii) indicated support from those enterprises and a commitment to update their applications as required, and (iii) indicated their intent to develop a plan of action for preparing their institutions for this change and for communicating it within Georgian industry and society. Only after that did the UTC feel it was viable to proceed with encoding the Mtavruli characters.
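To make the indexing concern concrete, here is a minimal Python sketch. It assumes a hypothetical legacy index whose keys were produced with `str.upper()` before Unicode 11, when Mkhedruli letters had no uppercase mapping and were therefore stored unchanged. With Unicode 11 or later data (Python 3.7 and later), the same query now uppercases to Mtavruli and misses. Case folding, by contrast, maps Mtavruli back to Mkhedruli, so folded keys agree whichever case the text arrives in; an existing index would still have to be rebuilt with such keys.

```python
import unicodedata

# Hypothetical legacy index: keys were built with str.upper() under
# pre-Unicode-11 data, where Mkhedruli had no uppercase mapping, so the
# Georgian key was stored unchanged. Illustrative data only.
word = "თბილისი"  # "Tbilisi", written in Mkhedruli
legacy_index = {word: "record-1"}

def lookup_upper(index: dict, query: str):
    # The query key is computed with today's str.upper(); on Unicode 11+
    # data this turns Mkhedruli into Mtavruli, so it no longer matches.
    return index.get(query.upper())

def fold_key(s: str) -> str:
    # Case folding maps Mtavruli back to Mkhedruli (Unicode 11+), so the
    # folded key is the same for either case form of the text.
    return s.casefold()

print("Unicode data version:", unicodedata.unidata_version)
print("upper()-based lookup:", lookup_upper(legacy_index, word))
print("folded keys agree:", fold_key(word) == fold_key(word.upper()))
```

On a current interpreter the `upper()`-based lookup finds nothing, while the folded keys compare equal.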
From: Unicode <unicode-bounces at unicode.org> On Behalf Of Asmus Freytag via Unicode
Sent: Friday, July 27, 2018 7:01 AM
To: unicode at unicode.org
Subject: Re: Unicode 11 Georgian uppercase vs. fonts
On 7/27/2018 3:42 AM, Michael Everson via Unicode wrote:
Yes, and it explains clearly that “effectively caseless Georgian” is incorrect. Georgian has case. Georgian uses case differently from other scripts. This is an orthographic distinction, not a structural one. In fact, as is also stated in the proposal, there are 19th-century texts which do use titlecase. It’s just that that orthography is no longer in use and that behaviour is no longer desirable.
"Georgian uses case differently from other scripts"
That's one of the key issues here for developers (and users) of libraries, because it means that any implicit assumptions about the applicability of a given case transform are now broken.
This goes beyond whether fonts are actually installed now, at the end of some transition period, or ever: if functions like ToUpper, which used to have no effect on Georgian, suddenly do, in ways that users of the script do not expect, then your application is broken from one day to the next.
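A quick way to see this change in a given runtime: Python's string methods follow the Unicode data bundled with the interpreter (Unicode 11.0 from Python 3.7 on). The sketch below prints the uppercase and titlecase results for Mkhedruli AN. With Unicode 11+ data, `upper()` yields U+1C90 GEORGIAN MTAVRULI CAPITAL LETTER AN, whereas the titlecase mapping of Mkhedruli letters is the letter itself, so `title()` leaves it unchanged. The `show` helper is only a display aid.

```python
import unicodedata

def show(ch: str) -> str:
    # Format a single character as "U+XXXX NAME" for display.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

an = "\u10D0"  # GEORGIAN LETTER AN (Mkhedruli)

print("Unicode data:", unicodedata.unidata_version)
# Before Unicode 11 this mapped AN to itself; from Unicode 11 on it maps to Mtavruli.
print("upper():", show(an), "->", show(an.upper()))
# Mkhedruli's titlecase mapping is the letter itself, so title() does not change it.
print("title():", show(an), "->", show(an.title()))
```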
The situation prior to the change is perhaps best characterized by saying that there was support for some locale differences in how certain characters were mapped, but not in whether a given mapping was done at all.
If, as has been suggested, the use of case in Georgian is more similar to that of small caps in other scripts, then, instead of ToUpper doing a case transformation for Georgian, what would be needed is something like a "ToSmallCaps" function (a better name would be needed here, because the Georgian letters aren't actually "small caps").
That way, the existing "ToUpper" could retain its implicit semantic of "uppercase transformation in those scripts where such transformations are used in a common way".
This would solve half of the problem, namely preventing uppercasing where users of Georgian do not expect it. However, it does not work in plain text for the other scripts, because small caps are not encoded there, so there is no plain-text solution.
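The Georgian-exempting half of that idea, a ToUpper that keeps its "common usage" semantic, could be sketched roughly as follows. The name `to_upper_common` and the explicit code-point set are hypothetical, not an ICU or Python API. The sketch skips the Mkhedruli letters that gained Mtavruli mappings in Unicode 11 (U+10D0..U+10FA and U+10FD..U+10FF) and uppercases everything else as usual.

```python
# Mkhedruli letters that gained uppercase (Mtavruli) mappings in Unicode 11:
# U+10D0..U+10FA and U+10FD..U+10FF (U+10FB and U+10FC are not letters with case).
MKHEDRULI_CASED = set(range(0x10D0, 0x10FB)) | set(range(0x10FD, 0x1100))

def to_upper_common(text: str) -> str:
    """Hypothetical ToUpper that leaves Mkhedruli letters untouched.

    Illustrates "uppercase only where that transformation is in common use";
    not an existing ICU or Python function.
    """
    out = []
    for ch in text:
        if ord(ch) in MKHEDRULI_CASED:
            out.append(ch)          # Georgian: no implicit uppercasing
        else:
            out.append(ch.upper())  # other scripts: ordinary full uppercasing
    return "".join(out)
```

For example, `to_upper_common("Tbilisi თბილისი")` returns `"TBILISI თბილისი"`, and full mappings such as "ß" to "SS" still apply to other scripts.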
To get back to Markus' original question on how to handle this in ICU: it seems more and more that Georgian should be exempted from the standard library functions, and that a new function needs to be added that transforms only Georgian and leaves all other scripts alone (or one that takes a language/locale parameter).
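A Georgian-only function of the kind suggested here might look like this minimal sketch. `to_mtavruli` is a hypothetical name, not an existing ICU or Python function. It maps Mkhedruli to Mtavruli with a fixed code-point offset (U+10D0 to U+1C90) instead of calling `str.upper()`, so its result does not depend on the interpreter's Unicode version and no other script is affected. A variant taking a language/locale parameter could call it only when the caller explicitly asks for Georgian uppercasing.

```python
# Distance from Mkhedruli (U+10D0..) to Mtavruli (U+1C90..).
MTAVRULI_OFFSET = 0x1C90 - 0x10D0

def _to_mtavruli_char(ch: str) -> str:
    cp = ord(ch)
    # U+10D0..U+10FA and U+10FD..U+10FF have Mtavruli counterparts;
    # U+10FB (paragraph separator) and U+10FC (modifier letter) do not.
    if 0x10D0 <= cp <= 0x10FA or 0x10FD <= cp <= 0x10FF:
        return chr(cp + MTAVRULI_OFFSET)
    return ch

def to_mtavruli(text: str) -> str:
    """Hypothetical Georgian-only transform: Mkhedruli -> Mtavruli.

    Every other character (Latin, digits, Nuskhuri, ...) is returned as-is.
    """
    return "".join(_to_mtavruli_char(ch) for ch in text)
```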