Unicode 11 Georgian uppercase vs. fonts

Fri Jul 20 02:21:33 CDT 2018

IMO, the correct answer is 2, except that “all common fonts” is more sweeping than necessary: it’s sufficient for the fonts used for fallback in platforms and browsers, and the related fallback logic, to get updated. Of course, that takes some time, and it’s not even two months since Unicode 11 was released. The Georgian community understood that it would take time to get implementations in place, and that they would need to take measures to smooth over that transition, which can include having Web sites for Georgian businesses and institutions use fonts that match the requirements of the content.

Peter

From: Unicore <unicore-bounces at unicode.org> On Behalf Of Markus Scherer via Unicore
Sent: Wednesday, July 18, 2018 3:05 PM
To: unicore UnicoRe Discussion <unicore at unicode.org>
Cc: mark <mark at macchiato.com>
Subject: Unicode 11 Georgian uppercase vs. fonts

Dear fellow Unicoders,

We’ve run into some significant problems with the Georgian capital letters added in Unicode 11. If you have run into them yourselves, or have feedback on our brainstormed solutions below, we’d love to hear your thoughts.

Here's the problem. The vast majority of Georgian fonts do not yet have the new uppercase characters. So when any system uses case mapping to uppercase text (e.g. browsers interpreting CSS’s text-transform: uppercase), users of Georgian will see boxes (“tofu”) if the font they are using does not have the glyphs.

For example, a program constructs a web page with buttons. It uses a CSS style to uppercase text in buttons, as a house style. Unless the user has a very up-to-date font, they see tofu (boxes). If a server does backend rendering, its font has to be very up-to-date. We also saw this problem in a program that was doing titlecasing, but on the first character it used the uppercase mappings rather than titlecase mappings. Not the right thing to do, of course, but code that accidentally works (most of the time) doesn't get fixed if nobody reports a bug about it.
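To make the titlecasing mistake concrete, here is a minimal Python sketch. The function names are hypothetical and are not taken from the program mentioned above. It uses the Latin digraph U+01C6, whose uppercase and titlecase mappings differ, because the result for Georgian with the buggy variant depends on which Unicode version the runtime ships.

```python
def titlecase_first_buggy(word: str) -> str:
    """Hypothetical buggy helper: applies the UPPERCASE mapping to the first character."""
    return word[:1].upper() + word[1:]


def titlecase_first(word: str) -> str:
    """Hypothetical fixed helper: applies the TITLECASE mapping to the first character."""
    return word[:1].title() + word[1:]


# U+01C6 (dz with caron, small) has distinct uppercase (U+01C4)
# and titlecase (U+01C5) mappings, so the two helpers disagree.
sample = "\u01c6ungla"
buggy = titlecase_first_buggy(sample)   # starts with U+01C4
fixed = titlecase_first(sample)         # starts with U+01C5

# Georgian Mkhedruli letters titlecase to themselves, so the fixed
# helper leaves a Georgian word unchanged.
georgian = "თბილისი"
georgian_fixed = titlecase_first(georgian)
```

With the fixed helper, Georgian text is never turned into the new capital letters, so it does not depend on fonts having the Unicode 11 glyphs.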

All of these will result in bad bugs in the UI, in software that formerly worked fine.

We brainstormed some options to fix this:

  1.  Get all call sites to change their code to not uppercase Georgian (and fix titlecasing to use the titlecase mappings, not the uppercase mappings). Since we have no control over call sites and release cycles of affected software, this would not help Georgian users for a long time, if ever. We’d eventually want to retract these changes, creating even more work.
  2.  Change all common fonts with Georgian characters to add the U11.0 ones. This should eventually happen but would probably take a couple of years at least, which does not help users in the short term.
  3.  Hack font CMAPs to just map the new characters to the glyphs of the old ones. Works but only when a programmer can control the fonts used, such as with server-side rendering or downloadable fonts.
  4.  Remove the uppercase mappings for Georgian, until the fonts catch up.

     *   Would at least have to be done in all browsers, otherwise web apps will still break for Georgian.
     *   A broader alternative is to do it in ICU. Because that is used by the majority of the browser implementations, it would solve the short-term problem for the browsers — and many other programs. Drawback: Non-conformant, and uppercasing will be inconsistent depending on who has which variant of ICU (with vs. without hack, on top of: with Unicode 11 vs. before Unicode 11).

        *   One precedent is that in CLDR we deliberately hold back from using new currency characters until the font support is sufficiently widespread. (Wishing we'd held back the uppercase mappings in Unicode 11.0 too!)
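As a rough illustration of what a call-site or library-level workaround (options 1 and 4) could look like, here is a Python sketch. The helper names are hypothetical; this is not ICU's API or any browser's actual code. It assumes the Unicode 11 layout in which the Mtavruli capitals U+1C90..U+1CBA and U+1CBD..U+1CBF sit at a fixed offset from their Mkhedruli counterparts in U+10D0..U+10FF.

```python
# Georgian block containing the Mkhedruli letters: U+10D0..U+10FF.
MKHEDRULI = range(0x10D0, 0x1100)
# Fixed distance between a Mtavruli capital and its Mkhedruli letter.
MTAVRULI_OFFSET = 0x1C90 - 0x10D0


def upper_except_georgian(text: str) -> str:
    """Uppercase text character by character, leaving Mkhedruli untouched."""
    return "".join(
        ch if ord(ch) in MKHEDRULI else ch.upper()
        for ch in text
    )


def fold_mtavruli(text: str) -> str:
    """Map any Mtavruli capitals back to Mkhedruli, e.g. before rendering
    with a font that is assumed to lack the Unicode 11 glyphs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x1C90 <= cp <= 0x1CBA or 0x1CBD <= cp <= 0x1CBF:
            cp -= MTAVRULI_OFFSET
        out.append(chr(cp))
    return "".join(out)


label = upper_except_georgian("ok თბილისი")
folded = fold_mtavruli("\u1c90\u1cbf")
```

Either helper is non-conformant in the same sense as the ICU variant discussed above, and would need to be retired once font support catches up.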

Mark & Markus