Unicode 11 Georgian uppercase vs. fonts

Alexey Ostrovsky via Unicode unicode at unicode.org
Thu Jul 26 14:27:08 CDT 2018


Hi there!

"The Georgian community understood": sorry, but here "the Georgian
community" means a small group of Georgian font designers who promote
upper case for the effectively caseless Georgian script. Many Georgian
scientists who work with the script and language are not fans of
"uppercase" font styles. Option #2, like any other option that forces
upper case on Georgian, is an error (comparable to forcing black-letter
on, say, Cyrillic through a CSS attribute).

Well, never mind that; on to the options. Actually, the problem must be
split into two issues:
a) Whether to capitalize Georgian text in the same cases where we
capitalize Latin text.
b) How to handle cases where we transform the text and there are no capital
characters for Georgian.

Before answering, we must mention the caseless nature of the Georgian
script. Its "capital" letters do not exist as letters; they are letter
variants used exactly the same way as the Latin title case. Therefore,
Georgian "uppercase" = Georgian title case = Georgian "capital letters" in
Unicode 11; it is far from Latin uppercase in its behavior and its
features. Here are some examples for Georgian (I use English, but the
semantics and casing are meant to reflect Georgian) to show where we are:
-- "mr. john smith" is unconditionally OK;
-- "MR JOHN SMITH" or "mr JOHN SMITH" can be OK or wrong depending on
the situation; usually it is OK;
-- "Mr John Smith" is unconditionally wrong (except in some marginal cases,
similar to English "mR jOHN sMITH").
Therefore, the easiest answer is to (b): leave the text "minuscule", as
that is an excellent and fully readable default. The answer to (a) is not
that easy, as it depends on the designer's mood, etc. I would say the
designer has to have an option to control it (say, through an "important"
CSS option), and the default behavior must be to ignore uppercase
transformations for Georgian. (If one applies them by default, one gets
cases like [<span
class="x">m</span>r <span class="x">j</span>ohn <span
class="x">s</span>mith]).

Based on the above, the answers to the initial questions are:

*> 1) Get all call sites to change their code to not uppercase Georgian (and
fix titlecasing to use the titlecase mappings, not the uppercase
mappings). *
This requires John Smith to have special knowledge of how to deal with
Georgian. But, anyway, it is good behavior.
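
To make option 1 concrete, here is a minimal sketch of what a call site
could do, assuming it simply skips the modern Georgian (Mkhedruli) letters
when uppercasing. The function name upper_except_georgian is hypothetical,
and the ranges are the Mkhedruli letters that received uppercase mappings in
Unicode 11 (U+10D0..U+10FA and U+10FD..U+10FF). It is an illustration, not
how any particular browser or library does it.

```python
def upper_except_georgian(text: str) -> str:
    """Uppercase text, but leave Georgian Mkhedruli letters untouched.

    Hypothetical helper: a call-site workaround, not a standard API.
    """
    out = []
    for ch in text:
        # Mkhedruli letters that have Unicode 11 uppercase mappings.
        if "\u10d0" <= ch <= "\u10fa" or "\u10fd" <= ch <= "\u10ff":
            out.append(ch)          # treat Georgian as caseless
        else:
            out.append(ch.upper())  # ordinary uppercase mapping
    return "".join(out)


if __name__ == "__main__":
    # Latin part is uppercased, Georgian part is left as written.
    print(upper_except_georgian("john \u10e1\u10db\u10d8\u10d7\u10d8"))
```

Skipping by code point range keeps the result independent of which Unicode
version the runtime ships with, which is the point of doing it at the call
site.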

*> 2) Change all common fonts with Georgian characters to add the U11.0
ones. *
This does not address issues like "John Smith" and the appropriate usage of
Georgian fonts. Capitalization rules can vary, and some options may be
inappropriate for a caseless script such as Georgian.

*> 3) Hack font CMAPs to just map the new characters to the glyphs of the
old ones. *
This is the best behavior, but the solution is not that good.
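
The pairing behind option 3 can be sketched at the text level. The Unicode 11
capital (Mtavruli) letters U+1C90..U+1CBA and U+1CBD..U+1CBF sit at a fixed
offset from the Mkhedruli letters U+10D0..U+10FA and U+10FD..U+10FF, so
each new character has an obvious old counterpart. The code below is only a
text-level stand-in for that mapping, with a hypothetical function name; it
does not edit a font CMAP.

```python
# Fixed distance between a Mtavruli letter and its Mkhedruli counterpart.
MTAVRULI_OFFSET = 0x1C90 - 0x10D0


def fold_mtavruli(text: str) -> str:
    """Replace Mtavruli letters with the matching Mkhedruli letters.

    Hypothetical helper illustrating the new-to-old character pairing.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x1C90 <= cp <= 0x1CBA or 0x1CBD <= cp <= 0x1CBF:
            out.append(chr(cp - MTAVRULI_OFFSET))  # map back to old letter
        else:
            out.append(ch)                         # everything else as is
    return "".join(out)
```

A font-level hack would encode the same pairs as glyph aliases, so the text
itself would stay in the new code points.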

The best solution would be special treatment of Georgian uppercase in CSS
and at the OS level (I know that is bad, but Unicode 11 is already released
and the uppercase letters have already been approved).

Sincerely,
Alex.

P.S. Adding uppercase for Georgian was a mistake (in my opinion, of
course), as it violates the Unicode principle of encoding characters
rather than glyphs.

On Fri, Jul 20, 2018 at 11:21 AM, Peter Constable via Unicode <
unicode at unicode.org> wrote:

> IMO, the correct answer is 2, except that “all common fonts” is more
> sweeping than necessary: it’s sufficient to have fonts used for fallback in
> platforms and browsers, and the related fallback logic, to get updated. Of
> course, that takes some time, and it’s not even two months since Unicode 11
> was released. The Georgian community understood that it would take time to
> get implementations in place, and that they would need to take measures to
> smooth over that transition — which can include having Web sites for
> Georgian businesses and institutions using fonts to match the requirements
> of the content.
>
>
>
>
>
> Peter
>
>
>
> *From:* Unicore <unicore-bounces at unicode.org> *On Behalf Of *Markus
> Scherer via Unicore
> *Sent:* Wednesday, July 18, 2018 3:05 PM
> *To:* unicore UnicoRe Discussion <unicore at unicode.org>
> *Cc:* mark <mark at macchiato.com>
> *Subject:* Unicode 11 Georgian uppercase vs. fonts
>
>
>
> Dear fellow Unicoders,
>
>
>
> We’ve run into some significant problems with the Georgian capital letters
> added in Unicode 11. If you have run into them yourselves, or have feedback
> on our brainstormed solutions below, we’d love to hear your thoughts.
>
>
>
> Here's the problem. The vast majority of Georgian fonts do not yet have
> the new uppercase characters. So when any system uses case mapping to
> uppercase text (e.g. browsers interpreting CSS’s text-transform:
> capitalize), then the users of Georgian will see boxes (“tofu”) if the font
> they are using does not have the glyphs.
>
>
>
> For example, a program constructs a web page with buttons. It uses a CSS
> style to uppercase text in buttons, as a house style. Unless the user has a
> very up-to-date font, they see tofu (boxes). If a server does backend
> rendering, its font has to be very up-to-date. We also saw this problem in
> a program that was doing titlecasing, but on the first character it used
> the *uppercase* mappings rather than *titlecase* mappings. Not the right
> thing to do, of course, but code that accidentally works (most of the time)
> doesn't get fixed if nobody reports a bug about it.
>
>
>
> All of these will result in bad bugs in the UI, in software that formerly
> worked fine.
>
>
>
> We brainstormed some options to fix this:
>
>
>
>    1. Get all call sites to change their code to *not* uppercase Georgian
>    (and fix titlecasing to use the titlecase mappings, not the uppercase
>    mappings). Since we have no control over call sites and release cycles of
>    affected software, this would not help Georgian users for a long time, if
>    ever. We’d eventually want to retract these changes, creating even more
>    work.
>    2. Change all common fonts with Georgian characters to add the U11.0
>    ones. This should eventually happen but would probably take a couple of
>    years at least, which does not help users in the short term.
>    3. Hack font CMAPs to just map the new characters to the glyphs of the
>    old ones. Works but only when a programmer can control the fonts used, such
>    as with server-side rendering or downloadable fonts.
>    4. Remove the uppercase mappings for Georgian, until the fonts catch
>    up.
>       1. Would at least have to be done in all browsers, otherwise web apps
>       will still break for Georgian.
>       2. A broader alternative is to do it in ICU. Because that is used
>       by the majority of the browser implementations, it would solve the
>       short-term problem for the browsers, and many other programs. Drawback:
>       Non-conformant, and uppercasing will be inconsistent depending on who has
>       which variant of ICU (with vs. without hack, on top of: with Unicode 11 vs.
>       before Unicode 11).
>          1. One precedent is that in CLDR we deliberately hold back from using
>          new currency characters until the font support is sufficiently widespread.
>          (Wishing we'd held back the uppercase mappings in Unicode 11.0 too!)
>
>
>
> Mark & Markus
>