German sharp S uppercase mapping

Dominikus Dittes Scherkl lyratelle at gmx.de
Sun Dec 1 19:48:18 CST 2024


On 30.11.24 at 18:16, Asmus Freytag via Unicode wrote:
> On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
> However, speaking of this as a "default" is confusing to readers who
> think in terms of text processing or authoring environments where a
> different set of requirements rule. Here, the proper "default" is the
> best implementation of a culturally appropriate case transform.

NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to make case conversion invertible, just as
compatibility code points were assigned to preserve old formatting
information in case text has to be converted back to some obsolete
charset.

_This new letter was invented to allow for 1:1 roundtrip conversion._

toUpper() shall change "ß" to "ẞ" instead of "SS", precisely so that
toLower() can produce "ß" again instead of a misspelling with "ss".
(At the moment this can only be avoided by using a German dictionary,
a really heavy requirement for a small function like toLower, and for
family names it is simply not possible at all: the information is lost.)
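
To make the round-trip problem concrete, here is a minimal sketch in
Python. It relies on Python's built-in str.upper() and str.lower(),
which follow the Unicode default case mappings; the helper
to_upper_roundtrip() is a hypothetical name used only for illustration
and is not part of any library.

```python
SHARP_S = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def to_upper_roundtrip(text: str) -> str:
    """Hypothetical tailored uppercase: map ß to ẞ, then apply the default mapping."""
    return text.replace(SHARP_S, CAPITAL_SHARP_S).upper()


# Default mapping: ß becomes SS, so lowercasing cannot restore it.
print("maße".upper())           # MASSE
print("maße".upper().lower())   # masse (a different word: "mass", not "measures")

# Two distinct family names collapse into the same uppercase string.
print("Weiß".upper() == "Weiss".upper())  # True

# Mapping ß to ẞ keeps the information, so the round trip is lossless.
print(to_upper_roundtrip("maße"))          # MAẞE
print(to_upper_roundtrip("maße").lower())  # maße
print(to_upper_roundtrip("Weiß") == to_upper_roundtrip("Weiss"))  # False
```

The lowercase direction needs no tailoring here: the default mapping
already takes "ẞ" back to "ß". Only the uppercase direction loses
information.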

This is a really bad situation that should be fixed as soon as
possible; it is not a matter of taste.
And it should be fixed explicitly in automatic text processing, because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter which form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be recovered (except by re-entering it
manually).

> And what is "best" can change over time.
No. Fixing this round-trip bug is in the best interest of Unicode, and
that won't change over time. Using "SS" in all-uppercase text was always
a bad workaround that became a source of spelling errors in automatic
text processing, and a fix for it was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).



