Aw: Re: German sharp S uppercase mapping
Marius Spix
marius.spix at web.de
Mon Dec 2 04:19:32 CST 2024
That problem is not new. The long ſ, which is only used in old Fraktur script, but not in modern Antiqua script, has the same issue. It shares its uppercase form S with the round s. Unlike the Greek final sigma ς, the round s is not restricted to the end of a word: it can also appear mid-word, for example in compound words.
For example: to_lower(to_upper("hauſtür")) returns "haustür" with a round s, which is inaccurate.
That can even make a difference, because "Werksirene" and "Werkſirene" or "Antragsteller" and "Antragſteller" have completely different meanings.
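The loss can be reproduced with any default case mapping. The following is a minimal sketch in Python, assuming that str.upper() and str.lower() stand in for the to_upper and to_lower mentioned above; round_trip is a hypothetical helper name used only for illustration.

```python
# Long s (U+017F) and round s (U+0073) share the uppercase form S,
# so the default case mappings cannot tell them apart after a round trip.

def round_trip(text: str) -> str:
    """Uppercase, then lowercase, using Python's default Unicode case mappings."""
    return text.upper().lower()

original = "hauſtür"
restored = round_trip(original)
print(original, "->", original.upper(), "->", restored)
print("long s preserved:", restored == original)

# Both spellings of each pair collapse to the same uppercase string,
# so the distinction cannot be recovered afterwards.
pairs = [("Werksirene", "Werkſirene"), ("Antragsteller", "Antragſteller")]
for round_form, long_form in pairs:
    print(round_form, long_form, round_form.upper() == long_form.upper())
```

Once the text has been uppercased, nothing in the string records which s was long, so no lowercasing function can restore it without outside knowledge.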
Sent: Monday, 2 December 2024 at 02:48
From: "Dominikus Dittes Scherkl via Unicode" <unicode at corp.unicode.org>
To: unicode at corp.unicode.org
CC: "Dominikus Dittes Scherkl" <lyratelle at gmx.de>
Subject: Re: German sharp S uppercase mapping
On 30.11.24 at 18:16, Asmus Freytag via Unicode wrote:
> On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
> However, speaking of this as a "default" is confusing to readers who
> think in terms of text processing or authoring environments where a
> different set of requirements rule. Here, the proper "default" is the
> best implementation of a culturally appropriate case transform.
NO. I really mean "default" in a technical sense, not something someone
tailors to local needs.
The ẞ was introduced to have an invertible casing, just like
compatibility codepoints were assigned to preserve old formatting
information in case a translation back to some obsolete charset is
necessary.
_This new letter was invented to allow for 1:1 roundtrip conversion._
toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
toLower() to produce "ß" again instead of a wrong spelling with "ss"
(which at the moment can only be avoided by using a German dictionary - a
really heavy constraint for a small function like toLower - and which for
family names is simply not possible at all - the information is lost).
This is a really bad situation, which should be fixed as soon as
possible, not a matter of taste.
And it should be fixed explicitly in automatic text processing - because
this is where errors are produced today that can now be avoided.
In private letters it doesn't matter what form is used - people
write whatever they want anyway. But automatic processing shall not drop
information that cannot be brought back (except by re-introducing
this knowledge manually).
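The round trip argued for above can be sketched in Python. This is only an illustration under stated assumptions: str.upper() and str.lower() stand in for toUpper() and toLower(), and to_upper_invertible is a hypothetical helper, not an existing library function. The family names are merely sample data.

```python
SHARP_S = "\u00df"          # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ

def to_upper_invertible(text: str) -> str:
    """Hypothetical helper: map ß to ẞ first, then apply default uppercasing."""
    return text.replace(SHARP_S, CAPITAL_SHARP_S).upper()

for name in ("Weiß", "Weiss"):
    # Default mapping: ß becomes SS, so both names end up identical.
    default_trip = name.upper().lower()
    # Tailored mapping: ß becomes ẞ, which lowercases back to ß.
    tailored_trip = to_upper_invertible(name).lower()
    print(name, "| default:", default_trip, "| tailored:", tailored_trip)
```

With the default mapping both names come back spelled with "ss"; with the tailored mapping each name keeps its own spelling, and no dictionary lookup is needed.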
> And what is "best" can change over time.
No. Fixing this round-trip bug is in the best interest of Unicode and
that won't change over time. Using "SS" in all-uppercase text was always
a bad workaround that became a source of spelling errors in automatic
text processing, and for which a fix was invented some ten years ago. So
let's use it everywhere - at least now that it is officially allowed
(since 2017) and even preferred (since this year).