German sharp S uppercase mapping

Dominikus Dittes Scherkl lyratelle at gmx.de
Mon Dec 2 04:33:25 CST 2024


Am 02.12.24 um 07:13 schrieb Asmus Freytag via Unicode:
> On 12/1/2024 9:09 PM, David Starner via Unicode wrote:
>> On Sun, Dec 1, 2024 at 7:54 PM Dominikus Dittes Scherkl via Unicode
>> <unicode at corp.unicode.org> wrote:
>>> But in automatic text processing the old form is simply a bug that needs
>>> to be fixed. The new form has to be the "default" - otherwise
>>> implementations will proliferate this bug forever.
>> Various systems take for granted that case folding is stable.
But that is the problem with the old casing: IT IS NOT STABLE!
toLower(toUpper("ß"))=="ss" - this is simply wrong, no matter which
language or locale you are using (besides the fact that "ß" is used
nowhere except in German). This is the reason why the new "ẞ"
was invented - to allow a round trip without modifying the text!
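
To make the round-trip problem concrete, here is a minimal sketch in
Python (my choice of language for illustration; its str.upper() and
str.lower() apply the default Unicode mappings without any locale
tailoring). The name "Heß" is just sample data.

```python
# Default (untailored) Unicode case mapping via Python's built-in str methods.
original = "Heß"

# "ß" has no single-character default uppercase, so it expands to "SS".
upper = original.upper()
round_trip = upper.lower()

print(upper)        # HESS
print(round_trip)   # hess  (the "ß" is gone)

# Capital sharp s (U+1E9E) lowercases back to "ß", so it can round-trip.
capital_sharp_s = "\u1E9E"
print(("HE" + capital_sharp_s).lower())  # heß
```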

> Very much agreed on that one. Usually in the context of "identifiers"
> and not in free text.
Especially for security reasons, the casing should be changed - so that
you do not lose the "ß" in your name and are therefore considered a
different person IN YOUR LEGAL DOCUMENTS - the most important identifier
of all!

>> Differences in how Unicode data is interpreted has open security holes
>> in systems, and while this isn't particularly likely with this change,
>> it is possible, which is part of the reason case-folding is guaranteed
>> to be stable. Such a change can confuse case-insensitive filesystems,
Besides the fact that case-insensitive filesystems are a pain in the ass,
it is especially necessary there not to lose the information whether
a name contained a "ß" or an "ss" - which was not possible with the old
casing.
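
A small sketch of that information loss, assuming a hypothetical
case-insensitive lookup that compares names by their default uppercase
form (default_key is an invented helper for illustration, not the API of
any real filesystem):

```python
# Two distinct names that differ only in "ß" versus "ss".
names = ["Heß", "Hess"]

def default_key(name: str) -> str:
    """Hypothetical comparison key: the default (untailored) uppercase."""
    return name.upper()

keys = {default_key(n) for n in names}

# Both names collapse to the same key, so the distinction is lost.
print(sorted(keys))  # ['HESS']
print(len(keys))     # 1
```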

>> or change the interpretation of code in case-insensitive filesystems.
>> The automated default isn't going to change, and German is going to
>> have to join Turkish in that purely default case-conversion just
>> doesn't work for them.
Unlike Turkish, which has a different uppercase for "i" - a letter that
is cased differently in pretty much _any_ other language using the Latin
script - "ß" is not used differently in any other language. It is not
used in any other language at all.

> By "default", if I start editing a document, I should not have to worry
> about getting a deficient case mapping/case conversion implementation
> just because I'm using the "wrong" language.
Correct. This is why the case mapping should be changed _for all_
languages and locales. The default should be changed. No one should be
using the old casing, unless they specifically tailor their system to
use it.

> Likewise, by default, I should never get the locale-dependent case
> conversion invoked when accessing file systems or domain names.
Correct. But with the old mapping, the system will change my name from
"Heß" to "HESS" without my asking - and that cannot be undone if I start
using a case-sensitive filesystem, unless I know that it is wrong and
change it back manually. The new mapping is there to fix that. So please,
start using it! NOW.
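
For anyone who wants to try it today, a minimal sketch of such a mapping
in Python. The function name upper_preserving_sharp_s is made up for this
example; it simply substitutes "ẞ" (U+1E9E) for "ß" before applying the
default uppercase, which is one possible way to tailor it, not a standard
API.

```python
CAPITAL_SHARP_S = "\u1E9E"  # ẞ, LATIN CAPITAL LETTER SHARP S

def upper_preserving_sharp_s(text: str) -> str:
    """Hypothetical tailored uppercase: map "ß" to "ẞ" instead of "SS"."""
    return text.replace("ß", CAPITAL_SHARP_S).upper()

name = "Heß"
shouted = upper_preserving_sharp_s(name)

print(shouted)           # HEẞ
print(shouted.lower())   # heß

# The round trip now agrees with lowercasing the original directly.
print(shouted.lower() == name.lower())  # True
```

Note that only the uppercase direction needs tailoring here; the default
lowercase mapping of "ẞ" already yields "ß".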



