German sharp S uppercase mapping

Asmus Freytag asmusf at ix.netcom.com
Mon Dec 2 00:08:12 CST 2024


On 12/1/2024 5:48 PM, Dominikus Dittes Scherkl via Unicode wrote:
> Am 30.11.24 um 18:16 schrieb Asmus Freytag via Unicode:
>> On 11/27/2024 12:15 PM, Dominikus Dittes Scherkl via Unicode wrote:
>> However, speaking of this as a "default" is confusing to readers who
>> think in terms of text processing or authoring environments where a
>> different set of requirements rule. Here, the proper "default" is the
>> best implementation of a culturally appropriate case transform.
>
> NO. I really mean "default" in a technical sense, not something someone
> tailors to local needs.
> The ẞ was introduced to have an invertible casing, just like
> compatibility codepoints were assigned to make preservation of old
> formatting information available if a translation back to some obsolete
> charset is necessary.
>
> _This new letter was invented to allow for 1:1 roundtrip conversion._

The letter was not *invented*. It was discovered (= identified as 
occurring in actual writing) and encoded.

It was encoded to match a character with a unique shape and properties. 
One property is that it *is* a capital letter; the other is that ß is 
its lowercase equivalent.

>
> toUpper() shall change "ß" to "ẞ" instead of "SS", just to allow
> toLower() producing back "ß" instead of a wrong spelling with "ss"
> (which at the moment can only be avoided using a German dictionary - a
> really heavy constraint to a small function like toLower - and for
> family names simply not possible at all - the information is lost).

Your problem is that you assume an implementation of toUpper that takes 
no argument. For purposes like text design, publication etc. you want an 
implementation that selects which locale should set the rules. (Or one 
where that setting is done behind the scenes, which is logically 
equivalent.) Without specifying the locale, your beautiful toUpper() 
does not know that in Turkish, 'i' is not mapped to 'I' but to U+0130 
LATIN CAPITAL LETTER I WITH DOT ABOVE.
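
To make the point concrete, here is a minimal Python sketch of a toUpper 
that takes the locale as an argument. The name to_upper and the small 
tailoring table are hypothetical, made up for this illustration. Python's 
built-in str.upper() stands in for the untailored mapping, and a real 
implementation would presumably take its tailorings from locale data 
rather than from a hand-written table.

```python
# Hypothetical per-language tailorings, applied before the untailored mapping.
# Only the Turkish/Azerbaijani dotted-i rule is shown.
TAILORINGS = {
    "tr": {"i": "\u0130"},  # i -> LATIN CAPITAL LETTER I WITH DOT ABOVE
    "az": {"i": "\u0130"},
}


def to_upper(text, locale=None):
    """Uppercase text, applying the tailoring for `locale` if one is known."""
    table = TAILORINGS.get(locale, {})
    tailored = "".join(table.get(ch, ch) for ch in text)
    # str.upper() takes no locale, so it only supplies the untailored mapping.
    return tailored.upper()


print(to_upper("istanbul"))        # untailored mapping
print(to_upper("istanbul", "tr"))  # Turkish rules
```

Called without a locale, the function has no way of knowing that the 
text is Turkish, which is exactly the limitation described above.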

The fact that your beautiful toUpper cannot handle at least one language 
means that it should not try to handle any language. Instead it should 
be stable.

What you are describing is a change to the toUpper() that is invoked 
with the German locale as parameter (or selected behind the scenes).

There's not the same requirement for that one to be stable, although 
sometimes transitions are implemented by creating a separate locale for 
"old" and "new" orthographies and the like.
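
A rough Python sketch of what such a split could look like for German 
follows. The labels "de-old" and "de-new" are hypothetical and are not 
registered language tags, and to_upper_de is an invented name. The 
sketch only shows how the choice of orthography decides whether the 
round trip through lowercase preserves the ß.

```python
# Hypothetical locale labels; these are not registered language tags.
GERMAN_SHARP_S = {
    "de-old": "SS",      # traditional rule: ß uppercases to SS
    "de-new": "\u1E9E",  # ß uppercases to LATIN CAPITAL LETTER SHARP S
}


def to_upper_de(text, locale="de-old"):
    """Uppercase German text under the selected orthography."""
    # Replace ß first, then let the untailored mapping handle the rest.
    return text.replace("ß", GERMAN_SHARP_S[locale]).upper()


old = to_upper_de("Straße", "de-old")
new = to_upper_de("Straße", "de-new")
print(old, old.lower())  # the ß is not recoverable from SS
print(new, new.lower())  # the capital sharp s lowercases back to ß
```

Code that depends on the old behavior keeps calling the old variant, 
while text-oriented callers can opt in to the new one.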

When it comes to case conversion, purpose matters.

This doesn't detract from the need to have implementations that do the 
"right" thing (as currently defined) for a given language, or from the 
need to enable these by default for ordinary text manipulation.

But it's not the same thing as overriding an "identifier-safe" or 
"filesystem-safe" implementation, just because that's incorrectly viewed 
as a "default" that should be applicable to text manipulation.
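
As a loose illustration of the stable, language-independent kind of 
operation meant here, consider caseless matching with Python's 
str.casefold(). This is case folding rather than uppercasing, so it is 
an analogy only, and same_identifier is a hypothetical helper name. The 
point is that such a comparison treats all spellings alike and must not 
change its answers when an orthography is revised.

```python
def same_identifier(a, b):
    """Compare two strings caselessly, with no language-specific rules."""
    return a.casefold() == b.casefold()


# All three spellings fold to the same string, whatever the locale.
print(same_identifier("Straße", "STRASSE"))
print(same_identifier("Straße", "STRA\u1E9EE"))
print(same_identifier("STRASSE", "STRA\u1E9EE"))
```

An operation like this is useful precisely because it makes no attempt 
to produce culturally appropriate text.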

A./

>
> This is a really bad situation, which should be fixed as soon as
> possible, not a matter of taste.
> And it should be fixed explicitly in automatic text processing - because
> this is where errors are produced today that can now be avoided.
> In private letters it doesn't matter what form is used - the people
> write whatever they want anyway. But automatic processing shall not drop
> information that cannot be brought back (except by re-introducing
> this knowledge manually).
>
>> And what is "best"  can change over time.
> No. Fixing this round-trip bug is in the best interest of Unicode and
> that won't change over time. Using "SS" in all uppercase text was always
> a bad workaround that became a source of spelling errors by automatic
> text processing and for which a fix was invented some ten years ago. So
> let's use it everywhere - at least now that it is officially allowed
> (since 2017) and even preferred (since this year).
>
>


