German sharp S uppercase mapping

Daniel Buncic daniel.buncic at uni-koeln.de
Tue Nov 26 03:43:44 CST 2024


Dear Jules, dear all,

Thank you for bringing this up.  As a German linguist and main editor of 
two journals, I would like to give my opinion on this.

First of all, some context.  German 〈ß〉 never occurs at the beginning 
of words.  Consequently, the case mapping is necessary exclusively for 
all-caps and small caps contexts, not for sentence case (or title case, 
which is not used in German anyway).  Originally, 〈ß〉 emerged in 
‘German’ variants of the Latin alphabet, blackletter and Kurrent. 
Although the exact origin is still unclear, it was always interpreted as 
a ligature of either 〈ſz〉 (or rather 〈ſʒ〉) or 〈ſs〉.  On the one hand, 
〈ß〉 was also used in other languages such as Polish or Hungarian (for 
modern 〈sz〉) and even, in roman type (antiqua), for Latin or French 〈ss〉 
(cf. https://en.wikipedia.org/wiki/ß#Use_in_Roman_type).  On the other 
hand, traditionally, roman type did not have the letter 〈ß〉, which is 
why it has a long tradition of being replaced with 〈ss〉.

This is why in many surnames the spellings 〈ß〉 and 〈ss〉 exist side 
by side.  However, names like Geßler and Gessler, Meißner and Meissner, 
Weiß and Weiss, Voß and Voss, although they historically diverged 
through chance spellings at a time when spelling was not yet fixed, now 
have fixed spellings and are therefore regarded as different names, and 
their bearers insist on the correct spelling of their name, e.g. 
〈Heß〉, not 〈Hess〉.

Therefore it is important to keep 〈ß〉 and 〈ss〉 apart even in all-caps or 
small caps (e.g. in forms that have to be filled out in capital letters, 
in headings that only have capitals, etc.).  Until 1998, the 
then-normative dictionary, Duden, recommended generally using 〈SS〉 for 
〈ß〉 but 〈SZ〉 wherever the distinction from 〈ss〉 was important, giving the
example of 〈in Massen〉 /ʔɪnˈmasən/ ‘en masse’ vs. 〈in Maßen〉 
/ʔɪnˈmaːsən/ ‘in moderation’, where the latter would have to be 
capitalized as 〈IN MASZEN〉.  With proper names, however, this method 
does not work, because many names also have a variant with 〈sz〉 (which 
arose in some cases through a different reading of 〈ſʒ〉 in old 
documents and in others through Hungarian mediation).  See 
https://de.wikipedia.org/wiki/Geszler vs. 
https://de.wikipedia.org/wiki/Geßler, 
https://de.wikipedia.org/wiki/Meiszner vs. 
https://de.wikipedia.org/wiki/Meißner_(Familienname), 
https://de.wikipedia.org/wiki/Weisz vs. 
https://de.wikipedia.org/wiki/Weiß_(Familienname), 
https://de.wikipedia.org/wiki/Vosz vs. 
https://de.wikipedia.org/wiki/Voß, https://de.wikipedia.org/wiki/Hesz 
vs. https://de.wikipedia.org/wiki/Heß.  This 〈SZ〉 rule (which was never 
implemented in Unicode) was deservedly abolished in the spelling reform 
of 1998.

For a long time, therefore, people printed their names on forms as 
WEIß, VOß, GEßLER, etc., which were often misread as WEIB, VOB, GEBLER, 
etc.  The CAPS LOCK key on German computer keyboards had no effect on 
〈ß〉, and word processors also left 〈ß〉 unchanged when converting text 
to all-caps or small caps.  To achieve the desired result 〈FUSSBALL〉 
‘soccer’ or 〈WEISSHAUSSTRASSE〉 (literally ‘Whitehouse Street’, a 
streetcar stop in Cologne, which is decidedly hard to read in this 
all-caps version with three instances of 〈SS〉), you had to change 
〈Fußball〉 to 〈Fussball〉 or 〈Weißhausstraße〉 to 〈Weisshausstrasse〉 
manually.  Many people did not do this, so the non-normative spellings 
〈FUßBALL〉 and 〈WEIßHAUSSTRAßE〉 were seen extremely often, even in texts 
printed by renowned publishers and on official signs.
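For readers who want to reproduce the difference, here is a minimal Python sketch. `legacy_upper` and `manual_ss_upper` are hypothetical helpers: the first models the old behaviour of uppercasing everything except 〈ß〉, the second the manual 〈ß〉 → 〈ss〉 replacement. The built-in `str.upper()` applies today's default Unicode mapping ß → SS, which gives the same result as the manual workaround.

```python
def legacy_upper(text: str) -> str:
    """Hypothetical model of the old CAPS LOCK / word-processor behaviour:
    every character is uppercased except ß, which is left unchanged."""
    return "".join(ch if ch == "ß" else ch.upper() for ch in text)


def manual_ss_upper(text: str) -> str:
    """Hypothetical model of the manual workaround: ß -> ss, then uppercase."""
    return text.replace("ß", "ss").upper()


for word in ("Fußball", "Weißhausstraße"):
    # Columns: original, old behaviour, manual workaround, default Unicode mapping
    print(word, legacy_upper(word), manual_ss_upper(word), word.upper())
```

The first column of output shows the non-normative forms such as 〈FUßBALL〉; the last two agree with each other.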

The solution to all this mess was the introduction of capital 〈ẞ〉, 
which enables us to spell GEẞLER, MEIẞNER, WEIẞ, VOẞ, HEẞ, FUẞBALL, 
WEIẞHAUSSTRAẞE, etc. in all-caps (or small caps) with an unambiguous 
capital version of 〈ß〉.  On contemporary computer keyboards, the CAPS 
LOCK key produces capital 〈ẞ〉 when the 〈ß〉 key is pressed.  (Note that 
this is completely sufficient and that no separate capital 〈ẞ〉 key is 
needed, because the capital letter never has to be typed outside all-caps 
environments, so SHIFT + 〈ß〉 can continue to produce the question mark.)
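A tailored uppercasing that follows this practice can be sketched in Python as below. It is an illustration under stated assumptions, not an existing library function: the hypothetical `upper_with_capital_sharp_s` replaces 〈ß〉 with 〈ẞ〉 (U+1E9E) before calling `str.upper()`, which leaves 〈ẞ〉 unchanged because it has no uppercase mapping of its own. Since 〈ẞ〉 lowercases to 〈ß〉, lowercasing the all-caps form restores the original letter.

```python
CAPITAL_SHARP_S = "\u1E9E"  # ẞ, LATIN CAPITAL LETTER SHARP S


def upper_with_capital_sharp_s(text: str) -> str:
    """Hypothetical tailored uppercasing: map ß to capital ẞ instead of SS.

    str.upper() leaves ẞ as it is, since ẞ has no uppercase mapping.
    """
    return text.replace("ß", CAPITAL_SHARP_S).upper()


for name in ("Geßler", "Meißner", "Weiß", "Voß", "Heß", "Fußball", "Weißhausstraße"):
    caps = upper_with_capital_sharp_s(name)
    # ẞ lowercases to ß (simple mapping in UnicodeData.txt), so ß is recovered.
    print(name, caps, caps.lower())
```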

The council regulating official German orthography, which consists of 
experts from Germany, Austria, Switzerland, Liechtenstein, northern 
Italy and eastern Belgium, admitted the spelling with capital 〈ẞ〉 in 
2017, and on 15 December 2023 it made capital 〈ẞ〉 the standard form, 
with 〈SS〉 to be used only where capital 〈ẞ〉 is not available (see 
https://grammis.ids-mannheim.de/rechtschreibung/6180).

The practice is also changing rapidly (which is why the orthography 
council made this decision: its policy is strongly descriptive rather 
than prescriptive).  Although many fonts still do not contain capital 
〈ẞ〉 and some word processors have not yet updated their capitalization 
algorithms (maybe because they rely on Unicode?), you see capital 〈ẞ〉 
used more and more frequently, and it is eagerly taken up by people 
called Weiß, Voß, Geßler, etc.  The ambiguity of 〈WEISS〉, 
〈VOSS〉, 〈GESSLER〉, etc., which leads to a real loss of information, 
should be reason enough to change the capitalization rule of Unicode 
from ß → SS to ß → ẞ, i.e. to change the relevant line in 
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt from
     00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
     (actually, why isn’t it 0053 0053?)
to
     00DF; C; 1E9E; # LATIN SMALL LETTER SHARP S,
delete the line
     1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S,
change “S” to “C” in the line
     1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S,
and then delete the treatment of 〈ß〉 in 
https://www.unicode.org/Public/16.0.0/ucd/SpecialCasing.txt because 〈ß〉 
does not need special treatment anymore.
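The current default behaviour that this proposal targets, and the loss of information it causes, can be checked with a short Python sketch. It assumes that Python's `str.upper()`, `str.lower()` and `str.casefold()` apply the unconditional mappings from the UCD files mentioned above (the exact Unicode version depends on the Python release); `collides_in_caps` is a hypothetical helper for illustration.

```python
import unicodedata


def collides_in_caps(a: str, b: str) -> bool:
    """True if two distinct spellings become identical under default uppercasing."""
    return a != b and a.upper() == b.upper()


# Current default mappings, as applied by Python's str methods:
print("ß".upper())                          # SS  (SpecialCasing.txt)
print("\u1E9E".lower())                     # ß   (simple lowercase, UnicodeData.txt)
print("ß".casefold(), "\u1E9E".casefold())  # ss ss  (CaseFolding.txt, status F)
print(unicodedata.name("\u1E9E"))           # LATIN CAPITAL LETTER SHARP S

# Distinct names collapse into one all-caps form:
for sharp, double in [("Weiß", "Weiss"), ("Voß", "Voss"), ("Geßler", "Gessler")]:
    print(sharp, double, sharp.upper(), collides_in_caps(sharp, double))

# The all-caps form cannot be lowercased back to the original spelling:
print("Weiß".upper().lower())               # weiss
```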

As to the case pair stability guarantee, I would like to stress that 〈ß〉 
is used in no modern language other than German, and if case folding 
changes in that language, then Unicode has to adjust.  It would be poor 
service to the public to stick to a case mapping that is no longer valid 
just because Unicode came into existence at a time when it was still 
valid.  And since capital 〈ẞ〉 has existed in Unicode since 2008, the use 
of 〈SS〉 instead of it in a string of Unicode characters has to be 
regarded as no longer valid (although workarounds with 〈SS〉 for fonts 
that do not contain capital 〈ẞ〉 are not regarded as misspellings).

Capital 〈ẞ〉 is officially the normative capital letter corresponding to 
〈ß〉 now.  People use it wherever they can.  There is no reason not to 
use it except insufficient technology (or knowledge).  So let’s update 
the technology.  Let’s update the Unicode standard.

Best wishes,

Daniel

-- 
Prof. Dr. Daniel Bunčić
===============================================================
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon:       +49 (0)221  470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail:        daniel.buncic at uni-koeln.de = daniel at buncic.de
Threema:       https://threema.id/8M375R5K
===============================================================
Homepage:      http://daniel.buncic.de/
Academia:      http://uni-koeln.academia.edu/buncic
ResearchGate:  https://researchgate.net/profile/Daniel-Buncic-2
===============================================================

