German sharp S uppercase mapping
Daniel Buncic
daniel.buncic at uni-koeln.de
Tue Nov 26 03:43:44 CST 2024
Dear Jules, dear all,
Thank you for bringing this up. As a German linguist and main editor of
two journals, I would like to give my opinion on this.
First of all, some context. German 〈ß〉 never occurs at the beginning
of words. Consequently, the case mapping is necessary exclusively for
all-caps and small caps contexts, not for sentence case (or title case,
which is not used in German anyway). Originally, 〈ß〉 emerged in
‘German’ variants of the Latin alphabet, blackletter and Kurrent.
Although the exact origin is still unclear, it was always interpreted as
a ligature of either 〈ſz〉 (or rather 〈ſʒ〉) or 〈ſs〉. On the one hand,
the 〈ß〉 was also used in other languages like Polish or Hungarian (for
modern 〈sz〉) and even, in roman type (antiqua), for Latin or French 〈ss〉
(cf. https://en.wikipedia.org/wiki/ß#Use_in_Roman_type). On the other
hand, traditionally, roman type did not have the letter 〈ß〉, which is
why it has a long tradition of being replaced with 〈ss〉.
This is why, in many surnames, the spellings 〈ß〉 and 〈ss〉 exist side
by side. The pairs Geßler and Gessler, Meißner and Meissner, Weiß and
Weiss, Voß and Voss, Heß and Hess originally arose from chance variation
at a time when spelling was not yet fixed. Nowadays, however, each
spelling is fixed and the variants are regarded as different names,
whose bearers insist on the correct spelling of their name (〈Heß〉, not
〈Hess〉, etc.).
Therefore it is important to keep 〈ß〉 and 〈ss〉 apart even in all-caps or
small caps (e.g. in forms that have to be filled out in capital letters,
in headings that only have capitals, etc.). Until 1998, the
then-normative dictionary, Duden, recommended generally using 〈SS〉 for
〈ß〉 but 〈SZ〉 for 〈ß〉 wherever the difference was important, giving the
example of 〈in Massen〉 /ʔɪnˈmasən/ ‘en masse’ vs. 〈in Maßen〉
/ʔɪnˈmaːsən/ ‘in moderation’, where the latter would have to be
capitalized as 〈IN MASZEN〉. With proper names, however, this method
does not work, because many names also have a variant with 〈sz〉 (which
in some cases occurred through a different interpretation of 〈ſʒ〉 in the
old documents, in some cases through Hungarian mediation). See
https://de.wikipedia.org/wiki/Geszler vs.
https://de.wikipedia.org/wiki/Geßler,
https://de.wikipedia.org/wiki/Meiszner vs.
https://de.wikipedia.org/wiki/Meißner_(Familienname),
https://de.wikipedia.org/wiki/Weisz vs.
https://de.wikipedia.org/wiki/Weiß_(Familienname),
https://de.wikipedia.org/wiki/Vosz vs.
https://de.wikipedia.org/wiki/Voß, https://de.wikipedia.org/wiki/Hesz
vs. https://de.wikipedia.org/wiki/Heß. This 〈SZ〉 rule (which was never
implemented in Unicode) was deservedly abolished in the spelling reform
of 1998.
For a long time, the practice therefore was that people printed their
name in forms as WEIß, VOß, GEßLER, etc., which was often misread as
WEIB, VOB, GEBLER, etc. The CAPS LOCK key on German computer keyboards
had no effect on 〈ß〉 and word processors also left ß unchanged when
changing a text to all-caps or small caps; in order to achieve the
desired result 〈FUSSBALL〉 ‘soccer’ or 〈WEISSHAUSSTRASSE〉 (literally
‘Whitehouse Street’, a streetcar stop in Cologne, which is definitely
lacking in readability in this all-caps version with 3×〈SS〉), you had to
change 〈Fußball〉 to 〈Fussball〉 or 〈Weißhausstraße〉 to 〈Weisshausstrasse〉
manually, which many people did not do, so 〈FUßBALL〉 or 〈WEIßHAUSSTRAßE〉
were non-normative spellings seen extremely often, even in texts printed
by renowned publishers or on official signs.
The solution for all this mess was the introduction of capital 〈ẞ〉,
which enables us to spell GEẞLER, MEIẞNER, WEIẞ, VOẞ, HEẞ, FUẞBALL,
WEIẞHAUSSTRAẞE, etc. in all-caps (or small caps) with an unambiguous
capital version of 〈ß〉. On contemporary computer keyboards, the CAPS
LOCK key produces capital 〈ẞ〉 when the 〈ß〉 key is pushed. (Note that
this is completely sufficient and that no separate capital 〈ẞ〉 key is
needed because the capital letter never has to be typed outside all-caps
environments, so SHIFT + 〈ß〉 can continue to produce the question mark.)
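As an illustration of the all-caps spellings listed above, here is a
minimal Python 3 sketch of a tailored uppercasing step. The helper name
upper_keep_sharp_s is hypothetical (it is not part of any library), and
the sketch assumes the output font actually contains capital 〈ẞ〉
(U+1E9E).

```python
def upper_keep_sharp_s(text: str) -> str:
    """Uppercase text, mapping ß (U+00DF) to capital ẞ (U+1E9E)."""
    # Replace ß first: the default str.upper() would expand it to "SS".
    # U+1E9E is already uppercase, so upper() leaves it unchanged.
    return text.replace("\u00df", "\u1e9e").upper()


names = ["Geßler", "Weiß", "Fußball", "Weißhausstraße"]
all_caps = [upper_keep_sharp_s(name) for name in names]
print(all_caps)
```

The substitution has to happen before the call to upper(), because
the default uppercase mapping cannot be undone once 〈ß〉 has become 〈SS〉.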
The council regulating official German orthography, which consists of
experts from Germany, Austria, Switzerland, Liechtenstein, northern
Italy and eastern Belgium, admitted the spelling with capital 〈ẞ〉 in
2017 and made it the standard form, with 〈SS〉 to be used only where
capital 〈ẞ〉 is not available, on 15 December 2023 (see
https://grammis.ids-mannheim.de/rechtschreibung/6180).
The practice is also changing rapidly (which is why the orthography
council made this decision: its policy is descriptive rather than
prescriptive). Although many fonts still do not contain
capital 〈ẞ〉 and some word processors have not updated their
capitalization algorithms yet (maybe because they rely on Unicode?), you
see capital 〈ẞ〉 used more and more frequently, and it is eagerly taken
up by people called Weiß, Voß, Geßler, etc. The ambiguity of 〈WEISS〉,
〈VOSS〉, 〈GESSLER〉, etc., which leads to a real loss of information,
should be reason enough to change the capitalization rule of Unicode
from ß → SS to ß → ẞ, i.e. to change the relevant line in
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt from
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
(actually, why isn’t it 0053 0053?)
to
00DF; C; 1E9E; # LATIN SMALL LETTER SHARP S,
delete the line
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S,
change “S” to “C” in the line
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S,
and then delete the treatment of 〈ß〉 in
https://www.unicode.org/Public/16.0.0/ucd/SpecialCasing.txt because 〈ß〉
does not need special treatment anymore.
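To make the loss of information concrete, here is a small Python 3
sketch of the current default behaviour. It assumes only that Python's
built-in str.upper(), str.lower() and str.casefold() follow the default
Unicode mappings, which they do for these characters; the variable names
are chosen for illustration.

```python
# Default uppercasing expands ß (U+00DF) to "SS" ...
weiss_upper = "Weiß".upper()
# ... so two different surnames collapse into the same all-caps string.
collision = "Weiß".upper() == "Weiss".upper()

# Lowercasing the result cannot recover the original ß.
round_trip = weiss_upper.lower()

# Capital ẞ (U+1E9E) already lowercases to ß.
capital_lower = "\u1e9e".lower()

# Case folding sends both ß and ẞ to "ss".
folded_small = "\u00df".casefold()
folded_capital = "\u1e9e".casefold()

print(weiss_upper, collision, round_trip, capital_lower,
      folded_small, folded_capital)
```

The sketch only demonstrates the existing mappings; it does not
implement the change proposed above.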
As to the case pair stability guarantee, I would like to stress that 〈ß〉
is used in no other modern language than German, and if case folding
changes in that language, then Unicode has to adjust. It would be a poor
service to the public to stick to a case mapping that is no longer valid
just because Unicode came into existence at a time when it was still
valid. And since capital 〈ẞ〉 has existed in Unicode since 2008, the use
of 〈SS〉 instead of it in a string of Unicode characters has to be
regarded as no longer valid (although workarounds with 〈SS〉 for fonts
that do not contain capital 〈ẞ〉 are not regarded as misspellings).
Capital 〈ẞ〉 is officially the normative capital letter corresponding to
〈ß〉 now. People use it wherever they can. There is no reason not to
use it except insufficient technology (or knowledge). So let’s update
the technology. Let’s update the Unicode standard.
Best wishes,
Daniel
--
Prof. Dr. Daniel Bunčić
===============================================================
Slavisches Institut der Universität zu Köln
Weyertal 137, D-50931 Köln
Telefon: +49 (0)221 470-90535
Sprechstunden: https://uni.koeln/ENZEB
E-Mail: daniel.buncic at uni-koeln.de = daniel at buncic.de
Threema: https://threema.id/8M375R5K
===============================================================
Homepage: http://daniel.buncic.de/
Academia: http://uni-koeln.academia.edu/buncic
ResearchGate: https://researchgate.net/profile/Daniel-Buncic-2
===============================================================