Fwd: Re: [private] German sharp S uppercase mapping
Steffen Nurpmeso
steffen at sdaoden.eu
Mon Dec 2 16:22:34 CST 2024
Several Reply-To: follow today..
--- Forwarded from Steffen Nurpmeso <steffen at sdaoden.eu> ---
Date: Mon, 02 Dec 2024 20:52:08 +0100
Author: Steffen Nurpmeso <steffen at sdaoden.eu>
From: Steffen Nurpmeso <steffen at sdaoden.eu>
To: Doug Ewell <doug at ewellic.org>
Subject: Re: [private] German sharp S uppercase mapping
Message-ID: <20241202195208.puaParvJ at steffen%sdaoden.eu>
Doug Ewell wrote in
<SJ0PR03MB659877D11B14CD8301FA25BFCA342 at SJ0PR03MB6598.namprd03.prod.outl\
ook.com>:
|Steffen Nurpmeso wrote:
|
|>|Casing for text meant for human readers should follow current local
|>|conventions.
|>|
|>|Casing for text meant for machine processing (file systems,
|>|databases, etc.) must remain stable, even when local conventions
|>|change.
|>
|> Sorry that makes totally no sense to me.
|
|I am guessing you haven’t had to provide support for systems (computer \
|or otherwise) which depend on standards that are not stable, or which \
|introduce their own instability.
Sure, i use ISO C (ha!), not to mention IDNA 2003/8.
|When your internal database lookup function expects the uppercase form \
|of „schließen” to be „SCHLIESSEN”, and one day the user-level function \
|fails because the internal lookup now expects „SCHLIEẞEN”, it won’t \
|matter much that the internal function is more correct.
Sounds like bad design really. Ok ok that sounds fat now, but
really i have a hard time transposing your words to real life
software. You know, and that is *so* bad in real life (i actually
drowned in examples, some of which i produced myself).
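For illustration only, here is a minimal Python sketch of the lookup scenario quoted above. It assumes (this is an assumption, nothing is known about the real system) that the index is keyed on the full Unicode case folding of a word rather than on its uppercase form. The helper names fold_key and build_index are hypothetical.

```python
def fold_key(word: str) -> str:
    """Return a case-insensitive lookup key via full Unicode case folding."""
    return word.casefold()


def build_index(words):
    """Map each folded key to the original spelling (hypothetical helper)."""
    return {fold_key(w): w for w in words}


index = build_index(["schließen"])

# All three spellings fold to the same key, so the lookup does not depend
# on which uppercase form of U+00DF a caller happens to produce.
for query in ("schließen", "SCHLIESSEN", "SCHLIE\u1e9eEN"):
    print(query, "->", index.get(fold_key(query)))
```

With such a key, a later switch of the user-level uppercasing from "SS" to U+1E9E would not change what is stored or found, since both fold to "ss".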
Unicode has stability, U+00DF is small and U+1E9E is uppercase.
The issue is old it seems:
# (cd /x/doc/coding/charset-plus/data/; grep -ri 1E9E)
[hand selected lines]
auxiliary/SentenceBreakProperty.txt:1E9E ; Upper # L& LATIN CAPITAL LETTER SHARP S
extracted/DerivedName.txt:1E9E ; LATIN CAPITAL LETTER SHARP S
extracted/DerivedGeneralCategory.txt:1E9E ; Lu # LATIN CAPITAL LETTER SHARP S
CaseFolding.txt:1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
CaseFolding.txt:1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
DerivedCoreProperties.txt:1E9E ; Uppercase # L& LATIN CAPITAL LETTER SHARP S
DerivedCoreProperties.txt:1E9E ; Changes_When_Lowercased # L& LATIN CAPITAL LETTER SHARP S
^ here
DerivedCoreProperties.txt:1E9E ; Changes_When_Casefolded # L& LATIN CAPITAL LETTER SHARP S
^ here
DerivedCoreProperties.txt:1E9E ; Changes_When_Casemapped # L& LATIN CAPITAL LETTER SHARP S
^ here
DerivedNormalizationProps.txt:1E9E ; NFKC_CF; 0073 0073 # L& LATIN CAPITAL LETTER SHARP S
DerivedNormalizationProps.txt:1E9E ; Changes_When_NFKC_Casefolded # L& LATIN CAPITAL LETTER SHARP S
NamesList.txt: * uppercase is "SS" or 1E9E
NamesList.txt:1E9E LATIN CAPITAL LETTER SHARP S
UnicodeData.txt:1E9E;LATIN CAPITAL LETTER SHARP S;Lu;0;L;;;;;N;;;;00DF;
So a complete implementation dealing with Unicode always had to
deal with this issue. Even my s-ctext already knew about it: i
started that by the end of March 2013 and practically stopped in
October 2013, due to a CVE against a codebase i maintain (of which
i had not been informed, that is), but i will hopefully come back
to it at a later time.
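As a rough cross-check of the data lines above, the following Python sketch queries the Unicode database that ships with the interpreter (the unicodedata module and the built-in str methods). It is not how s-ctext works; the describe helper is a hypothetical name used only here.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"    # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def describe(ch: str) -> dict:
    """Collect name, general category and case mappings of one character."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "lower": ch.lower(),        # full lowercase mapping
        "upper": ch.upper(),        # full uppercase mapping
        "casefold": ch.casefold(),  # full case folding
    }


for ch in (SMALL_SHARP_S, CAPITAL_SHARP_S):
    print("U+%04X" % ord(ch), describe(ch))
```

U+1E9E reports category Lu, lowercases to U+00DF and case-folds to "ss", matching the UnicodeData.txt and CaseFolding.txt lines; U+00DF reports Ll and uppercases to "SS", not to U+1E9E.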
#?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase master|wc -l
573
#?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase master|tail -1
master:tools/ucd-props.h:# define sct_Changes_When_Lowercased (1ull<<47)
(I want to point out that the header comments
/* Aiieeh, we cannot use enum due to datatype restrictions <-> portability */)
-- End forward <20241202195208.puaParvJ at steffen%sdaoden.eu>
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear