Fwd: Re: [private] German sharp S uppercase mapping

Mon Dec 2 16:22:34 CST 2024

Several Reply-To: messages follow today.

--- Forwarded from Steffen Nurpmeso <steffen at sdaoden.eu> ---
Date: Mon, 02 Dec 2024 20:52:08 +0100
Author: Steffen Nurpmeso <steffen at sdaoden.eu>
From: Steffen Nurpmeso <steffen at sdaoden.eu>
To: Doug Ewell <doug at ewellic.org>
Subject: Re: [private] German sharp S uppercase mapping
Message-ID: <20241202195208.puaParvJ at steffen%sdaoden.eu>

Sure, i use ISO C (ha!), not to mention IDNA 2003/2008.

 |When your internal database lookup function expects the uppercase form \
 |of „schließen” to be „SCHLIESSEN”, and one day the user-level function \
 |fails because the internal lookup now expects „SCHLIEẞEN”, it won’t \
 |matter much that the internal function is more correct.

Sounds like bad design, really.  Ok ok, that sounds harsh now, but
i really have a hard time mapping your words onto real-life
software.  And, you know, things are *so* bad in real life (i have
actually drowned in examples, some of which i produced myself).
Unicode has stability; U+00DF is lowercase and U+1E9E is uppercase.
The issue is old, it seems:

  # (cd /x/doc/coding/charset-plus/data/; grep -ri 1E9E)
[hand selected lines]
  auxiliary/SentenceBreakProperty.txt:1E9E          ; Upper # L&       LATIN CAPITAL LETTER SHARP S
  extracted/DerivedName.txt:1E9E          ; LATIN CAPITAL LETTER SHARP S
  extracted/DerivedGeneralCategory.txt:1E9E          ; Lu #       LATIN CAPITAL LETTER SHARP S
  CaseFolding.txt:1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
  CaseFolding.txt:1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
  DerivedCoreProperties.txt:1E9E          ; Uppercase # L&       LATIN CAPITAL LETTER SHARP S
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Lowercased # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Casefolded # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Casemapped # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedNormalizationProps.txt:1E9E          ; NFKC_CF; 0073 0073      # L&       LATIN CAPITAL LETTER SHARP S
  DerivedNormalizationProps.txt:1E9E          ; Changes_When_NFKC_Casefolded # L&       LATIN CAPITAL LETTER SHARP S
  NamesList.txt:  * uppercase is "SS" or 1E9E
  NamesList.txt:1E9E      LATIN CAPITAL LETTER SHARP S
  UnicodeData.txt:1E9E;LATIN CAPITAL LETTER SHARP S;Lu;0;L;;;;;N;;;;00DF;
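
The data lines above can be cross-checked from a scripting language.
What follows is only a minimal sketch in Python, assuming its built-in
str methods and the unicodedata module follow a Unicode version that
includes U+1E9E (5.1 or later).  The helper names describe and
changes_when_lowercased are made up for this illustration, and the
latter only approximates the derived property for single code points.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"    # ß, LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def changes_when_lowercased(ch):
    # Hypothetical helper: rough single-code-point stand-in for the
    # Changes_When_Lowercased derived property.
    return ch.lower() != ch


def describe(ch):
    # Collect the name, general category and case mappings that
    # Python applies to one code point.
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "upper": ch.upper(),
        "lower": ch.lower(),
        "casefold": ch.casefold(),
    }


if __name__ == "__main__":
    for ch in (SMALL_SHARP_S, CAPITAL_SHARP_S):
        print("U+%04X" % ord(ch), describe(ch))
    print("schließen".upper())
    print("SCHLIEẞEN".casefold() == "SCHLIESSEN".casefold())
```

Under these assumptions the default uppercase of „schließen” remains
„SCHLIESSEN”, while case folding makes both uppercase spellings
compare equal.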

So a complete Unicode implementation always had to deal with this
issue.  Even my s-ctext knew about it already.  I started that
project by the end of March 2013 and practically stopped in October
2013 because of a CVE against a codebase i maintain (about which i
had not been informed, that is), but i will hopefully come back to
it at a later time.

  #?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase  master|wc -l
  573
  #?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase  master|tail -1
  master:tools/ucd-props.h:# define sct_Changes_When_Lowercased         (1ull<<47)

(I want to point out that the header comments:
  /* Aiieeh, we cannot use enum due to datatype restrictions <-> portability */)
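
One likely reading of that comment: before C23 an enumeration
constant has to be representable as an int, so a flag at bit 47
cannot portably be an enum member.  The arithmetic can be sketched
in Python; only the bit position 47 is taken from the header line
above, while the 32-bit int width and all names are assumptions made
for illustration.

```python
# Bit position 47 comes from the quoted s-ctext header line; the
# Python names below are hypothetical.
CHANGES_WHEN_LOWERCASED = 1 << 47

# Assumption: a C int is 32 bits wide, as on most current platforms.
C_INT_MIN = -(2 ** 31)
C_INT_MAX = 2 ** 31 - 1


def fits_in_c_int(value):
    # True if the value would be representable as a 32-bit signed int.
    return C_INT_MIN <= value <= C_INT_MAX


def has_property(mask, flag):
    # Test a single property bit in a 64-bit property mask.
    return (mask & flag) != 0


if __name__ == "__main__":
    print(fits_in_c_int(CHANGES_WHEN_LOWERCASED))
    print(has_property(CHANGES_WHEN_LOWERCASED, CHANGES_WHEN_LOWERCASED))
```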
 -- End forward <20241202195208.puaParvJ at steffen%sdaoden.eu>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear