Fwd: Re: [private] German sharp S uppercase mapping

Steffen Nurpmeso steffen at sdaoden.eu
Mon Dec 2 16:22:34 CST 2024


Several Reply-To follow-ups today.

--- Forwarded from Steffen Nurpmeso <steffen at sdaoden.eu> ---
Date: Mon, 02 Dec 2024 20:52:08 +0100
Author: Steffen Nurpmeso <steffen at sdaoden.eu>
From: Steffen Nurpmeso <steffen at sdaoden.eu>
To: Doug Ewell <doug at ewellic.org>
Subject: Re: [private] German sharp S uppercase mapping
Message-ID: <20241202195208.puaParvJ at steffen%sdaoden.eu>

Doug Ewell wrote in
 <SJ0PR03MB659877D11B14CD8301FA25BFCA342 at SJ0PR03MB6598.namprd03.prod.outl\
 ook.com>:
 |Steffen Nurpmeso wrote:
 |
 |>|Casing for text meant for human readers should follow current local
 |>|conventions.
 |>|
 |>|Casing for text meant for machine processing (file systems,
 |>|databases, etc.) must remain stable, even when local conventions
 |>|change.
 |>
 |> Sorry, that makes no sense at all to me.
 |
 |I am guessing you haven’t had to provide support for systems (computer \
 |or otherwise) which depend on standards that are not stable, or which \
 |introduce their own instability.

Sure, I use ISO C (ha!), not to mention IDNA2003/IDNA2008.

 |When your internal database lookup function expects the uppercase form \
 |of „schließen” to be „SCHLIESSEN”, and one day the user-level function \
 |fails because the internal lookup now expects „SCHLIEẞEN”, it won’t \
 |matter much that the internal function is more correct.

Sounds like bad design, really.  OK, OK, that sounds a bit strong
now, but I honestly have a hard time transposing your words onto
real-life software.  You know, bad design is *so* bad in real life
(I have actually drowned in examples, some of which I produced
myself).  Unicode has stability guarantees: U+00DF is lowercase and
U+1E9E is uppercase.  The issue is an old one, it seems:

  # (cd /x/doc/coding/charset-plus/data/; grep -ri 1E9E)
[hand-selected lines]
  auxiliary/SentenceBreakProperty.txt:1E9E          ; Upper # L&       LATIN CAPITAL LETTER SHARP S
  extracted/DerivedName.txt:1E9E          ; LATIN CAPITAL LETTER SHARP S
  extracted/DerivedGeneralCategory.txt:1E9E          ; Lu #       LATIN CAPITAL LETTER SHARP S
  CaseFolding.txt:1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
  CaseFolding.txt:1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
  DerivedCoreProperties.txt:1E9E          ; Uppercase # L&       LATIN CAPITAL LETTER SHARP S
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Lowercased # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Casefolded # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedCoreProperties.txt:1E9E          ; Changes_When_Casemapped # L&       LATIN CAPITAL LETTER SHARP S
^ here
  DerivedNormalizationProps.txt:1E9E          ; NFKC_CF; 0073 0073      # L&       LATIN CAPITAL LETTER SHARP S
  DerivedNormalizationProps.txt:1E9E          ; Changes_When_NFKC_Casefolded # L&       LATIN CAPITAL LETTER SHARP S
  NamesList.txt:  * uppercase is "SS" or 1E9E
  NamesList.txt:1E9E      LATIN CAPITAL LETTER SHARP S
  UnicodeData.txt:1E9E;LATIN CAPITAL LETTER SHARP S;Lu;0;L;;;;;N;;;;00DF;

So any complete implementation dealing with Unicode always had to
deal with this issue.  Even my s-ctext knew about it already.  I
started it at the end of March 2013 and practically stopped in
October 2013 because of a CVE against a codebase I maintain (one I
had not been informed about), but I hope to come back to it at a
later time.

  #?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase  master|wc -l
  573
  #?0|kent:.s-ctext.git$ git grep -i Changes_When_Lowercase  master|tail -1
  master:tools/ucd-props.h:# define sct_Changes_When_Lowercased         (1ull<<47)

(I want to point out that the header says:
  /* Aiieeh, we cannot use enum due to datatype restrictions <-> portability */)
 -- End forward <20241202195208.puaParvJ at steffen%sdaoden.eu>

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)
|
|And in Fall, feel "The Dropbear Bard"s ball(s).
|
|The banded bear
|without a care,
|Banged on himself for e'er and e'er
|
|Farewell, dear collar bear



More information about the Unicode mailing list