Re: Proposal for German capital letter "ß"

Philippe Verdy verdy_p at wanadoo.fr
Wed Dec 9 16:21:12 CST 2015


2015-12-09 22:45 GMT+01:00 Richard Wordingham <
richard.wordingham at ntlworld.com>:

> On Wed, 9 Dec 2015 19:55:24 +0000
> Hans Meiser <brille1 at hotmail.com> wrote:
>
> > I see.
> >
> > Yet, the u+1E9E doesn't quite look like two capital "S". So any
> > program implementing a conversion conforming to Unicode will
> > currently display/print in a wrong result: "MAßE" instead of the
> > correctly converted result "MASSE".
>
> While the default simple uppercasing of "maße" will yield "MAßE", the
> default full uppercasing will yield "MASSE".
>

Full uppercasing rules are normally locale-sensitive, so there should be a
specific rule for German that does not yield this result (see, for example,
the rules for Turkish dotless i vs. dotted i).
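
To make the difference between the two default mappings concrete, here is a
rough sketch in Python. It assumes that Python's str.upper() applies the
locale-neutral full case mapping. Python does not expose the simple mapping
directly, so approx_simple_upper() is a hypothetical helper that only
approximates it: it leaves a character alone whenever its full uppercase form
is longer than one code point. That is good enough for "ß" but is not the real
simple mapping table.

```python
def approx_simple_upper(text):
    """Approximate simple (one-to-one) uppercasing.

    Assumption: a character whose full uppercase form expands to several
    code points is treated as having no simple uppercase mapping. This is
    true for U+00DF but not for every character.
    """
    out = []
    for ch in text:
        up = ch.upper()  # full, locale-neutral mapping
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "ma\u00dfe"                  # "maße"
full = word.upper()                 # default full uppercasing
simple = approx_simple_upper(word)  # approximated simple uppercasing
print(full, simple)
```

Neither result depends on the language of the text, which is why a German
rule producing U+1E9E would have to come from a locale-specific tailoring.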

I don't think these locale-sensitive rules are irrevocably stable, since more
locales can be added at any time for languages that need specific case pairs.
The stabilized properties are for locale-neutral mappings only, in generic
contexts where the language is not known (including for standard
normalizations, or for the locale-neutral "root" collations and the
associated DUCET).

Even for a single language, these rules cannot be hardcoded in a stable
way, because orthographies evolve over time. Stability is only possible if
you use a locale identifier that names the orthographic rule precisely (with
the associated rulesets checked and corrected until they reach a stable
consensus; if the orthography evolves or has variants, use another locale
identifier) and if that specific orthography is entirely known. The latter
is difficult for historic orthographies, or when no recognized language
academy or national institution fixes the rule to use for a country or
region. Even such institutions work within their own time and limit their
scope to certain applications; they will not reform history.

> I am not aware of a useful definition of 'conforming to Unicode' that
> applies to either transformation.


So if you are looking for an example, look at how this is done for Turkish.
Basically, this is just a matter of tailoring for specific locales.
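
As an illustration only, tailoring can be sketched as a per-locale override
table that is consulted before the default mapping. Everything named here is
an assumption made for the example: the TAILORINGS table, the
upper_tailored() function and the private-use tag "de-x-eszett" are
hypothetical, and the German entry is not an existing standard rule. The
Turkish entry mirrors the well-known i/İ pair.

```python
# Hypothetical per-locale overrides, checked before the default mapping.
TAILORINGS = {
    # Turkish: dotted small i uppercases to U+0130 (capital I with dot above).
    "tr": {"i": "\u0130"},
    # Assumed German variant: U+00DF uppercases to U+1E9E instead of "SS".
    "de-x-eszett": {"\u00df": "\u1e9e"},
}


def upper_tailored(text, locale=None):
    """Uppercase text, applying the locale's overrides where they exist."""
    overrides = TAILORINGS.get(locale, {})
    # Fall back to Python's locale-neutral str.upper() for everything else.
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


print(upper_tailored("ma\u00dfe"))                 # no locale: default mapping
print(upper_tailored("ma\u00dfe", "de-x-eszett"))  # tailored German variant
print(upper_tailored("diyarbak\u0131r", "tr"))     # tailored Turkish
```

Keeping the overrides in a separate table leaves the locale-neutral mapping
untouched, so adding or correcting a locale does not affect any other one.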