Default case algorithms

Philippe Verdy verdy_p at wanadoo.fr
Wed Jun 25 07:37:39 CDT 2014


2014-06-25 10:52 GMT+02:00 Daniel Bünzli <daniel.buenzli at erratique.ch>:

> On Wednesday, 25 June 2014 at 09:10, Richard Wordingham wrote:
> > Yes - with the caveat that the uppercase mapping of U+0345 is too
> > complicated to define formally.
> >
> > On the other hand, the Lowercase_Mapping property seems to be inadequate
> > for the default lowercase mapping - Greek final sigma is the
> > complication.
>
> So what you seem to imply is that Unicode’s default full casing is
> defined by applying
>
> 1) The unconditional mappings of SpecialCasing.txt
> 2) The conditional mappings of SpecialCasing.txt (there’s only one, the
> final sigma case).
>

There's also the Turkic i or j (problems related to letters that are
usually soft-dotted in the Latin script, except in Turkic languages, whose
case mapping is context-dependent: the right-hand context has to be checked
to see whether we need to add a combining dot above).
We could insist that Turkish texts use an explicit combining dot above
after the dotless i (or j), but most Turkish texts just use the plain
ASCII letter, reinterpreting its soft dot as a hard dot that needs to
be added when converting to uppercase and removed when converting to
lowercase. Note also that the dotless i and the dotless j are not part of
any case pair.
For Turkish readers, a dotless i followed by an explicit combining dot
above (hard dot) is not recommended; they use the standard ASCII letter
directly, as if it were a precomposed but decomposable letter. In Turkish
texts, a dotless i without diacritic pairs directly with the capital ASCII
letter I (this mapping to uppercase is *not* contextual, but the reverse
conversion to lowercase *is* contextual).