Question about “Uppercase” in DerivedCoreProperties.txt

Mike FABIAN mfabian at redhat.com
Sat Nov 8 03:22:10 CST 2014


Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> Note that tolower() and toupper() can only work on a 1-character level, so
> they are not recommended for changing the case of plain text.
>
> For correct handling of locales, tolower and toupper should be replaced by
> strtolower and strtoupper (or their aliases), which will be able to process
> character clusters and the contextual casing rules needed for a language or
> orthographic style.

Yes, thank you for explaining this.

But these details of upper and lower casing cannot be expressed in the
“i18n” file of glibc:

https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/i18n

For toupper and tolower, this file just has character -> character
mapping tables; for example, the “tolower” table contains only

    (<U03A3>,<U03C3>)

(i.e. mapping Σ U+03A3 -> σ U+03C3, never to the final sigma ς
U+03C2).
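
To illustrate the limitation, here is a minimal Python sketch (not glibc
code) of what a pure character -> character table lookup amounts to.
The dictionary `TOLOWER` is a hypothetical three-entry excerpt modelled
on the “tolower” table, and `table_tolower` is an illustrative helper
that, like tolower()/towlower(), sees one character at a time and no
context:

```python
# Hypothetical excerpt of a character -> character "tolower" table,
# modelled on entries like (<U03A3>,<U03C3>): one code point in, one out.
TOLOWER = {
    0x039F: 0x03BF,  # GREEK CAPITAL LETTER OMICRON -> small omicron
    0x0394: 0x03B4,  # GREEK CAPITAL LETTER DELTA   -> small delta
    0x03A3: 0x03C3,  # GREEK CAPITAL LETTER SIGMA   -> small sigma (never final sigma)
}


def table_tolower(text: str) -> str:
    """Lowercase one character at a time using only TOLOWER.

    Characters missing from the table are returned unchanged, and no
    neighbouring characters are ever consulted.
    """
    return "".join(chr(TOLOWER.get(ord(ch), ord(ch))) for ch in text)


print(table_tolower("ΟΔΟΣ"))  # οδοσ : word-final σ where Greek needs ς
```

Because the lookup key is a single code point, no entry can say
"Σ becomes ς only at the end of a word".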

More detailed and correct information about upper and lower case must
come from elsewhere, not from this “i18n” file in glibc. With only the
information from this “i18n” file, even the Greek sigma cannot be
handled correctly.
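
For comparison, here is a sketch of the contextual rule such a table
cannot express: Unicode's Final_Sigma condition (listed in
SpecialCasing.txt) lowercases Σ to ς when it is preceded by a cased
letter and not followed by one, skipping case-ignorable characters in
both directions. This is a simplified approximation, not a conformant
implementation; `is_cased` and `is_case_ignorable` are hypothetical
helpers that only approximate the Unicode Cased and Case_Ignorable
properties:

```python
import unicodedata

CAPITAL_SIGMA = "\u03a3"  # Σ
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς


def is_cased(ch: str) -> bool:
    # Approximation of "Cased": the character changes under lower() or upper().
    return ch.lower() != ch or ch.upper() != ch


def is_case_ignorable(ch: str) -> bool:
    # Rough approximation of "Case_Ignorable" by general category:
    # nonspacing/enclosing marks, format chars, modifier letters/symbols.
    return unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}


def lower_with_final_sigma(text: str) -> str:
    """Lowercase a whole string, choosing σ or ς for each capital sigma."""
    out = []
    for i, ch in enumerate(text):
        if ch != CAPITAL_SIGMA:
            out.append(ch.lower())
            continue
        # Look backwards past case-ignorable characters for a cased letter.
        j = i - 1
        while j >= 0 and is_case_ignorable(text[j]):
            j -= 1
        preceded = j >= 0 and is_cased(text[j])
        # Look forwards past case-ignorable characters for a cased letter.
        k = i + 1
        while k < len(text) and is_case_ignorable(text[k]):
            k += 1
        followed = k < len(text) and is_cased(text[k])
        out.append(FINAL_SIGMA if preceded and not followed else SMALL_SIGMA)
    return "".join(out)


print(lower_with_final_sigma("ΟΔΟΣ"))  # οδος
```

CPython's built-in str.lower() already applies the full Final_Sigma
rule; the point of the sketch is only that the choice between σ and ς
depends on neighbouring characters, which a per-character table never
sees.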

Pravin and I want to update this “i18n” file to the latest
data from Unicode 7.0.0, doing it as correctly as possible within
the limitations imposed by this file and the ISO C standard.

-- 
Mike FABIAN <mfabian at redhat.com>
☏ Office: +49-69-365051027, internal 8875027
Lack of sleep is the enemy of good work.

