Re: Question about “Uppercase” in DerivedCoreProperties.txt

Christopher Vance cjsvance at gmail.com
Sat Nov 8 18:45:38 CST 2014


So glibc is broken. This doesn't make it a Unicode problem.

On Sat, Nov 8, 2014 at 8:22 PM, Mike FABIAN <mfabian at redhat.com> wrote:

> Philippe Verdy <verdy_p at wanadoo.fr> wrote:
>
> > Note that tolower() and toupper() can only work at the single-character
> > level; they are not recommended for changing the case of plain text.
> >
> > For correct handling of locales, tolower and toupper should be replaced
> > by strtolower and strtoupper (or their aliases), which are able to
> > process character clusters and the contextual casing rules needed for a
> > language or orthographic style.
>
> Yes, thank you for explaining this.
>
> But these details of upper and lower casing cannot be expressed in the
> “i18n” file of glibc:
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=localedata/locales/i18n
>
> For toupper and tolower, this file just has character -> character
> mapping tables, for example the “tolower” table contains only
>
>     (<U03A3>,<U03C3>)
>
> (i.e. mapping Σ U+03A3 -> σ U+03C3, never to the final sigma ς
> U+03C2).
>
> More correct, detailed information about upper and lower case must come
> from elsewhere, not from this “i18n” file in glibc.  Using only the
> information from this “i18n” file, not even the Greek sigma can be
> handled correctly.
>
> Pravin and I want to update this “i18n” file to the latest
> data from Unicode 7.0.0, doing it as correctly as possible within
> the limitations imposed by this file and the ISO C standard.
>
> --
> Mike FABIAN <mfabian at redhat.com>
> ☏ Office: +49-69-365051027, internal 8875027
> Lack of sleep is the enemy of good work.
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
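
To make the limitation described above concrete, here is a minimal sketch
in Python (used purely for illustration; it is not glibc code). The table
TOLOWER and both function names are hypothetical. Only the Sigma pair
comes from the quoted “i18n” excerpt; the other two entries are added so
that a whole word can be lowercased. The context rule is a deliberate
simplification, not the full Unicode Final_Sigma condition.

```python
# Hypothetical excerpt of a character -> character "tolower" table.
# Only the Sigma pair is taken from the quoted i18n file; the other
# two entries are assumptions added so the example word can be lowercased.
TOLOWER = {
    "\u03A3": "\u03C3",  # Σ -> σ
    "\u039F": "\u03BF",  # Ο -> ο
    "\u0394": "\u03B4",  # Δ -> δ
}

FINAL_SIGMA = "\u03C2"  # ς


def tolower_per_char(text):
    """Lowercase one character at a time, with no view of its neighbours."""
    return "".join(TOLOWER.get(ch, ch) for ch in text)


def tolower_with_context(text):
    """Like tolower_per_char, but map a word-final Σ to ς.

    Simplified rule (an assumption, not the full Unicode Final_Sigma
    condition): Σ is final when a letter precedes it and no letter follows.
    """
    out = []
    for i, ch in enumerate(text):
        prev_is_letter = i > 0 and text[i - 1].isalpha()
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if ch == "\u03A3" and prev_is_letter and not next_is_letter:
            out.append(FINAL_SIGMA)
        else:
            out.append(TOLOWER.get(ch, ch))
    return "".join(out)


word = "\u039F\u0394\u039F\u03A3"  # ΟΔΟΣ
print(tolower_per_char(word))      # ends in σ (U+03C3)
print(tolower_with_context(word))  # ends in ς (U+03C2)
```

The per-character function can never produce the final sigma, because it
sees only one code point at a time. That is the same constraint a
tolower() driven by a plain mapping table operates under.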



-- 
Christopher Vance