Question about "Uppercase" in DerivedCoreProperties.txt

Laurentiu Iancu liancu at microsoft.com
Thu Nov 6 16:31:37 CST 2014


Hello,



The property Uppercase is a binary, informative property derived as the union of General_Category=Uppercase_Letter (gc=Lu) and Other_Uppercase (OUpper=Y), as documented in Section 5.3 of UAX #44, at http://www.unicode.org/reports/tr44/#Uppercase.



All of the characters you enumerated are titlecase letters (gc=Lt) rather than uppercase letters (gc=Lu), and they are not specifically assigned Other_Uppercase (which would otherwise contradict their General_Category).  Following the derivation, they do not have the Uppercase binary property.
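
As a rough illustration of that derivation, here is a minimal Python sketch. It relies on the standard unicodedata module for General_Category. The names OTHER_UPPERCASE_RANGES, is_other_uppercase and derived_uppercase are hypothetical, and the Other_Uppercase ranges are only a hand-copied excerpt (an assumption; PropList.txt holds the authoritative list). Note also that Python bundles its own version of the Unicode data, which may differ from 7.0.

```python
import unicodedata

# Illustrative excerpt of Other_Uppercase ranges. Assumption: copied by hand,
# not complete; consult PropList.txt for the authoritative data.
OTHER_UPPERCASE_RANGES = [(0x2160, 0x216F), (0x24B6, 0x24CF)]


def is_other_uppercase(ch):
    """Return True if ch falls in one of the excerpted Other_Uppercase ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in OTHER_UPPERCASE_RANGES)


def derived_uppercase(ch):
    """Sketch of the derivation: Uppercase = (gc=Lu) or Other_Uppercase."""
    return unicodedata.category(ch) == "Lu" or is_other_uppercase(ch)


# The 27 characters enumerated in the original message.
enumerated = [chr(cp) for start in (0x1F88, 0x1F98, 0x1FA8)
              for cp in range(start, start + 8)]
enumerated += ["\u1FBC", "\u1FCC", "\u1FFC"]

for ch in enumerated:
    # Each line shows the code point, its General_Category, the derived
    # Uppercase value, and the code point of its lowercase mapping.
    print(f"U+{ord(ch):04X} gc={unicodedata.category(ch)} "
          f"Uppercase={derived_uppercase(ch)} lower=U+{ord(ch.lower()):04X}")
```

For contrast, derived_uppercase("\u1F08") (GREEK CAPITAL LETTER ALPHA WITH PSILI, gc=Lu) yields True under the same sketch, while every character in the enumerated list has gc=Lt and so yields False.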



For a visualization of the set of characters assigned the binary property Uppercase in relation to the set of Uppercase_Letter characters (gc=Lu), you can use the UnicodeSet comparison tool at http://www.unicode.org/cldr/utility/unicodeset.jsp.  Enter “[:gc=Lu:]” in one input field and “[:Uppercase:]” in the other field, then click on Compare.
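
If you prefer a local approximation of that comparison, the following Python sketch may help. It assumes that, on CPython, str.isupper() applied to a single character reflects the derived Uppercase property of the Unicode data bundled with the interpreter; that is an assumption about the implementation, and the bundled data version (printed first) may not be 7.0. The variable names are hypothetical.

```python
import sys
import unicodedata

# Every code point as a one-character string.
all_chars = [chr(cp) for cp in range(sys.maxunicode + 1)]

# Set 1: characters whose General_Category is Uppercase_Letter (gc=Lu).
gc_lu = {ch for ch in all_chars if unicodedata.category(ch) == "Lu"}

# Set 2: characters treated as uppercase by str.isupper().
# Assumption: for a single character this matches the Uppercase property.
uppercase = {ch for ch in all_chars if ch.isupper()}

only_uppercase = sorted(uppercase - gc_lu)  # candidates for Other_Uppercase
only_lu = sorted(gc_lu - uppercase)         # expected to be empty

print("Unicode data version:", unicodedata.unidata_version)
print("Uppercase but not gc=Lu:", len(only_uppercase))
print("gc=Lu but not Uppercase:", len(only_lu))
print("U+1F88 in Uppercase:", "\u1F88" in uppercase)
```

The titlecase letters from the original message should appear in neither set, which matches what the online tool shows.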



Regards,

L.



-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mike FABIAN
Sent: Thursday, November 6, 2014 12:32 AM
To: unicode at unicode.org
Subject: Question about "Uppercase" in DerivedCoreProperties.txt





I have a question about “Uppercase” in DerivedCoreProperties.txt:



U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI

is listed as “Lowercase” in

http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :



       1F80..1F87    ; Lowercase # L&   [8] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI



But



“U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”

is *not* listed as “Uppercase” in

http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .



Although U+1F88 seems to be Uppercase according to http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt

because it has a tolower mapping to U+1F80:



    1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88

    1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;



Is the information in DerivedCoreProperties.txt correct or could this be a bug in DerivedCoreProperties.txt?



The above is not only the case for U+1F88, but for several more characters.



All the characters listed below have a tolower mapping in http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt

but are not listed in DerivedCoreProperties.txt as “Uppercase”:



    U+1F88 ᾈ has a tolower mapping to U+1F80 ᾀ

    U+1F89 ᾉ has a tolower mapping to U+1F81 ᾁ

    U+1F8A ᾊ has a tolower mapping to U+1F82 ᾂ

    U+1F8B ᾋ has a tolower mapping to U+1F83 ᾃ

    U+1F8C ᾌ has a tolower mapping to U+1F84 ᾄ

    U+1F8D ᾍ has a tolower mapping to U+1F85 ᾅ

    U+1F8E ᾎ has a tolower mapping to U+1F86 ᾆ

    U+1F8F ᾏ has a tolower mapping to U+1F87 ᾇ

    U+1F98 ᾘ has a tolower mapping to U+1F90 ᾐ

    U+1F99 ᾙ has a tolower mapping to U+1F91 ᾑ

    U+1F9A ᾚ has a tolower mapping to U+1F92 ᾒ

    U+1F9B ᾛ has a tolower mapping to U+1F93 ᾓ

    U+1F9C ᾜ has a tolower mapping to U+1F94 ᾔ

    U+1F9D ᾝ has a tolower mapping to U+1F95 ᾕ

    U+1F9E ᾞ has a tolower mapping to U+1F96 ᾖ

    U+1F9F ᾟ has a tolower mapping to U+1F97 ᾗ

    U+1FA8 ᾨ has a tolower mapping to U+1FA0 ᾠ

    U+1FA9 ᾩ has a tolower mapping to U+1FA1 ᾡ

    U+1FAA ᾪ has a tolower mapping to U+1FA2 ᾢ

    U+1FAB ᾫ has a tolower mapping to U+1FA3 ᾣ

    U+1FAC ᾬ has a tolower mapping to U+1FA4 ᾤ

    U+1FAD ᾭ has a tolower mapping to U+1FA5 ᾥ

    U+1FAE ᾮ has a tolower mapping to U+1FA6 ᾦ

    U+1FAF ᾯ has a tolower mapping to U+1FA7 ᾧ

    U+1FBC ᾼ has a tolower mapping to U+1FB3 ᾳ

    U+1FCC ῌ has a tolower mapping to U+1FC3 ῃ

    U+1FFC ῼ has a tolower mapping to U+1FF3 ῳ



Is that correct or a bug?





--

Mike FABIAN <mfabian at redhat.com>

☏ Office: +49-69-365051027, internal 8875027

睡眠不足はいい仕事の敵だ。
