Question about "Uppercase" in DerivedCoreProperties.txt
Laurentiu Iancu
liancu at microsoft.com
Thu Nov 6 16:31:37 CST 2014
Hello,
The property Uppercase is a binary, informative property derived from General_Category (gc=Lu) and Other_Uppercase (OUpper=Y), as documented in Section 5.3 of UAX #44, at http://www.unicode.org/reports/tr44/#Uppercase.
All of the characters you enumerated are titlecase letters (gc=Lt) rather than uppercase letters (gc=Lu), and they are not specifically assigned Other_Uppercase (which would otherwise contradict their General_Category). Following the derivation, they do not have the Uppercase binary property.
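The General_Category values can be checked with a short script. This is only a minimal sketch, assuming Python 3 and its standard `unicodedata` module, whose data follows whatever Unicode version the interpreter ships with (not necessarily 7.0.0). The names `QUESTIONED`, `general_category`, and `categories` are hypothetical and chosen for illustration.

```python
import unicodedata

def general_category(cp):
    """Return the General_Category value (e.g. 'Lu', 'Ll', 'Lt') of a code point."""
    return unicodedata.category(chr(cp))

# The 27 code points enumerated in the original question.
QUESTIONED = (
    list(range(0x1F88, 0x1F90))
    + list(range(0x1F98, 0x1FA0))
    + list(range(0x1FA8, 0x1FB0))
    + [0x1FBC, 0x1FCC, 0x1FFC]
)

categories = {cp: general_category(cp) for cp in QUESTIONED}

for cp, gc in categories.items():
    # A lowercase mapping exists for each character, yet gc is Lt, not Lu.
    lower = " ".join(f"U+{ord(c):04X}" for c in chr(cp).lower())
    print(f"U+{cp:04X} gc={gc} lowercase={lower}")
```

Since none of these characters has gc=Lu, and none is listed under Other_Uppercase, the derivation leaves them outside the Uppercase set.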
For a visualization of the set of characters assigned the binary property Uppercase in relation to the set of Uppercase_Letter characters (gc=Lu), you can use the UnicodeSet comparison tool at http://www.unicode.org/cldr/utility/unicodeset.jsp. Enter “[:gc=Lu:]” in one input field and “[:Uppercase:]” in the other field, then click on Compare.
Regards,
L.
-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Mike FABIAN
Sent: Thursday, November 6, 2014 12:32 AM
To: unicode at unicode.org
Subject: Question about "Uppercase" in DerivedCoreProperties.txt
I have a question about “Uppercase” in DerivedCoreProperties.txt:
U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
is listed as “Lowercase” in
http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :
1F80..1F87 ; Lowercase # L& [8] GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND PERISPOMENI AND YPOGEGRAMMENI
But
“U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”
is *not* listed as “Uppercase” in
http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .
Although U+1F88 seems to be Uppercase according to http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
because it has a tolower mapping to U+1F80:
1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
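For reference, the two records above can be split into their semicolon-separated fields. This is a rough sketch, assuming the standard UnicodeData.txt layout in which field 2 is General_Category and fields 12, 13, and 14 are the simple uppercase, lowercase, and titlecase mappings. The names `RECORDS` and `parse_record` are hypothetical.

```python
# The two UnicodeData.txt records quoted above, copied verbatim.
RECORDS = """\
1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00 0345;;;;N;;;1F88;;1F88
1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
"""

def parse_record(line):
    """Pick out the category and simple case-mapping fields of one record."""
    fields = line.split(";")
    return {
        "code": fields[0],
        "gc": fields[2],       # General_Category
        "upper": fields[12],   # Simple_Uppercase_Mapping (empty if none)
        "lower": fields[13],   # Simple_Lowercase_Mapping (empty if none)
        "title": fields[14],   # Simple_Titlecase_Mapping (empty if none)
    }

parsed = [parse_record(line) for line in RECORDS.splitlines()]

for rec in parsed:
    print(rec)
```

The output shows that U+1F88 carries a lowercase mapping to 1F80 while its General_Category field is Lt rather than Lu.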
Is the information in DerivedCoreProperties.txt correct or could this be a bug in DerivedCoreProperties.txt?
The above is not only the case for U+1F88, but for several more characters.
All the characters listed below have a tolower mapping in http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
but are not listed in DerivedCoreProperties.txt as “Uppercase”:
U+1F88 ᾈ has a tolower mapping to U+1F80 ᾀ
U+1F89 ᾉ has a tolower mapping to U+1F81 ᾁ
U+1F8A ᾊ has a tolower mapping to U+1F82 ᾂ
U+1F8B ᾋ has a tolower mapping to U+1F83 ᾃ
U+1F8C ᾌ has a tolower mapping to U+1F84 ᾄ
U+1F8D ᾍ has a tolower mapping to U+1F85 ᾅ
U+1F8E ᾎ has a tolower mapping to U+1F86 ᾆ
U+1F8F ᾏ has a tolower mapping to U+1F87 ᾇ
U+1F98 ᾘ has a tolower mapping to U+1F90 ᾐ
U+1F99 ᾙ has a tolower mapping to U+1F91 ᾑ
U+1F9A ᾚ has a tolower mapping to U+1F92 ᾒ
U+1F9B ᾛ has a tolower mapping to U+1F93 ᾓ
U+1F9C ᾜ has a tolower mapping to U+1F94 ᾔ
U+1F9D ᾝ has a tolower mapping to U+1F95 ᾕ
U+1F9E ᾞ has a tolower mapping to U+1F96 ᾖ
U+1F9F ᾟ has a tolower mapping to U+1F97 ᾗ
U+1FA8 ᾨ has a tolower mapping to U+1FA0 ᾠ
U+1FA9 ᾩ has a tolower mapping to U+1FA1 ᾡ
U+1FAA ᾪ has a tolower mapping to U+1FA2 ᾢ
U+1FAB ᾫ has a tolower mapping to U+1FA3 ᾣ
U+1FAC ᾬ has a tolower mapping to U+1FA4 ᾤ
U+1FAD ᾭ has a tolower mapping to U+1FA5 ᾥ
U+1FAE ᾮ has a tolower mapping to U+1FA6 ᾦ
U+1FAF ᾯ has a tolower mapping to U+1FA7 ᾧ
U+1FBC ᾼ has a tolower mapping to U+1FB3 ᾳ
U+1FCC ῌ has a tolower mapping to U+1FC3 ῃ
U+1FFC ῼ has a tolower mapping to U+1FF3 ῳ
Is that correct or a bug?
--
Mike FABIAN <mfabian at redhat.com>
☏ Office: +49-69-365051027, internal 8875027
Lack of sleep is the enemy of good work.
_______________________________________________
Unicode mailing list
Unicode at unicode.org
http://unicode.org/mailman/listinfo/unicode