Re: Question about “Uppercase” in DerivedCoreProperties.txt

Philippe Verdy verdy_p at wanadoo.fr
Thu Nov 6 13:46:11 CST 2014


This is a "feature" of the Greek alphabet: the lowercase iota subscript
can be capitalized in two different ways, either as a subscript below the
uppercase main letter or as a standard capital iota. The subscript form is
a combining character; the non-subscript form is not. There should be a
special contextual rule for language-specific casing. One already exists
for the final sigma, but not for the iota. This is not easy to handle, and
in fact the choice of case mapping is not specifically a linguistic rule
but a rendering style rule. For carved inscriptions, which generally use
only capitals, the combining forms are generally avoided and a reduced
alphabet is used. For handwritten and cursive styles, the extended
alphabet is used, and this enables contextual forms including the small
iota subscript, the final small sigma and many combining signs (it also
allows other placement rules for accents). For printing or display there
is no rule: the document author enables or disables the extended alphabet
(it is generally disabled for rendering at small resolutions).
The simple case mappings, however, should preserve the distinctions
present in the extended alphabet: simple uppercasing should not convert a
lowercase letter to an all-uppercase sequence with an appended uppercase
iota, even if this means mapping a lowercase letter to a titlecase one
(the appended iota would be lossy, and simple casing rules should be
lossless).
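
A minimal sketch of the two capitalizations, in Python 3 (my choice of
language for illustration, not something taken from the UCD). It assumes
that str.upper(), str.lower() and str.title() apply the full,
context-aware mappings rather than the simple one-to-one mappings from
UnicodeData.txt, which is how CPython 3.3 and later behave as far as I
know. The variable names are only illustrative.

```python
import unicodedata

small = "\u1F80"   # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
titled = "\u1F88"  # GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI

# Full uppercasing: the iota subscript becomes a separate capital iota,
# so one character turns into two.
full_upper = small.upper()

# Titlecasing: the result stays a single precomposed character.
title_form = small.title()

# General_Category of the "capital" form, as recorded in UnicodeData.txt.
category = unicodedata.category(titled)

# The contextual rule that does exist: a capital sigma at the end of a
# word lowercases to the final form.
final_sigma = "\u039F\u0394\u039F\u03A3".lower()

# All titlecase (Lt) letters in the Greek Extended block U+1F00..U+1FFF.
greek_titlecase = [
    chr(cp)
    for cp in range(0x1F00, 0x2000)
    if unicodedata.category(chr(cp)) == "Lt"
]


def code_points(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


print("upper:", code_points(full_upper))
print("title:", code_points(title_form))
print("category of U+1F88:", category)
print("final sigma:", code_points(final_sigma))
print("Lt letters in Greek Extended:", len(greek_titlecase))
```

The Lt letters found this way should be the same characters as in the
list quoted below: they have a lowercase mapping, but their category is
titlecase rather than uppercase.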
Case mappings in the main UCD, however, ignore the contextual rules and
the language-specific and style-specific rules. But even if they are wrong,
this cannot be changed. The simple mappings in the main UCD file should not
be assumed to be lossless. Actual case mappers do not use just these basic
rules, which are only the most frequent mappings assumed (in any case, any
kind of case conversion introduces a loss, and the degree of loss varies
when a mapping involves more than a single pair of simple letters; see
also the old difficulties with the German eszett or sharp s, and with the
many ligatures that became plain letters in some contexts, including the
ampersand "&" sign, which originates from the "et" ligature, or the German
umlaut, which inherits some old behavior of the superscripted small Latin
letter "e" behaving like the Greek iota subscript in Fraktur font styles).
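
To make the loss concrete, here is a small sketch in Python 3 (again my
own illustration, assuming str.upper() and str.lower() implement the full
case mappings). The helper round_trip and the sample strings are
hypothetical names chosen for this example only.

```python
def round_trip(text: str) -> str:
    """Uppercase, then lowercase, using Python's built-in case mappings."""
    return text.upper().lower()


samples = {
    "iota subscript": "\u1F80",      # alpha with psili and ypogegrammeni
    "German sharp s": "stra\u00DFe",  # "strasse" written with U+00DF
    "plain ASCII": "alpha",
}

for label, text in samples.items():
    back = round_trip(text)
    status = "preserved" if back == text else "lost"
    print(f"{label}: {text!r} -> {text.upper()!r} -> {back!r} ({status})")
```

Only the plain ASCII sample should survive the round trip. The other two
expand to two letters when uppercased, and lowercasing cannot tell that
they were once a single character.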

2014-11-06 16:55 GMT+01:00 Mike FABIAN <maiku.fabian at gmail.com>:

>
> I have a question about “Uppercase” in DerivedCoreProperties.txt:
>
> U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
> is listed as “Lowercase” in
> http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt :
>
>        1F80..1F87    ; Lowercase # L&   [8] GREEK SMALL LETTER ALPHA WITH
> PSILI AND YPOGEGRAMMENI..GREEK SMALL LETTER ALPHA WITH DASIA AND
> PERISPOMENI AND YPOGEGRAMMENI
>
> But
>
> “U+1F88 ᾈ GREEK CAPITAL LETTER ALPHA WITH PSILI AND PROSGEGRAMMENI”
> is *not* listed as “Uppercase” in
> http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt .
>
> Although U+1F88 seems to be Uppercase according to
> http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> because it has a tolower mapping to U+1F80:
>
>     1F80;GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI;Ll;0;L;1F00
> 0345;;;;N;;;1F88;;1F88
>     1F88;GREEK CAPITAL LETTER ALPHA WITH PSILI AND
> PROSGEGRAMMENI;Lt;0;L;1F08 0345;;;;N;;;;1F80;
>
> Is the information in DerivedCoreProperties.txt correct or
> could this be a bug in DerivedCoreProperties.txt?
>
> The above is not only the case for U+1F88, but for several more characters.
>
> All the characters listed below have a tolower mapping in
> http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> but are not listed in DerivedCoreProperties.txt as “Uppercase”:
>
>     U+1F88 ᾈ has a tolower mapping to U+1F80 ᾀ
>     U+1F89 ᾉ has a tolower mapping to U+1F81 ᾁ
>     U+1F8A ᾊ has a tolower mapping to U+1F82 ᾂ
>     U+1F8B ᾋ has a tolower mapping to U+1F83 ᾃ
>     U+1F8C ᾌ has a tolower mapping to U+1F84 ᾄ
>     U+1F8D ᾍ has a tolower mapping to U+1F85 ᾅ
>     U+1F8E ᾎ has a tolower mapping to U+1F86 ᾆ
>     U+1F8F ᾏ has a tolower mapping to U+1F87 ᾇ
>     U+1F98 ᾘ has a tolower mapping to U+1F90 ᾐ
>     U+1F99 ᾙ has a tolower mapping to U+1F91 ᾑ
>     U+1F9A ᾚ has a tolower mapping to U+1F92 ᾒ
>     U+1F9B ᾛ has a tolower mapping to U+1F93 ᾓ
>     U+1F9C ᾜ has a tolower mapping to U+1F94 ᾔ
>     U+1F9D ᾝ has a tolower mapping to U+1F95 ᾕ
>     U+1F9E ᾞ has a tolower mapping to U+1F96 ᾖ
>     U+1F9F ᾟ has a tolower mapping to U+1F97 ᾗ
>     U+1FA8 ᾨ has a tolower mapping to U+1FA0 ᾠ
>     U+1FA9 ᾩ has a tolower mapping to U+1FA1 ᾡ
>     U+1FAA ᾪ has a tolower mapping to U+1FA2 ᾢ
>     U+1FAB ᾫ has a tolower mapping to U+1FA3 ᾣ
>     U+1FAC ᾬ has a tolower mapping to U+1FA4 ᾤ
>     U+1FAD ᾭ has a tolower mapping to U+1FA5 ᾥ
>     U+1FAE ᾮ has a tolower mapping to U+1FA6 ᾦ
>     U+1FAF ᾯ has a tolower mapping to U+1FA7 ᾧ
>     U+1FBC ᾼ has a tolower mapping to U+1FB3 ᾳ
>     U+1FCC ῌ has a tolower mapping to U+1FC3 ῃ
>     U+1FFC ῼ has a tolower mapping to U+1FF3 ῳ
>
> Is that correct or a bug?
>
> --
> Mike FABIAN   <mike.fabian at gmx.de>
> 睡眠不足はいい仕事の敵だ。
> _______________________________________________
> Unicode mailing list
> Unicode at unicode.org
> http://unicode.org/mailman/listinfo/unicode
>