a character for an unknown character

Asmus Freytag asmusf at ix.netcom.com
Fri Dec 30 15:10:31 CST 2016


On 12/30/2016 4:37 AM, Richard Wordingham wrote:
> On Fri, 30 Dec 2016 01:23:55 +0100 (CET)
> Marcel Schneider <charupdate at orange.fr> wrote:
>
>> On Wed, 28 Dec 2016 19:05:17 -0800, Asmus Freytag wrote:
>>> On 12/28/2016 5:47 PM, Richard Wordingham wrote:
>> U+02BC being shifted from a letter to a punctuation mark must have been
>> anticipated at encoding, since the original recommendation was to use
>> it as the apostrophe throughout. Unifying the letter apostrophe and the
>> punctuation apostrophe made more sense IMO, despite the conflicting
>> properties.
> What conflicts?  Both prototypically mark absences.
>
> The rationale seems to be that English uses both the punctuation
> apostrophe and the U+2019 RIGHT SINGLE QUOTATION MARK.  If users aren't
> being trained to use U+2212 MINUS SIGN, and habitually disable grammar
> and spell-checking, most won't make the right choice between U+02BC and
> U+2019.

Evidence seems to indicate that users of languages that were supposed to
use U+02BC tend to freely substitute U+0027 and, to some degree, U+2019.

So much so that U+02BC is being ruled out altogether by the more
selective policies for domain names, e.g. for the DNS root zone or the
reference tables for the second level.

Despite having formally been given the letter property, in practice the
fact that it is visually indistinguishable from U+2019 does not allow it
to be treated as a letter in all contexts.
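To make the property difference concrete, here is a minimal sketch using Python's standard `unicodedata` module (the helper name `describe` is purely illustrative). It prints the General_Category of the three apostrophe-like characters and shows that the three spellings of Richard's "fi'" are classified differently, even though two of them look the same.

```python
import unicodedata

# The three apostrophe-like characters discussed in this thread.
APOSTROPHES = ["\u0027", "\u2019", "\u02bc"]

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point label, character name, General_Category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in APOSTROPHES:
    print(*describe(ch), sep="  ")

# str.isalpha() is true only for the Letter categories (Lu, Ll, Lt, Lm, Lo),
# so only the U+02BC spelling counts as an all-letter word.
for spelling in ["fi\u0027", "fi\u2019", "fi\u02bc"]:
    print(repr(spelling), spelling.isalpha())
```

U+0027 comes out as Po, U+2019 as Pf and U+02BC as Lm, which is why a process keyed on letter-ness treats the visually identical `fi’` and `fiʼ` differently.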
>
>> Perhaps the letters for hexadecimal digits should have been encoded
>> separately?
> The idea has been rejected several times.
>
>>>> 5) The nightmare of spacing single and double dots.
>>> ? spacing vs. combining? Not sure what you mean.
>> I think Richard refers to U+2024 ONE DOT LEADER and U+2025 TWO DOT
>> LEADER, along with U+002E FULL STOP.
> That's not the half of it.  For starters, just look at the confusables
> for U+00B7 MIDDLE DOT:
>
> U+2022 BULLET
> U+2027 HYPHENATION POINT
> U+2219 BULLET OPERATOR
> U+22C5 DOT OPERATOR
> U+2E31 WORD SEPARATOR MIDDLE DOT
> U+30FB KATAKANA MIDDLE DOT
>
> There's an argument that the unification of U+00B7 and U+0387 ANO
> TELEIA is a unification too far.  A font for Greek may need to work out
> which it is to position it correctly.
>
> For double dots, there're the confusables for U+003A COLON:
> U+05C3 HEBREW PUNCTUATION SOF PASUQ
> U+2236 RATIO
>
> There's a whole raft of visargas, some of which match and some of
> which don't. What happened to the principle that diacritics are unified
> by form?  I suspect the answer is that encoding was established while
> principles were still developing.
>
>>>> As a result, I have no idea whether the singular of "fithp" (one
>>>> of Larry Niven's alien species) should be spelt with U+02BC or
>>>> U+2019, though in ASCII I can just write "fi'".
>> Normally on an English or French keyboard layout, all three are
>> accessed on live keys.
> That accessibility is news to me - normally I just have to fight a word
> processor if I want U+0027.  However, I still don't know whether to
> spell the word «fiʼ» or «fi’».  I've only seen it in print.
>
> Richard.
>
>
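The dot and colon look-alikes listed in the quoted message can be inspected the same way. The sketch below again uses only Python's standard `unicodedata` module (the helper `report` is illustrative). It lists the name and General_Category of each character, shows that U+0387 GREEK ANO TELEIA is folded into U+00B7 by canonical normalization (its decomposition is a singleton), and shows that the dot leaders U+2024/U+2025 collapse to FULL STOPs only under compatibility normalization.

```python
import unicodedata

# Look-alikes of U+00B7 MIDDLE DOT and U+003A COLON cited in the thread.
MIDDLE_DOTS = [0x00B7, 0x0387, 0x2022, 0x2027, 0x2219, 0x22C5, 0x2E31, 0x30FB]
COLONS = [0x003A, 0x05C3, 0x2236]

def report(code_points: list[int]) -> list[str]:
    """Format each code point as 'U+XXXX <General_Category> <name>'."""
    lines = []
    for cp in code_points:
        ch = chr(cp)
        lines.append(f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
    return lines

for line in report(MIDDLE_DOTS + COLONS):
    print(line)

# U+0387 has a singleton canonical decomposition to U+00B7, so even NFC
# replaces it with MIDDLE DOT and a font can no longer tell them apart.
print(unicodedata.normalize("NFC", "\u0387") == "\u00b7")  # True

# ONE DOT LEADER and TWO DOT LEADER survive NFC but fold to FULL STOPs
# under NFKC.
print(repr(unicodedata.normalize("NFKC", "\u2024\u2025")))  # '...'
```

The operator look-alikes (U+2219, U+22C5, U+2236) report as Sm while the rest are Po, so category alone does not separate most of these pairs.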


