a character for an unknown character

Richard Wordingham richard.wordingham at ntlworld.com
Fri Dec 30 06:37:27 CST 2016


On Fri, 30 Dec 2016 01:23:55 +0100 (CET)
Marcel Schneider <charupdate at orange.fr> wrote:

> On Wed, 28 Dec 2016 19:05:17 -0800, Asmus Freytag wrote:
> > On 12/28/2016 5:47 PM, Richard Wordingham wrote:   

> U+02BC being shifted from a letter to a punctuation must have been
> anticipated at encoding, since the original recommendation was to use
> it as apostrophe throughout. Unifying the letter apostrophe and the
> punctuation apostrophe made IMO more sense—despite the conflicting
> properties

What conflicts?  Both prototypically mark absences.

The rationale seems to be that English uses both the punctuation
apostrophe and the U+2019 RIGHT SINGLE QUOTATION MARK.  If users aren't
being trained to use U+2212 MINUS SIGN, and habitually disable grammar
and spell-checking, most won't make the right choice between U+02BC and
U+2019.
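
For anyone who wants to see the properties in question, here is a minimal sketch, assuming Python 3 and its standard unicodedata module (the helper names describe and words are just for illustration, not anything from this thread). It prints the name and General_Category of the three apostrophe candidates, and shows how a regex word match treats the same word spelt each way.

```python
import re
import unicodedata

# The three apostrophe-like characters discussed in this thread.
CANDIDATES = ["\u0027", "\u02BC", "\u2019"]

def describe(ch):
    """Return (code point label, character name, General_Category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

def words(text):
    """Return the runs of word characters as Python's regex engine sees them."""
    return re.findall(r"\w+", text)

for ch in CANDIDATES:
    print(*describe(ch))

# U+02BC is a letter (Lm), so it stays inside the word;
# U+2019 is punctuation (Pf), so the word ends before it.
print(words("fi\u02BC"))
print(words("fi\u2019"))
```

The practical difference is in word segmentation and similar processing rather than in what the mark means to a reader.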

> Perhaps the letters for hexadecimal digits should have been encoded
> separately?

The idea has been rejected several times.

> > > 5) The nightmare of spacing single and double dots.   
> > ? spacing vs. combining? Not sure what you mean.  

> I think Richard refers to U+2024 ONE DOT LEADER and U+2025 TWO DOT
> LEADER, along with U+002E FULL STOP.

That's not the half of it.  For starters, just look at the confusables
for U+00B7 MIDDLE DOT:

U+2022 BULLET
U+2027 HYPHENATION POINT
U+2219 BULLET OPERATOR
U+22C5 DOT OPERATOR
U+2E31 WORD SEPARATOR MIDDLE DOT
U+30FB KATAKANA MIDDLE DOT
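
A small sketch for inspecting that list, again assuming Python 3 with the standard unicodedata module (the names MIDDLE_DOT_LOOKALIKES and properties are made up for the example). It only reports the character name and General_Category of each code point; it does not consult the Unicode confusables data itself.

```python
import unicodedata

# U+00B7 plus the look-alikes listed above.
MIDDLE_DOT_LOOKALIKES = [0x00B7, 0x2022, 0x2027, 0x2219, 0x22C5, 0x2E31, 0x30FB]

def properties(cp):
    """Return (character name, General_Category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

for cp in MIDDLE_DOT_LOOKALIKES:
    name, category = properties(cp)
    print(f"U+{cp:04X} {category} {name}")
```

The two operators come out as mathematical symbols (Sm), so visually similar dots do not even share a general category.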

There's an argument that the unification of U+00B7 and U+0387 ANO
TELEIA is a unification too far.  A font for Greek may need to work out
which it is to position it correctly.
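
The unification is visible in normalization: U+0387 has a canonical decomposition to U+00B7, so normalized text loses the distinction. A minimal sketch, assuming Python 3 and the standard unicodedata module (the helper name normalized is illustrative only):

```python
import unicodedata

ANO_TELEIA = "\u0387"   # GREEK ANO TELEIA
MIDDLE_DOT = "\u00B7"   # MIDDLE DOT

def normalized(ch, form="NFC"):
    """Return ch after applying the given Unicode normalization form."""
    return unicodedata.normalize(form, ch)

# The decomposition mapping is a bare code point with no <compat> tag,
# which means it is canonical and applies under every normalization form.
print(unicodedata.decomposition(ANO_TELEIA))

for form in ("NFC", "NFD"):
    result = normalized(ANO_TELEIA, form)
    print(form, f"U+{ord(result):04X}", unicodedata.name(result))
```

So by the time a font sees normalized Greek text, it has only U+00B7 and the surrounding script to go on.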

For double dots, there are the confusables for U+003A COLON:

U+05C3 HEBREW PUNCTUATION SOF PASUQ
U+2236 RATIO

There's a whole raft of visargas, some of which match and some of
which don't. What happened to the principle that diacritics are unified
by form?  I suspect the answer is that encoding was established while
principles were still developing.

> > > As a result, I have no idea whether the singular of "fithp" (one
> > > of Larry Niven's alien species) should be spelt with U+02BC or
> > > U+2019, though in ASCII I can just write "fi'".   
> 
> Normally on an English or French keyboard layout, all three are
> accessed on live keys.

That accessibility is news to me - normally I just have to fight a word
processor if I want U+0027.  However, I still don't know whether to
spell the word «fiʼ» or «fi’».  I've only seen it in print.

Richard.


