Another take on the English apostrophe in Unicode

Mark Davis ☕️ mark at
Mon Jun 15 03:10:10 CDT 2015

On Mon, Jun 15, 2015 at 9:17 AM, Marcel Schneider <charupdate at> wrote:

> When we take the topic down again from linguistics to the core mission of
> Unicode, that is character encoding and text processing standardisation,
> ellipsis and Swedish abbreviation colon differ from the single closing
> quotation mark in this, that they are not to be processed.
> Linguistics, however, delivered the foundation on which Unicode issued its
> first recommendation on what character to use for apostrophe. The result
> was neither a matter of opinion, nor of probabilities.
> Actually, the choice is between perpetuating confusion in word processing,
> and get people confused for a little time when announcing that U+2019 for
> apostrophe was a mistake.

Quite nice of you to inform me of the core mission of Unicode; I must have
somehow missed that.

More seriously, it is not all so black and white. As we developed Unicode,
we considered whether to separate characters by function, e.g., an END OF
PERIOD, or DIAERESIS vs. UMLAUT. We quickly concluded that the costs
far, far outweighed the benefits.

In practice, whenever characters are essentially identical—and by that I
mean that the overlap between the acceptable glyphs for each character is
very high—people will inevitably mix up the characters on entry. So any
processing that depends on that distinction is forced to correct the data
anyway. And separating them causes even simple things like searching for a
character on a page to get screwed up without having equivalence classes.
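
To make the equivalence-class point concrete, here is a minimal sketch in Python of a search that treats apostrophe-like characters as interchangeable. The set of characters, the choice of U+0027 as the representative, and the names APOSTROPHE_LIKE, fold, and find_all are all assumptions made for illustration; none of this comes from a Unicode specification or an existing library.

```python
# Hypothetical equivalence class: characters a reader would accept
# as an apostrophe. The membership is an assumption for this sketch.
APOSTROPHE_LIKE = {
    "\u0027",  # APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
}

# Map every member of the class to one representative (U+0027).
FOLD_TABLE = {ord(ch): "\u0027" for ch in APOSTROPHE_LIKE}


def fold(text: str) -> str:
    """Replace each apostrophe-like character with the representative."""
    return text.translate(FOLD_TABLE)


def find_all(haystack: str, needle: str) -> list:
    """Return the start offsets of needle in haystack, ignoring which
    apostrophe-like character was typed on either side.

    Folding is one character to one character, so offsets in the folded
    text are also valid offsets in the original text.
    """
    folded_haystack, folded_needle = fold(haystack), fold(needle)
    hits = []
    start = folded_haystack.find(folded_needle)
    while start != -1:
        hits.append(start)
        start = folded_haystack.find(folded_needle, start + 1)
    return hits


text = "don't, don\u2019t, don\u02BCt"
print(text.find("don\u2019t"))       # exact match finds only one spelling
print(find_all(text, "don\u2019t"))  # folded search finds all three
```

A plain str.find only matches the spelling that was typed; the folded search matches all of them, which is the extra machinery that every search would need once visually identical characters are encoded separately.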

So we only separated essentially identical characters in limited cases,
such as letters from different scripts.

Mark

*— Il meglio è l’inimico del bene —*