Another take on the English Apostrophe in Unicode
charupdate at orange.fr
Wed Jun 17 11:18:32 CDT 2015
On Tue, Jun 16, 2015, Mark Davis ☕️ wrote:
> And, Marcel, while you are at it, this is getting tiresome.
> Please find some other place to vent about events you know very little about; the internet is full of them.
I understand (a little) that I'm tiresome. Please consider, nevertheless, that the Unicode Public Mailing List is, AFAIK, the only place where people can communicate with Unicode decision makers. No other mailing list or internet forum offers this. Even Microsoft's Community forum has no influence within Microsoft, as forum volunteers told me. I posted there in French and in English. In French, my most useful post seems to be at http://answers.microsoft.com/fr-fr/office/forum/office_2010-word/recherche-invers%C3%A9e-dans-les-listes/845a02fa-aa2d-4d81-a03e-12ecb7f2f46b
Since your message did not reach me yesterday, I had already prepared two replies to send today: one to Doug and one to you.
If you agree, I'll paste them both below.
On Tue, Jun 16, 2015, Doug Ewell wrote:
> You know what? If you want to use U+02BC as an English apostrophe, go ahead and use it. Nobody's stopping you really. Not Unicode, not Microsoft, not ISO.
As you know, I already do, and if it were just for my own sake, Iʼd probably never have started posting in this thread. Much of the text whose quotes and apostrophes have to be processed originates from other people. So when I use U+02BC, it only helps in the cases where I am the one being quoted :).
An essential condition is that all text-handling software be updated to handle the letter apostrophe correctly. Without an official recommendation, this is unlikely to happen. And such a recommendation cannot usefully be issued unless Microsoft agrees. Let us remember that without Microsoft, the Unicode Consortium probably wouldnʼt have been founded, and character encoding wouldnʼt be thriving as it is today.
On Mon, Jun 15, 2015, 20:14, Doug Ewell wrote:
> Perhaps a UTC member can confirm whether this is fact or speculation. Markus Kuhn's comment from 1999 about "couldn't Unicode follow Microsoft...?" doesn't prove that Unicode was in fact strong-armed by Microsoft.
I know that Markus Kuhnʼs concern was very valuable, and he did a great job showing how to eradicate the clumsy simulation of quotation marks that was common at the time for lack of proper characters. As you remember, people used accents as quotes, and at that stage the mix-up was between the apostrophe and the acute accent!
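Roughly the kind of replacement his page describes, turning the old ASCII convention `like this' into ‘like this’ with U+2018 and U+2019, can be sketched with a regular expression in Python. This is only a naive illustration under stated assumptions: the function name is hypothetical, and the pattern assumes that each grave accent opens a quotation closed by the next ASCII apostrophe, so a quotation containing an apostrophe, such as `don't', would be converted wrongly.

```python
import re

LEFT_SINGLE = "\u2018"   # LEFT SINGLE QUOTATION MARK
RIGHT_SINGLE = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# A grave accent (0x60), then a run of characters that are neither grave
# accents nor ASCII apostrophes, then an ASCII apostrophe (0x27).
_ASCII_QUOTED = re.compile(r"`([^`']*)'")


def ascii_quotes_to_unicode(text: str) -> str:
    """Rewrite `quoted' as ‘quoted’ (naive: no nesting, no inner apostrophes)."""
    return _ASCII_QUOTED.sub(lambda m: LEFT_SINGLE + m.group(1) + RIGHT_SINGLE, text)


print(ascii_quotes_to_unicode("He called it `plain text' back then."))
# prints: He called it ‘plain text’ back then.
```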
The curly glyph for 0x27 in old ASCII fonts and its mirror image at 0x60, which Mr Kuhn shows on his page along with how to replace them properly, recall the U+201B/U+2019 quote pair. The disfavored U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK (the "reversed single comma quotation mark") was discussed on this List, with this conclusion:
On Thu, Jun 15, 2006, Andreas Prilop wrote:
> Actually, I have seen such quotation marks in English-language books printed in Britain and the USA. But, as I wrote, they are certainly not preferred. *If* you want such quotation marks, then please use U+201B for them!
At that time, the matter was correct rendering. Today, itʼs correct processing.
Yes, fortunately U+02BC is *not deprecated* for the English apostrophe, and on closer inspection, IMO there is *no recommendation* for U+2019 either, just a stated preference. As I wrote earlier in this thread, Unicode changed the preference against its will, logically and seemingly.
Logically, because the first recommendation (like the whole Standard) was deliberately designed, as Mr Davis reminded us the day before yesterday.
Seemingly, because the U+0027 comment line in the Code Chart was changed from
> preferred character for apostrophe is 02BC
to
> 2019 is preferred for apostrophe
between versions 3.0.0 and 4.0.0 (while the line “preferred characters in English for paired quotation marks are 2018 & 2019” remained unchanged; see the complete comparison at http://charupdate.info#ambiguation).
On Tue, Jun 16, 2015, Doug Ewell wrote:
> I do wish we could put an end to all the accusations of malfeasance.
Experience shows that it often takes a lot of letters, e-mails, blog posts, forum posts, tweets and so on to get things moving. The best way to ensure that nothing gets done is to convince everybody that itʼs all OK. Thatʼs the feeling I sometimes get reading this thread, or the one about ISO/IEC JTC1/SC2/WG2 running in parallel!
And the only way to get something changed has always been to show that itʼs wrong.
From there, the next step would be to find out who is responsible.
As for the apostrophe, weʼre all a bit responsible.
There is no point hiding that British English usage does little to disambiguate things: by preferring single quotes as the primary quotation marks, it has led some authors to prefer chevrons even in English. See Chris Harvey (who argues for U+2019 as the apostrophe) at http://www.languagegeek.com/typography/apostrophes.html#Anchor-Potentia-61409
But Microsoft is responsible, too.
And together, Microsoft and we have the power to bring this to a solution: each of us on our own PC, and Microsoft together with Unicode and ISO at the global level.
So letʼs tackle it.
On Mon, Jun 15, 2015 at 10:19 AM, Mark Davis ☕️ wrote:
> In practice, whenever characters are essentially identical—and by that I mean that the overlap between the acceptable glyphs for each character is very high—people will inevitably mix up the characters on entry. So any processing that depends on that distinction is forced to correct the data anyway. And separating them causes even simple things like searching for a character on a page to get screwed up without having equivalence classes.
Now that I use U+02BC, I find that in most applications it is not yet part of the apostrophe/single-quote equivalence class, which seems to contain only U+0027, U+2018 and U+2019. However, even once U+02BC is added to this class in some future software update, that wouldnʼt always be enough for the software to work well. Options should be added to disable these equivalences, just as case sensitivity can already be enabled in most search dialogs today.
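A search that treats these characters as one equivalence class, with a switch to turn the folding off, could look roughly like the Python sketch below. The class membership (U+0027, U+2018, U+2019 and U+02BC), the function name `find_all` and its `fold_apostrophes` parameter are assumptions chosen for illustration; they do not describe the behavior of any particular application.

```python
import re

# Assumed equivalence class: ASCII apostrophe, both single curly quotes,
# and MODIFIER LETTER APOSTROPHE (U+02BC).
APOSTROPHE_CLASS = "\u0027\u2018\u2019\u02bc"


def find_all(text: str, query: str, fold_apostrophes: bool = True) -> list[int]:
    """Return the start offsets of query in text.

    With fold_apostrophes=True, any member of APOSTROPHE_CLASS in the query
    matches any member in the text. With False, matching is exact, like a
    hypothetical "match apostrophes exactly" search option.
    """
    parts = []
    for ch in query:
        if fold_apostrophes and ch in APOSTROPHE_CLASS:
            # One character class standing for the whole equivalence class.
            parts.append("[" + re.escape(APOSTROPHE_CLASS) + "]")
        else:
            parts.append(re.escape(ch))
    pattern = re.compile("".join(parts))
    return [m.start() for m in pattern.finditer(text)]


# "Itʼs" uses U+02BC, "Marcel's" uses U+0027, "it’s" uses U+2019.
sample = "It\u02bcs Marcel's it\u2019s"
print(find_all(sample, "t's"))                          # folded search
print(find_all(sample, "t's", fold_apostrophes=False))  # exact search
```

Folding is done by rewriting the query into a pattern rather than normalizing the text, so the reported offsets stay valid for the original, unmodified text.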
But without an official recommendation, little of this is likely to happen.
Could Unicode please reinstate a recommendation for U+02BC in the comment at U+0027? You could, for example, recommend U+02BC for processing and U+2019 for printing until fonts are updated. Or you could recommend U+02BC and acknowledge that U+2019 is used in a legacy compatibility mode.
The main reason for protecting the status quo (as seems to be the case), however, could be the fear of damage to the image. Imagine people learning that there is a flaw in the apostrophe. It would be hard to explain why it was made ambiguous and why we are now coming up with a disambiguation; why there are new radio buttons for LETTER APOSTROPHE and PUNCTUATION APOSTROPHE (to give them cool names; the former always converts U+0027 to U+02BC, the latter works as today...); how the nested-quotes algorithm works (assuming it is still not implemented today); and why you have to press the quotation-mark key twice when you want the “other” quotation mark. Quite a lot of work.
Thereʼs a nice workaround for entering high-quality text. Turn off smart quotes, use U+0027 for the apostrophe only, and type a left square bracket to open a quotation and a left curly bracket to open a nested or alternate one. The editorʼs bracket-pairing feature will close them accurately. Square brackets that must appear in the output can be entered as < > or as double parentheses. When finished, save the file in a safe place. Then open a copy and replace the apostrophes with U+02BC (or U+2019, depending on whether the target font contains U+02BC), then replace the four bracketing characters with whatever quotation marks you need, and finish by turning the placeholders into real square brackets. This should work in any text or WYSIWYG editor.
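The final search-and-replace pass of this workaround can be sketched as a small Python script. It is only an illustration under stated assumptions: the function name `finalize` and the quote mapping (English style, “ ” outside and ‘ ’ for nested quotations) are hypothetical, and only the < > placeholder for literal square brackets is handled, not the double-parentheses variant.

```python
from typing import Dict, Optional

# Hypothetical placeholder-to-quote mapping; adjust to the target style.
DEFAULT_QUOTES: Dict[str, str] = {
    "[": "\u201c",  # LEFT DOUBLE QUOTATION MARK
    "]": "\u201d",  # RIGHT DOUBLE QUOTATION MARK
    "{": "\u2018",  # LEFT SINGLE QUOTATION MARK
    "}": "\u2019",  # RIGHT SINGLE QUOTATION MARK
}


def finalize(draft: str,
             apostrophe: str = "\u02bc",
             quotes: Optional[Dict[str, str]] = None) -> str:
    """Turn a draft typed with the placeholder convention into final text.

    apostrophe defaults to U+02BC; pass U+2019 if the target font lacks U+02BC.
    """
    quotes = DEFAULT_QUOTES if quotes is None else quotes
    # 1. In this workflow, every U+0027 left in the draft is an apostrophe.
    text = draft.replace("'", apostrophe)
    # 2. Bracket placeholders become the chosen quotation marks.
    text = text.translate(str.maketrans(quotes))
    # 3. Only now do the < > placeholders become literal square brackets;
    #    doing this earlier would let step 2 turn them into quotes as well.
    return text.replace("<", "[").replace(">", "]")


print(finalize("He said [I don't know {why}] <sic>."))
# prints: He said “I donʼt know ‘why’” [sic].
```

The order of the three steps is the point of the design: the literal square brackets must be produced last, after the bracket placeholders have already been converted.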
However, I believe we should start from another end: by ceasing to eat that insane stuff processed from insulted, tortured, poisoned and slowly killed animals, which brings us nothing but acidosis, osteoporosis and much more... nothing good, nothing worth the confusion of ethical values along the pattern initiated by the Nazi government. I know itʼs off topic, but it should bring us closer to a helpful solution. Therefore, I take the liberty of suggesting that you (re)watch ‘Earthlings’ and visit Gary Yourofskyʼs website and Facebook profile.
Once weʼve resolved the problems pointed out there (which, at a personal level, is very easy to do), I believe we will stop repeating the errors of the past:
(also on YouTube).