Ancient Greek apostrophe marking elision

Michael Everson via Unicode unicode at
Sun Jan 27 10:08:20 CST 2019

On 27 Jan 2019, at 05:21, Richard Wordingham <richard.wordingham at> wrote:

>>>> I’ll be publishing a translation of Alice into Ancient Greek in due
>>>> course. I will absolutely only use U+2019 for the apostrophe. It
>>>> would be wrong for lots of reasons to use U+02BC for this.  
>>> Please list them.  
>> The Greek use is of an apostrophe. Often a mark of elision (as here);
>> that’s what 2019 is for.
>> 02BC is a letter. Usually a glottal stop. 
> So it would seem that the 'lots of reasons' is just that it goes against the *recommendation* of TUS.

I have no idea what TUS says about this. I did not look it up. I know a lot about characters, though. 

> Incidentally, I believe the principal use of U+2019 RIGHT SINGLE QUOTATION MARK is as a quotation mark.

You can believe what you like, but that isn’t likely true. In books which prefer “this kind” of quotation marks for primary quotations and ‘this kind’ for nested quotations, 2019 is primarily used for the apostrophe in words like I’m, can’t, isn’t, don’t etc. In books which prefer ‘this kind’ for primary quotations, the statistics for 2019 will be different. But 2019 is still the correct character for both.

> As you have noted in the text left in below, U+02BC started out as the apostrophe.

Lead-type typesetters used that sort, yes. And that sort was used for both apostrophe and single quotation marks. 

> The closing single inverted comma has a different origin to the apostrophe.

No, it doesn’t, but you are welcome to try to prove your assertion. 

> My argument for U+02BC is that this apostrophe is an integral part of the word.

02BC is a letter. In “can’t” the apostrophe isn’t a letter. It’s a mark of elision. I can double-click on the three words in this paragraph which have the apostrophe in them, and they are all whole-word selected.
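
The letter-versus-punctuation distinction can be checked against the Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the choice of characters and the variable name `categories` are illustrative only. General_Category values beginning with `L` are letters, and values beginning with `P` are punctuation.

```python
import unicodedata

# Characters discussed in this thread, written as escapes to avoid ambiguity.
CANDIDATES = ["\u2019", "\u02BC", "\u02BB", "\u0027"]

# Map each character to its Unicode General_Category as reported by
# the Unicode database bundled with the running Python interpreter.
categories = {ch: unicodedata.category(ch) for ch in CANDIDATES}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
```

U+02BC and U+02BB come back as `Lm` (modifier letter), while U+2019 comes back as `Pf` (final quote punctuation) and U+0027 as `Po` (other punctuation).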

> The main constituent of a prototypical word are letters and their attendant marks. Now, the word-breaking algorithm in TR27 allows for various generally overloaded elements to join elements of a word. However, this apostrophe does not mark the boundary of constituents. Accordingly it makes sense to treat it as a letter.

The behaviour of 2019 is not broken. I use it every day. I’ve typeset many, many books in English and Cornish and Irish, all of which use single quotation marks and double quotation marks and lots and lots of apostrophes, and I have no trouble with them. 2019 has for decades been treated correctly in software that I use. 

> Treating the Greek apostrophe as a letter (U+02BC) gives better word-breaking.

Why do you claim this? I did not read the beginning of this thread and I am not going to try to find it. What is the problem you claim to have? In what software? On what platform?

> I don't see any downside in treating it like a Polynesian glottal stop.

I do. And to try to replace the apostrophe in English can’t and don’t and all is doomed to fail. Doomed. 

Moreover there are good practical reasons to change the glyph for the Polynesian letter.

When I typeset Greek, I will use 2019 for the apostrophe. 

> Is someone going to tell me there is an advantage in treating "men's" as one word but "dogs'" as two?  As I've said, the argument for encoding English apostrophes as U+2019 is that even with adequate keyboards, users cannot be relied upon to distinguish U+02BC and U+2019 - especially with no feedback. A writing system should choose one and stick with it.  User unreliability forces a compromise.

Polynesian users need 02BC to be visually distinguished from 2019. European users don’t need the apostrophe to be visually distinguished from 2019. The edge case of “dogs’” doesn’t convince me. In all my years of typesetting I have never once noticed this, much less considered it a problem that needed fixing.

> Now, if text processors were to enable a difference, then the arguments would change.  I for one find it helpful that Microsoft Word is willing to display visible symbols for spaces and tab characters so that I know what white space is composed of.

Most word-processing and typesetting programs will do this. Quark and InDesign do. Word and LibreOffice and Apple Pages do. 

>> I didn’t follow the beginning of this. Evidently it has something to do with word selection of d’ + a space + what follows. If that’s so, then there’s no argument at all for 02BC. It’s a question of the space, and that’s got nothing to do with the identity of the apostrophe.
> The word selection issue is that except before a letter, the standard word-breaking algorithm says that there is a word boundary between the delta and apostrophe.

Well, that’s the expected behaviour for a character which is polyvalent. If you have problems double-clicking “d’ Artagnan” you should probably just write “d’Artagnan”. 
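
The word-selection behaviour under discussion can be illustrated with a deliberately simplified sketch. It is not an implementation of the full Unicode word-breaking algorithm: it models only the three rules usually cited for this case in UAX #29 (WB5, letter followed by letter; WB6 and WB7, a mid-word mark joins only when letters stand on both sides). The names `MID`, `no_break` and `segments` are hypothetical, `str.isalpha()` stands in for the real Word_Break=ALetter property, and everything else (combining marks, digits, runs of spaces) is treated as a boundary.

```python
# Simplified stand-in for the mid-word marks: U+2019 and U+0027.
# U+02BC is NOT listed here because it is a letter (isalpha() is True).
MID = {"\u2019", "\u0027"}

def no_break(text: str, i: int) -> bool:
    """Return True if there is no word boundary between text[i-1] and text[i]."""
    a, b = text[i - 1], text[i]
    if a.isalpha() and b.isalpha():
        return True  # letter x letter (cf. WB5)
    if a.isalpha() and b in MID and i + 1 < len(text) and text[i + 1].isalpha():
        return True  # letter x mark, but only if a letter follows (cf. WB6)
    if a in MID and b.isalpha() and i >= 2 and text[i - 2].isalpha():
        return True  # mark x letter, but only if a letter precedes (cf. WB7)
    return False

def segments(text: str) -> list[str]:
    """Split text into segments at every position where no_break() is False."""
    if not text:
        return []
    out = [text[0]]
    for i in range(1, len(text)):
        if no_break(text, i):
            out[-1] += text[i]
        else:
            out.append(text[i])
    return out

print(segments("men\u2019s"))          # apostrophe between two letters
print(segments("dogs\u2019"))          # apostrophe at the end of the word
print(segments("d\u2019 Artagnan"))    # U+2019 followed by a space
print(segments("d\u02BC Artagnan"))    # U+02BC followed by a space
```

Under these assumptions “men’s” stays in one piece, while a trailing U+2019 (as in “dogs’” or “d’ ” before a space) is split off from the preceding letters; with U+02BC the mark stays attached because it is classified as a letter.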

>>> Will your coding decision be machine readable for the readership?  
>> I don’t know what you mean by “readable”.
> Will the difference between U+02BC and U+2019 be discernible by the readers?

They should be, in Polynesian languages. Otherwise the text isn't easily legible. 

> If one could copy a phrase to a general application and select a word by double-clicking, then the difference would be visible.

If you know what the behaviour is then you can take it into account when you are copying a word. You can’t fix this by character encoding. Certainly not by screwing with 02BC.

> If the result of the publishing is simply a printed book, then your choice of U+2019 or U+02BC will depend only on font differences.

That non-argument can be applied to everything. 

> Not that it makes much difference to the issue,  but isn't the correct encoding for the ʻokina U+02BB MODIFIER LETTER TURNED COMMA? 

Yes, but both 02BB and 02BC are used in linguistic transcriptions and in Polynesian languages, and the graphic identity with 2018 and 2019 is problematic and unnecessary.

Using 02BC for the apostrophe is a mistake, in my view.

Michael Everson
