Another take on the English apostrophe in Unicode
verdy_p at wanadoo.fr
Thu Jun 11 13:39:41 CDT 2015
Also used in the Breton trigram c’h, which is considered a single letter of the
Breton alphabet but is actually entered as two letters with a diacritic-like
apostrophe in the middle (which in this case is still not a letter of the
alphabet): the trigram c’h is distinct from the digram ch.
Breton **also** uses a regular apostrophe for elision.
In fact, what you note for the ejective in Native American languages is
effectively a right-combining diacritic, and still not a letter by itself.
However, given its position and the fact that it is "spacing", it is the
spacing form of the apostrophe diacritic that should be used, and that form
has to be chosen among:
* U+00B4 (acute, most often ugly, located too high, and too much
* U+02B9 (prime, nearly good, but still too high),
* U+02BC (apostrophe),
* U+02C8 (vertical high tick, but confusable with the mark of stress in IPA
before a phonetic syllable), and
* U+02CA (acute/2nd tone, which for me is not distinct from U+00B4; it is
only used with sinograms in Mandarin Chinese, and its metrics differ from
those of U+00B4, which match the Latin metrics).
In my opinion, U+02BC is the best choice for the diacritic apostrophe.
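
As a rough way to compare the candidates listed above, the following sketch
prints the general category and character name that Python's standard
unicodedata module reports for each of them. U+0027 and U+2019 are added
only for comparison; the helper name describe and the list CANDIDATES are
illustrative, not part of any standard.

```python
import unicodedata

# Candidates from the list above, plus U+0027 and U+2019 for comparison.
CANDIDATES = [0x00B4, 0x02B9, 0x02BC, 0x02C8, 0x02CA, 0x0027, 0x2019]


def describe(cp):
    """Return (code point label, general category, character name)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


for label, category, name in map(describe, CANDIDATES):
    print(label, category, name)
```

With the Unicode data shipped in current Python versions, U+00B4 should come
out as a modifier symbol (Sk), while U+02B9, U+02BC, U+02C8 and U+02CA are
all modifier letters (Lm). This only shows the character properties; it says
nothing about glyph height or metrics, which depend on the font.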
The other character, for the **elision** apostrophe, is the punctuation mark
U+2019 (just as the full stop punctuation is also used as an abbreviation
mark). There's no confusion with its alternate role as a right-side single
quote because U+2019 is used in languages that normally never use the
single quotes, but chevrons (or other punctuation signs in East-Asian
But in English, where single quotes are used for short quotations, there's
still a problem representing this elision apostrophe when it occurs not
between two letters, where it also marks a gluing of two morphemes (as in
"don't" or "Peter's"), but at the beginning or end of a word. An elision
at the end of a word is also problematic when it is the final word of a
quoted sentence. If you really want to cite a single English word terminated
by an elision apostrophe, single quotes won't be usable and you'll have to use
chevrons, as in this ‹demo’›, rather than single or double quotes, which are
difficult to discriminate.
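
To make the practical difference between the punctuation mark U+2019 and
the modifier letter U+02BC concrete, here is a small sketch using a
deliberately naive \w+ tokenizer from Python's re module. It is only an
assumption-laden illustration, not an implementation of UAX #29, and the
function name naive_words is hypothetical.

```python
import re


def naive_words(text):
    """Split text into runs of word characters.

    \\w+ is a deliberately simple tokenizer, not UAX #29 segmentation.
    """
    return re.findall(r"\w+", text)


samples = {
    "U+2019 inside a word": "don\u2019t",
    "U+02BC inside a word": "don\u02BCt",
    "U+2019 at word start": "\u2019tis",
    "U+02BC at word start": "\u02BCtis",
}

for label, text in samples.items():
    print(label, naive_words(text))
```

Because U+02BC has a letter category, \w keeps it inside the word, while
U+2019 is treated as punctuation and splits or is dropped from the word.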
2015-06-11 19:47 GMT+02:00 Bill Poser <billposer2 at gmail.com>:
> To add a factor that I think hasn't been mentioned, there are languages in
> which apostrophe is used both as a letter by itself and as part of a
> complex letter. Most of the native languages of British Columbia write
> glottalized consonants as C+', e.g. <t'> for an ejective alveolar stop, and
> many use apostrophe by itself for the glottal stop. (Another common
> convention, which produces other difficulties, is to use the number <7> for
> glottal stop.)
> On Wed, Jun 10, 2015 at 2:10 PM, Ted Clancy <tclancy at mozilla.com> wrote:
>> On 4/Jun/2015 14:34 PM, Markus Scherer wrote:
>>> Looks all wrong to me.
>> Hi, Markus. I'm the guy who wrote the blog post. I'll respond to your
>> points below.
>>> You can't use simple regular expressions to find word boundaries. That's
>>> why we have UAX #29.
>> And UAX #29 doesn't work for words which begin or end with apostrophes,
>> whether represented by U+0027 or U+2019. It erroneously thinks there's a
>> word boundary between the apostrophe and the rest of the word.
>> But UAX #29 *would* work if the apostrophes were represented by U+02BC,
>> which is what I'm suggesting.
>> Confusion between apostrophe and quoting -- blame the scribe who came up
>>> with the ambiguous use, not the people who gave it a number.
>> I'm not trying to blame anyone. I'm trying to fix the problem.
>> I know this problem has a long history.
>> English is taught as that squiggle being punctuation, not a letter.
>> I think we need to make a distinction between the colloquial usage of the
>> word "punctuation" and the Unicode general category "punctuation" which has
>> specific technical implications.
>> I somewhat wish that Unicode had a separate category for "Things that
>> look like punctuation but behave like letters", which might clear up this
>> taxonomic confusion. (I would throw U+02BE (MODIFIER LETTER RIGHT HALF
>> RING) and U+02BF (MODIFIER LETTER LEFT HALF RING), neither of which are
>> actually modifiers, into that category too.) But we don't. And the English
>> apostrophe behaves like a letter, regardless of what your primary school
>> teacher might have told you, so with the options available in Unicode, it
>> needs to be classed as a letter.
>> "don’t" is a contraction of two words, it is not one word.
>> This is utter nonsense. Should my spell-checker recognise "hasn't" as a
>> valid word? Or should it consider "hasn't" to be the word "hasn" followed
>> by the word "t", and then flag both of them as spelling errors?
>> Is "fo'c'sle" the three separate words "fo", "c", and "sle"?
>> The idea that words with apostrophes aren't valid words is a regrettable
>> myth that exists in English, which has repeatedly led to the apostrophe
>> being an afterthought in computing, leading to situations like this one.
>> If anything, Unicode might have made a mistake in encoding two of these
>>> that look identical. How are normal users supposed to find both U+2019
>>> U+02BC on their keyboards, and how are they supposed to deal with
>> Yeah, and there are fonts where I can't tell the difference between
>> capital I and lower-case l. But my spell-checker will underline a word
>> where I erroneously use an I instead of an l, and I imagine spell-checkers
>> of the future could underline a word where I erroneously use a closing
>> quote instead of an apostrophe, or vice versa.
>> There are other possible solutions too, but I don't want to get into a
>> discussion about UI design. I'll leave that to UI designers.
>> - Ted