Another take on the English apostrophe in Unicode

Philippe Verdy verdy_p at wanadoo.fr
Sat Jun 13 02:02:40 CDT 2015


I disagree: U+02BC already qualifies as a letter (even if it is not
specific to the Latin script and is not dual-cased). It fits perfectly
well into language-specific alphabets, and we don't need another
character to encode it once again as a letter.

So the only question is which to choose:
- on one side, U+02BC (the existing apostrophe letter) and other possible
candidate letters for alternate forms (including U+02C8 for the vertical
form, and the common fallback U+00B4, present in many legacy fonts
for systems built before the UCS was standardized and using legacy 8-bit
charsets such as ISO 8859-1);
- on the other side, U+2019, which is encoded as a quotation
punctuation mark (as is the legacy ASCII single quote).
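
As a rough illustration of the letter-versus-punctuation distinction, the
following Python sketch prints the general category of each candidate
using the standard unicodedata module, then shows how a word-matching
regex treats U+02BC versus U+2019. The helper names (describe, words) and
the sample word are illustrative assumptions only, and the categories
reported depend on the Unicode database bundled with the interpreter.

```python
import re
import unicodedata

# Candidates discussed in this message, plus the ASCII quote for comparison.
CANDIDATES = [0x02BC, 0x02C8, 0x00B4, 0x2019, 0x0027]


def describe(cp):
    """Return (code point label, general category, character name)."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


def words(text):
    """Split text into runs of word characters (letters, digits, underscore)."""
    return re.findall(r"\w+", text)


for cp in CANDIDATES:
    print(*describe(cp))

# A letter-category apostrophe stays inside the word; a punctuation one
# splits it in two.
print(words("don\u02BCt"))
print(words("don\u2019t"))
```

The same difference is likely to show up in other tools that rely on
character categories, such as tokenizers or double-click word selection,
though their exact behaviour is implementation-specific.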

Note that U+00B4 (from ISO 8859-1) has also been used together with
U+0060 (from ASCII) to replace the more ambiguous ASCII quote U+0027 by
assigning an orientation. The exact shape of these two varies between
a thin rectangle, a wedge, and a curly comma (shaped like the digits 6
and 9), as does the exact angle when it is a wedge or thin rectangle.
However, these characters have long been used in overstriking mode to
add accents over Latin capital letters, so the curly comma shapes are very
uncommon and the glyphs are more horizontal than vertical. U+00B4 is
therefore a very poor candidate for the apostrophe, which should have a
narrow advance width.

So in practice U+02BC and U+02C8 remain for this apostrophe letter.
Which one you use is a matter of preference, but U+02C8 will not be
used if there are two distinct apostrophes in the language (e.g. in
Polynesian languages, where the distinction was made even clearer by
using the right or left half rings U+02BE/U+02BF, or the glottal letters
U+02C0/U+02C1 if that letter has a very distinctive phonetic realisation
as a plain consonant with two variants, as in Arabic, or even U+02B0 when
it is just a breath without a stop). The full range U+02B0-U+02C1 offers
more than enough variation for this letter if you need slight phonetic
distinctions.
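
To see what that range contains, a short Python sketch (again using only
the standard unicodedata module) can list each code point with its
category and name. The function name modifier_letters is a hypothetical
helper introduced for this illustration.

```python
import unicodedata


def modifier_letters(start=0x02B0, end=0x02C1):
    """List (code point, general category, name) for an inclusive range."""
    return [
        (cp, unicodedata.category(chr(cp)), unicodedata.name(chr(cp)))
        for cp in range(start, end + 1)
    ]


for cp, cat, name in modifier_letters():
    print(f"U+{cp:04X} {cat} {name}")
```

Every entry in this range carries the general category Lm (modifier
letter), which is consistent with treating any of them as a letter rather
than as punctuation.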

2015-06-13 8:28 GMT+02:00 Peter Constable <petercon at microsoft.com>:

> Nice article, as I recall. (Been a long time.)
>
>
> Peter
>
> -----Original Message-----
> From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of
> Kalvesmaki, Joel
> Sent: Friday, June 5, 2015 7:27 AM
> To: Unicode Mailing List
> Subject: Re: Another take on the English apostrophe in Unicode
>
> I don't have a particular position staked out. But to this discussion
> should be added the very interesting work done by Zwicky and Pullum arguing
> that the apostrophe is the 27th letter of the Latin alphabet. Neither
> U+2019 nor U+02BC would satisfy that position. See:
>
> Zwicky, Arnold M., and Geoffrey K. Pullum. "Cliticization vs.
> Inflection: English N'T." Language 59, no. 3 (1983): 502-513.
>
> It's nicely summarized and discussed here:
> http://chronicle.com/blogs/linguafranca/2013/03/22/being-an-apostrophe/
>
> jk
> --
> Joel Kalvesmaki
> Editor in Byzantine Studies
> Dumbarton Oaks
> 202 339 6435
>
>
>