Another take on the English apostrophe in Unicode

Ted Clancy tclancy at mozilla.com
Wed Jun 10 17:51:45 CDT 2015


On 4/Jun/2015 19:01, Leo Broukhis wrote:
>
> Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for
> example, the word ack-ack isn't decomposable into words, or even
> morphemes, "ack" and "ack".
>
I do think that U+2010 (HYPHEN) is miscategorised. I think it should have
General Category = Pc, not Pd. (That is, hyphens are connectors, not
dashes.) That would make it a "word" character.
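To make the Pc/Pd distinction concrete: Python's standard unicodedata module reports U+2010 as General Category Pd (the same as U+002D) and the underscore as Pc. The sketch below approximates a UTS #18-style word character (letters, marks, decimal digits, connector punctuation, join controls) from General Category alone. This is an assumption-laden approximation, not a conforming definition. The helpers is_word_char and word_runs are hypothetical, and their overrides parameter merely simulates the proposed recategorisation of U+2010 as Pc.

```python
import unicodedata

# Join_Control characters: ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER.
JOIN_CONTROL = {"\u200C", "\u200D"}


def is_word_char(ch, overrides=None):
    """Approximate a UTS #18-style word character from General Category.

    Letters (L*) stand in for the Alphabetic property, so this is only an
    approximation. `overrides` (hypothetical) maps a character to a
    substitute General Category to simulate a property change.
    """
    gc = (overrides or {}).get(ch, unicodedata.category(ch))
    return (
        gc[0] in ("L", "M")  # letters and combining marks
        or gc == "Nd"        # decimal digits
        or gc == "Pc"        # connector punctuation, e.g. "_"
        or ch in JOIN_CONTROL
    )


def word_runs(text, overrides=None):
    """Split text into maximal runs of word characters."""
    runs, current = [], []
    for ch in text:
        if is_word_char(ch, overrides):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


print(unicodedata.category("\u2010"), unicodedata.category("_"))  # Pd Pc
print(word_runs("ack\u2010ack"))  # ['ack', 'ack']
print(len(word_runs("ack\u2010ack", overrides={"\u2010": "Pc"})))  # 1
```

With U+2010 as Pd, "ack‐ack" falls apart into two runs. Under the simulated Pc category it stays one word, just as "snake_case" does.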

Or, at the very least, U+2010 should have Word Break = MidNumLet (meaning
it does not break a word when it occurs between two letters or between two
digits). UAX #29 says that U+2010
deliberately does *not* have Word Break = MidNumLet, though an
implementation may treat it as if it did. (UAX #29 doesn't give any reasons
for this decision. I can understand why U+002D (HYPHEN-MINUS) doesn't have
Word Break = MidNumLet, due to its history of being used as a dash or minus
sign, but U+2010 should never be used as a dash or minus sign, so I don't
see the problem.)
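As a rough illustration of what Word Break = MidNumLet would change, here is a heavily simplified sketch of UAX #29 rules WB6/WB7 (no break in letter, MidNumLet, letter) and WB11/WB12 (no break in digit, MidNumLet, digit). It is not a conforming segmenter. str.isalpha and str.isdigit stand in for the ALetter and Numeric classes, only a small subset of MidNumLet is listed, every other rule is ignored, and non-word characters are simply dropped. The function words and its hyphen_as_midnumlet flag are hypothetical. The flag models an implementation choosing to treat U+2010 as if it were MidNumLet.

```python
# A small subset of Word_Break=MidNumLet: FULL STOP and
# RIGHT SINGLE QUOTATION MARK (not the full list).
MID_NUM_LET = {"\u002E", "\u2019"}
HYPHEN = "\u2010"


def _kind(ch):
    """Crude stand-ins for the ALetter ("L") and Numeric ("N") classes."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "N"
    return None


def words(text, hyphen_as_midnumlet=False):
    """Very simplified UAX #29-style word extraction (WB6/7, WB11/12 only)."""
    mids = MID_NUM_LET | ({HYPHEN} if hyphen_as_midnumlet else set())
    out, current = [], []
    for i, ch in enumerate(text):
        if _kind(ch) is not None:
            current.append(ch)
            continue
        # A MidNumLet character stays inside the word only when it sits
        # between two letters or between two digits.
        if ch in mids and current and i + 1 < len(text):
            before, after = _kind(text[i - 1]), _kind(text[i + 1])
            if before is not None and before == after:
                current.append(ch)
                continue
        if current:
            out.append("".join(current))
            current = []
    if current:
        out.append("".join(current))
    return out


print(words("ack\u2010ack"))  # ['ack', 'ack']
print(len(words("ack\u2010ack", hyphen_as_midnumlet=True)))  # 1
print(words("pi is 3.14"))  # ['pi', 'is', '3.14']
```

Because U+2019 is MidNumLet, a word such as "can’t" written with U+2019 also stays whole in this sketch. A MidNumLet character between a letter and a digit ("a.1") still breaks, as WB6/WB7 require letters on both sides.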

But luckily, the miscategorisation of U+2010 hasn't led to any pressing
practical problems, unlike the misuse of U+2019 for the apostrophe.

- Ted