Ancient Greek apostrophe marking elision
James Tauber via Unicode
unicode at unicode.org
Sun Jan 27 13:09:31 CST 2019
On Sun, Jan 27, 2019 at 1:22 PM Richard Wordingham via Unicode <
unicode at unicode.org> wrote:
> Except that Unicode-compliant processes aren't required to follow the
> scheme of TR29 Unicode Text Segmentation. However, it is only required
> to select the whole word because the U+2019 is followed by a letter.
> TR29 prescribes different behaviour for "dogs'" with U+2019 (interpret
> as two 'words') and U+02BC (interpret as one word). The GTK-based
> email client I'm using has that difference, but also fails with
> "don't" unless one uses U+02BC.
> However, LibreOffice treats "don't" as a single word for U+0027, U+02BC
> and U+2019, but "dogs'" as a single word only for U+02BC. This
> complies with TR29. I'm not surprised, as LibreOffice does use or has
> used ICU.
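As a rough illustration of the default behaviour described above, here is a minimal sketch that hard-codes only the relevant Word_Break values (U+2019 is MidNumLet, U+0027 is Single_Quote, and U+02BC is ALetter because it is a modifier letter) and applies only rules WB5, WB6 and WB7 of TR29. Every other rule, including WB4 for combining marks, is left out, so it assumes precomposed (NFC) Greek. The function names are hypothetical; a real implementation would use the full property data, for example through ICU.

```python
# Minimal sketch of UAX #29 (TR29) word boundaries, limited to rules
# WB5-WB7 and to the characters discussed in this thread.
RSQUO = "\u2019"     # RIGHT SINGLE QUOTATION MARK: Word_Break=MidNumLet
APOS = "\u0027"      # APOSTROPHE: Word_Break=Single_Quote
MOD_APOS = "\u02BC"  # MODIFIER LETTER APOSTROPHE: category Lm, Word_Break=ALetter


def wb_class(ch: str) -> str:
    """Reduced stand-in for the Word_Break property (not the real data)."""
    if ch in (RSQUO, APOS):
        return "MidNumLetQ"  # WB6/WB7 treat MidNumLet and Single_Quote alike
    if ch.isalpha():         # true for category Lm, so U+02BC counts as a letter
        return "ALetter"
    return "Other"


def segments(text: str) -> list[str]:
    """Split text at every position not protected by WB5, WB6 or WB7."""
    cls = [wb_class(c) for c in text]
    out, start = [], 0
    for i in range(1, len(text)):
        left, right = cls[i - 1], cls[i]
        left2 = cls[i - 2] if i >= 2 else None
        right2 = cls[i + 1] if i + 1 < len(text) else None
        if left == "ALetter" and right == "ALetter":
            continue  # WB5: letter x letter
        if left == "ALetter" and right == "MidNumLetQ" and right2 == "ALetter":
            continue  # WB6: letter x apostrophe, only if a letter follows
        if left2 == "ALetter" and left == "MidNumLetQ" and right == "ALetter":
            continue  # WB7: apostrophe x letter, only if a letter precedes
        out.append(text[start:i])  # WB999: break everywhere else
        start = i
    if text:
        out.append(text[start:])
    return out


def words(text: str) -> list[str]:
    """Roughly what double-click selection picks out: segments with a letter."""
    return [s for s in segments(text) if any(c.isalpha() for c in s)]


for sample in ["don" + RSQUO + "t", "dogs" + RSQUO + " ", "dogs" + MOD_APOS + " ",
               "\u1F00\u03BB\u03BB" + RSQUO + " \u1F10\u03B3\u03CE"]:
    print(repr(sample), segments(sample), words(sample))
```

With these rules "don’t" stays whole, "dogs’" splits into "dogs" and "’" while "dogsʼ" with U+02BC stays whole, and in the elided Greek sample "ἀλλ’ ἐγώ" the U+2019 ends up as a segment of its own, outside the word.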
This comes back to my original question that started this thread. Many
people creating Ancient Greek digital resources use U+02BC seemingly
because of incorrect word-breaking with *word-final* U+2019 (which is the
only time it occurs in Ancient Greek and always marking elision, never as
the end of a quotation).
I am trying to write guidelines as to why they should use U+2019. I'm
convinced it's technically the right code point to use, but I want to
get my facts straight about how to address the word-breaking issue
(specifically for word-final U+2019 in Ancient Greek, to be clear). In my
original post, I asked if a language-specific tailoring of the text
segmentation algorithm was the solution but no one here has agreed so far.
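One form such a language-specific tailoring could take is sketched below: a regular-expression approximation of the default behaviour for letters and apostrophes, plus one extra, hypothetical rule that keeps a word-final U+2019 with its word when the character before it is a Greek letter. The pattern names, the use of the Greek and Coptic (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF) blocks to recognise Greek, and the rule itself are assumptions for illustration, not an agreed CLDR or ICU tailoring.

```python
import re

# A "letter" is any word character that is not a digit or underscore;
# Python's re module is Unicode-aware for str patterns.
LETTER = r"[^\W\d_]"

# Default-like word: letters, optionally joined by a medial U+0027 or U+2019.
DEFAULT_WORD = re.compile(rf"{LETTER}+(?:['\u2019]{LETTER}+)*")

# Hypothetical tailoring: also absorb one U+2019 that immediately follows a
# letter from the Greek and Coptic or Greek Extended blocks (elision mark).
GREEK_ELISION_WORD = re.compile(
    rf"{LETTER}+(?:['\u2019]{LETTER}+)*(?:(?<=[\u0370-\u03FF\u1F00-\u1FFF])\u2019)?"
)


def find_words(text: str, tailored: bool = False) -> list[str]:
    """Return the word-like substrings of text, with or without the tailoring."""
    pattern = GREEK_ELISION_WORD if tailored else DEFAULT_WORD
    return pattern.findall(text)


greek = "\u1F00\u03BB\u03BB\u2019 \u1F10\u03B3\u03CE"  # elided ἀλλά, then ἐγώ
print(find_words(greek))                 # default: elision mark left out
print(find_words(greek, tailored=True))  # tailored: elision mark kept
print(find_words("dogs\u2019 bark", tailored=True))  # non-Greek unaffected
```

The rule only works because, as stated above, a word-final U+2019 in Ancient Greek always marks elision; a Greek text that also used U+2019 as a closing quotation mark would have that quote pulled into the preceding word.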
Here's a concrete example from Smyth's Grammar:
Double-clicking on the first word should select the U+2019 as well.
Interestingly, on macOS Mojave it does in Pages [1] but not in Notes, the
Terminal, or here in Gmail on Chrome.
To be clear: when I say "should" I mean that that is the expectation
classicists have and the failure to meet it is why some of them insist on
I'm happy if the answer is "use U+2019 and go get your text segmentation
implementations fixed" [2] but am looking for confirmation of that.
[1] To be honest, I was impressed Pages got it right.
[2] In the same spirit as "if certain combining character combinations
don't work, the solution is not to add precomposed characters, it's to
improve the fonts" or "tonos and oxia are the same and if they look
different, it's the fault of your font".
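The tonos/oxia remark rests on canonical equivalence: U+1F71 GREEK SMALL LETTER ALPHA WITH OXIA has a singleton canonical decomposition to U+03AC GREEK SMALL LETTER ALPHA WITH TONOS, so after normalization the two are the same code point sequence and any visible difference comes from the font. A minimal check with Python's standard unicodedata module:

```python
import unicodedata

ALPHA_OXIA = "\u1F71"   # GREEK SMALL LETTER ALPHA WITH OXIA
ALPHA_TONOS = "\u03AC"  # GREEK SMALL LETTER ALPHA WITH TONOS

for form in ("NFC", "NFD"):
    a = unicodedata.normalize(form, ALPHA_OXIA)
    b = unicodedata.normalize(form, ALPHA_TONOS)
    # Both spellings normalize to the same code point sequence.
    print(form, [f"U+{ord(c):04X}" for c in a], a == b)
```

Under NFC both become U+03AC; under NFD both become U+03B1 followed by U+0301 COMBINING ACUTE ACCENT.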