Ancient Greek apostrophe marking elision

Mark Davis ☕️ via Unicode unicode at unicode.org
Sat Jan 26 05:02:58 CST 2019


> breaking selection for "d'Artagnan" or "can't" into two is overly fussy.

True, and that is not what U+2019 does; it does not break medially.

Mark
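
The medial versus final behaviour can be illustrated with a small Python sketch. It is not an implementation of UAX #29: it models only the idea behind rules WB6/WB7 (an apostrophe is kept when it sits between two letters) and treats everything else as a separator. The names `APOSTROPHES`, `is_word_char` and `default_words` are hypothetical, introduced here for illustration.

```python
import unicodedata

# Assumption: U+0027 and U+2019 are both treated like the class that
# rules WB6/WB7 of UAX #29 allow between two letters.
APOSTROPHES = {"'", "\u2019"}


def is_word_char(ch):
    """Letters and combining marks count as word characters in this sketch."""
    return unicodedata.category(ch)[0] in ("L", "M")


def default_words(text):
    """Split text into words; an apostrophe survives only between two letters."""
    tokens, current = [], ""
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_word_char(ch):
            current += ch
        elif ch in APOSTROPHES and current and nxt and is_word_char(nxt):
            # Medial apostrophe: no break on either side.
            current += ch
        else:
            # Anything else, including a word-final apostrophe, ends the word.
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


print(default_words("d\u2019Artagnan can\u2019t"))  # ['d’Artagnan', 'can’t']
print(default_words("tryin\u2019 to go"))           # ['tryin', 'to', 'go']
print(default_words("\u03b4\u2019 \u03b1\u03c1\u03c7\u03b1\u03b9\u03b1"))  # ['δ', 'αρχαια']
```

A real application would use a full UAX #29 implementation (for example ICU's break iterators) rather than this toy loop.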


On Fri, Jan 25, 2019 at 11:07 PM Asmus Freytag via Unicode <
unicode at unicode.org> wrote:

> On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:
>
> Thank you, although the word break does still affect things like
> double-clicking to select.
>
> And people do seem to want to use U+02BC for this reason (and I'm trying
> to articulate why that isn't what U+02BC is meant for).
>
> For normal editing operations, breaking selection for "d'Artagnan" or
> "can't" into two is overly fussy.
>
> No wonder people get frustrated.
>
> A./
>
> James
>
> On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ <mark at macchiato.com> wrote:
>
>> U+2019 is normally the character used, except where the ’ is considered a
>> letter. When it is between letters it doesn't cause a word break, but
>> because it is also a right single quote, at the end of words there is a
>> break. Thus in a phrase like «tryin’ to go» there is a word break after the
>> n, because one can't tell whether the ’ is an apostrophe or a closing quote.
>>
>> So something like "δ’ αρχαια" (picking a phrase at random) would have a
>> word break after the delta.
>>
>> Word break:
>> δ’ αρχαια
>>
>> However, there is no *line break* between them (which is the more
>> important operation in normal usage). Probably not worth tailoring the word
>> break.
>>
>> Line break:
>> δ’ αρχαια
>>
>> Mark
>>
>>
>> On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <
>> unicode at unicode.org> wrote:
>>
>>> There seems to be some debate among digital classicists about whether to
>>> use U+2019 or U+02BC to represent the apostrophe in Ancient Greek when
>>> marking elision (e.g. δ’ for δέ preceding a word starting with a vowel).
>>>
>>> It seems to me that U+2019 is the technically correct choice per the
>>> Unicode Standard but it is not without at least one problem: default word
>>> breaking rules.
>>>
>>> I'm trying to provide guidelines for digital classicists in this regard.
>>>
>>> Is it correct to say the following:
>>>
>>> 1) U+2019 is the correct character to use for the apostrophe in Ancient
>>> Greek when marking elision.
>>> 2) Using U+02BC for this purpose is a misuse of a modifier letter.
>>> 3) However, use of U+2019 (unlike U+02BC) means the default Word
>>> Boundary Rules in UAX#29 will (incorrectly) exclude the apostrophe from the
>>> word token
>>> 4) And use of U+02BC (unlike U+2019) means the Grapheme Cluster Boundary
>>> Rules in UAX#29 will (incorrectly) include the apostrophe as part of a
>>> grapheme cluster with the previous letter
>>> 5) The correct solution is to tailor the Word Boundary Rules in the case
>>> of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't
>>> have the same ambiguity problems with the single quotation mark as in
>>> English as it should not be used as a quotation mark in Ancient Greek)
>>>
>>> Many thanks in advance.
>>>
>>> James
>>>
>>
>
> --
> *James Tauber*
> Greek Linguistics: https://jktauber.com/
> Music Theory: https://modelling-music.com/
> Digital Tolkien: https://digitaltolkien.com/
>
> Twitter: @jtauber
>
>
>
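
The difference between the two candidate characters, and the tailoring proposed in point 5 of the original question, can be sketched in Python. This is a rough illustration under stated assumptions, not a UAX #29 tailoring: it assumes NFC text (Python's `\w` does not match combining marks), assumes U+2019 never serves as a quotation mark in the text, and uses the hypothetical names `ELISION_WORD` and `greek_words`.

```python
import re
import unicodedata

# General categories: U+2019 is punctuation (Pf), U+02BC is a letter (Lm).
for cp in (0x2019, 0x02BC):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} {unicodedata.category(ch)}")

# A naive \w+ tokenizer already shows why U+02BC is tempting:
# it is a letter, so it stays attached to the delta.
print(re.findall(r"\w+", "\u03b4\u02bc \u03b1\u03c1\u03c7\u03b1\u03b9\u03b1"))  # ['δʼ', 'αρχαια']
print(re.findall(r"\w+", "\u03b4\u2019 \u03b1\u03c1\u03c7\u03b1\u03b9\u03b1"))  # ['δ', 'αρχαια']

# Hypothetical tailoring: a word is a run of letters, optionally with
# medial U+2019, optionally followed by one word-final U+2019 (elision).
ELISION_WORD = re.compile(r"[^\W\d_]+(?:\u2019[^\W\d_]+)*\u2019?")


def greek_words(text):
    """Tokenize so that an elision mark stays with the word it follows."""
    return ELISION_WORD.findall(text)


print(greek_words("\u03b4\u2019 \u03b1\u03c1\u03c7\u03b1\u03b9\u03b1"))  # ['δ’', 'αρχαια']
```

The tailored pattern is only safe where U+2019 is known not to close a quotation; in mixed English and Greek text it would wrongly attach closing quotes to the preceding word.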