Ancient Greek apostrophe marking elision

James Kass via Unicode unicode at unicode.org
Sun Jan 27 21:48:52 CST 2019


On 2019-01-27 11:38 PM, Richard Wordingham via Unicode wrote:
> On Sun, 27 Jan 2019 19:57:37 +0000
> James Kass via Unicode <unicode at unicode.org> wrote:
>
>> On 2019-01-27 7:09 PM, James Tauber via Unicode wrote:
>>> In my original post, I asked if a language-specific tailoring of
>>> the text segmentation algorithm was the solution but no one here
>>> has agreed so far.
>> If there are likely to be many languages requiring exceptions to the
>> segmentation algorithm wrt U+2019, then perhaps it would be better to
>> establish conventions using ZWJ/ZWNJ and adjust the algorithm
>> accordingly so that it would be cross-languages.  (Rather than
>> requiring additional and open ended language-specific tailorings.) (I
>> inserted several combinations of ZWJ/ZWNJ into James Tauber's
>> example, but couldn't improve the segmentation in LibreOffice,
>> although it was possible to make it worse.)
> If you look at TR29, you will see that ZWJ should only affect word
> boundaries for emoji.  ZWNJ shall have no effect.  What you want is a
> control that joins words, but we don't have that.
>
> Richard.
>

(https://unicode.org/reports/tr29/)

It’s been said that the text segmentation rules seem over-complicated 
and are probably non-trivial to implement properly.  I tried your 
suggestion of WORD JOINER U+2060 after tau ( γένοιτ⁠’ ἄν ), but it only 
added yet another word break in LibreOffice.

The problem may stem from the fact that WORD JOINER is supposed to be 
treated as though it were a zero-width no-break space.  In other words, it 
is a *space*, and as a space it indicates a word break.  That doesn’t seem right.
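
For anyone who wants to check what is actually stored in the test string, here is a minimal sketch using Python's standard unicodedata module.  The helper name describe() and the variable names are only for illustration.  It reports general categories and character names only; it says nothing about the Word_Break property in TR29 or about what LibreOffice does internally.

```python
import unicodedata

# The test string from this message: tau, then WORD JOINER (U+2060),
# then RIGHT SINGLE QUOTATION MARK (U+2019), a space, and the next word.
# The two characters of interest are written as escapes so they are visible.
sample = "γένοιτ\u2060\u2019 ἄν"


def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


# List every character in the test string.
for row in describe(sample):
    print(*row)

print()

# Compare WORD JOINER with ZERO WIDTH NO-BREAK SPACE and two real spaces.
for ch in ("\u2060", "\ufeff", "\u0020", "\u00a0"):
    print(*describe(ch)[0])
```

In the character database that ships with Python, U+2060 has general category Cf (format), not Zs (space separator), so any space-like behaviour would have to come from the segmentation rules or the implementation rather than from the category itself.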

Instead of treating WORD JOINER as a SPACE, why not treat it as a WORD 
JOINER?  That could avoid a lot of problems with undesirable string 
segmentation, and might also minimize future language-specific 
tailoring and ease the burden on implementers.


