Hyphenation Markup

Richard Wordingham via Unicode unicode at unicode.org
Sat Jun 2 06:37:45 CDT 2018

On Sat, 2 Jun 2018 11:06:43 +0200
Otto Stolz via Unicode <unicode at unicode.org> wrote:

> Am 2018-06-02 um 06:44 schrieb Richard Wordingham via Unicode:
> > In Latin text, one can indicate permissible line break opportunities
> > between grapheme clusters by inserting U+00AD SOFT HYPHEN.  What
> > low-end schemes, if any, exist for such mark-up within grapheme
> > clusters?  
> What about U+200B ZWSP?

> >  this character is intended for invisible word
> > separation and for line break control; it has no
> > width, but its presence between two characters
> > does not prevent increased letter spacing in
> > justification  

Thanks for the suggestion, but it's not likely to work:

Within a word and with a proper layout implementation, using ZWSP
would be worse than using backing store <character-1, SHY,

1) In the sequence

<letter-0, character-1, ZWSP, character-2, letter-1>

realisation of the break should definitely result in <letter-0,
character-1> on one line and in <character-2, letter-1> on the next
line, whereas in visual order, character-2 should precede character-1. 

2) Use of ZWSP will usually cause a dotted circle to be rendered, even
when the break does not occur.

3) ZWSP will result in a mandatory word boundary.  That will cause
problems with the spell checker.

I've experimented
(http://wrdingham.co.uk/lanna/renderer_test.htm#test_and_tell) with the
combination <letter, right matra> where there is a default grapheme
cluster boundary between the two characters.  I get generally better
results with SHY than ZWSP.  The downside was that the rendering
systems I tried seemed to insist on inserting the glyph of U+002D or
U+2010, rather than the glyph of U+00AD.
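
For anyone wanting to repeat this kind of comparison, here is a minimal
sketch of how the test strings might be generated.  It is not the code
behind the test page above.  The helper names are hypothetical, and the
Tai Tham pair U+1A20, U+1A63 is only an assumed example of <letter,
right matra>; substitute whatever pair is of interest.

```python
import unicodedata

SHY = "\u00AD"   # SOFT HYPHEN
ZWSP = "\u200B"  # ZERO WIDTH SPACE


def make_test_strings(letter, matra):
    """Return the bare pair plus variants with SHY or ZWSP in between.

    Hypothetical helper: it only builds backing-store strings and does
    no rendering or line breaking itself.
    """
    return {
        "plain": letter + matra,
        "shy": letter + SHY + matra,
        "zwsp": letter + ZWSP + matra,
    }


def describe(text):
    """List each code point of text as 'U+XXXX NAME' for inspection."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Assumed example pair: a Tai Tham consonant and a spacing vowel sign.
    for label, s in make_test_strings("\u1A20", "\u1A63").items():
        print(label, describe(s))
```

The resulting strings can then be pasted into a narrow text box in each
rendering system under test, to see where the line breaks and which
glyph, if any, appears at the break.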

Incidentally, does CLDR define the rendering of soft hyphen, or is one
entirely at the mercy of the application?

