IJ with accent

Richard Wordingham richard.wordingham at ntlworld.com
Wed Sep 28 17:39:54 CDT 2016


On Wed, 28 Sep 2016 23:22:34 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> 2016-09-28 22:48 GMT+02:00 Richard Wordingham <
> richard.wordingham at ntlworld.com>:
> 
> > On Wed, 28 Sep 2016 12:30:04 -0700
> > "Doug Ewell" <doug at ewellic.org> wrote:
> >
> > > > Technically I see one, as bíj́na should never break between í
> > > > and j́,
> > >
> > > These wor-
> > > ds should not bre-
> > > ak at the places wh-
> > > ere I have broken t-
> > > hem
> > >
> > > but they don't need embedded control characters to enforce that.
> >
> > Indeed, there aren't any control characters to control hyphenation.
> > Indeed, CGJ between default grapheme clusters is often a very good
> > place to hyphenate.
> >
> 
> Who said anything about CGJ?
> 
> But zero-width joiners should prevent such undesired breaking; the
> legacy ZWNBSP however does not suggest any ligature but instead will
> prevent it, by only gluing two grapheme clusters side by side (with
> just kerning enabled), but without altering these glyphs (like in the
> capital IJ ligature whose I is shortened and placed on top of the
> left arm of the J when using ligaturing joiners).

If you could be bothered to read the Unicode standard annexes and the
character database (UCD), you would note that ZWJ (let alone ZWNJ) has
no effect on line-breaking, except with emoji and ideographs.  In
addition to the UCD, a statement to this effect can be found in TUS
23.2 'Layout Controls'. Indeed, the only character that is described as
having an effect on a hyphenator, and that is only described as a
convention (TR14 Line-Breaking, Section 5.4), is U+00AD SOFT HYPHEN.

So far as Unicode is concerned, there is no other plain text control
over hyphenators.
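
As a rough illustration of the SOFT HYPHEN convention, here is a minimal
Python sketch. It assumes a renderer that treats each U+00AD as a permitted
hyphenation point and nothing more; the function names are hypothetical, and
this is not the UAX #14 line-breaking algorithm, only plain string handling.

```python
SHY = "\u00AD"  # U+00AD SOFT HYPHEN, the only conventional hyphenation hint

# "bíj́na" with one soft hyphen after the accented j (j + U+0301).
# Where the hint goes is the author's choice; this placement is an assumption.
WORD = "b\u00EDj\u0301" + SHY + "na"


def strip_soft_hyphens(text):
    """Return the text as displayed when no break is taken."""
    return text.replace(SHY, "")


def hyphenation_points(text):
    """Return code point offsets, in the stripped text, of each soft hyphen."""
    points = []
    visible = 0
    for ch in text:
        if ch == SHY:
            points.append(visible)
        else:
            visible += 1
    return points


def break_at(text, index):
    """Break at the index-th soft hyphen and show a visible hyphen there.

    All other soft hyphens stay invisible and are dropped from the output.
    """
    parts = text.split(SHY)
    head = "".join(parts[: index + 1])
    tail = "".join(parts[index + 1:])
    return head + "-", tail
```

With the sample word, hyphenation_points(WORD) reports a single opportunity
after the accented j, and break_at(WORD, 0) splits the word there. Putting
ZWJ, ZWNJ or CGJ into the string would not change what these functions
report, which matches the point above: none of them is a hyphenation control.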

> In South-East Asian scripts there are such cases to create complex
> clusters that also carry semantic distinctions and layout
> restrictions.

The only semantic distinction available is the forcing of word
boundaries.

Richard.



