non-breaking snakes

Philippe Verdy verdy_p at
Fri May 6 12:24:12 CDT 2016

My opinion is that the choice of graphics for these fillers is just a matter
of style. A single filler (format control) would be enough to encode
(simplifying later text handling, which could then ignore them for plain-text
searches or collation). These fillers are only made for specific text
layouts with specific fonts at specific sizes; the number of actual
symbols/graphics you would need is unpredictable in all other cases.

The format control would only be used to mark where these fillers can
safely be inserted automatically (just as SHY marks where a hyphen may be
inserted).

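As a rough illustration of why a single filler control would keep plain-text handling simple, here is a minimal sketch of a search key that just drops it. Note the assumption: no such FILLER character is encoded, so a Private Use code point (U+E000) stands in for it here as a purely hypothetical placeholder. SHY, CGJ and Tatweel are real characters, included to show that the same stripping step covers them.

```python
# Hypothetical FILLER: no such character exists in Unicode; the Private
# Use code point U+E000 stands in for it purely for illustration.
FILLER = "\uE000"

# Characters dropped when building a plain-text search key.
IGNORABLE = {
    "\u00AD",  # SOFT HYPHEN (SHY)
    "\u034F",  # COMBINING GRAPHEME JOINER (CGJ)
    "\u0640",  # ARABIC TATWEEL (kashida)
    FILLER,    # hypothetical justification filler
}

def plain_text_key(text: str) -> str:
    """Remove fillers and similar controls so searches see only letters."""
    return "".join(ch for ch in text if ch not in IGNORABLE)

def matches(haystack: str, needle: str) -> bool:
    """Substring search that ignores fillers on both sides."""
    return plain_text_key(needle) in plain_text_key(haystack)

# A word stretched with fillers still matches its unstretched form.
print(matches("snake" + FILLER * 3 + "case", "snakecase"))
```

Because the filler carries no letter content, a single membership test is all the search or collation code needs; with many distinct "snake" graphics, every one of them would have to be listed.
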
The situation, however, would be different if these marks are also used as
bases for holding diacritics (as is the case for the Arabic Tatweel). But
using CGJ (or some other control with combining class 0) is generally
enough to mark their separation from the base letter to which they would
normally attach. The diacritic will be positioned relative to this
zero-width CGJ, above or below.

But CGJ itself is not freely "extensible" in width for line justification.
So the encoding would be <CGJ, diacritics, FILLER> if you want all
diacritics to remain attached to the start side of the filler. If
the diacritics should come at the end side of the filler, they would be
encoded as <FILLER, diacritics>. In summary, that FILLER would be just
another form of CGJ, except that it is extensible like whitespace for
line justification purposes. Also, the FILLER would not necessarily hold
diacritics and could be used alone, even without letters on either side of it.

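The two encoding orders and the "extensible like whitespace" behavior can be sketched as follows. This is a minimal sketch under stated assumptions: FILLER is again a hypothetical stand-in (U+E000, Private Use), the helper names are invented for illustration, and slack is distributed evenly in abstract width units, the way a simple justifier spreads extra space across inter-word gaps.

```python
# Hypothetical FILLER (not encoded in Unicode); U+E000 is a stand-in.
FILLER = "\uE000"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

def filler_with_marks(marks: str, side: str) -> str:
    """Build a filler sequence carrying combining marks.

    side="start": <CGJ, marks, FILLER>, marks stay on the start side.
    side="end":   <FILLER, marks>, marks attach on the end side.
    """
    if side == "start":
        return CGJ + marks + FILLER
    if side == "end":
        return FILLER + marks
    raise ValueError("side must be 'start' or 'end'")

def distribute_slack(line: str, slack: int) -> list[int]:
    """Split `slack` width units among the fillers in `line`, as a
    justifier would among spaces; leftover units go to earlier fillers."""
    n = line.count(FILLER)
    if n == 0:
        return []
    base, extra = divmod(slack, n)
    return [base + (1 if i < extra else 0) for i in range(n)]

print(distribute_slack("ab" + FILLER + "cd" + FILLER + "ef", 7))
```

Here 7 units over two fillers yields widths of 4 and 3. A real renderer would of course work in font units and may weight fillers differently from spaces; this only shows that the filler takes part in justification while CGJ stays zero-width.
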
The Arabic Tatweel behaves mostly like CGJ: diacritics are normally
rendered on the start side of the filler. But there are some cases where
the Arabic diacritics are centered on the filler, so it behaves more like a
normal letter for rendering, even if it's ignorable for plain-text
searches. It may also not be rendered at all if there's no need to justify
lines, or if the diacritics can still fit around the preceding base letter,
or even in their normal position on that base letter.

2016-05-06 17:21 GMT+02:00 Marcel Schneider <charupdate at>:

> On Wed, 4 May 2016 08:27:55 +0100, Richard Wordingham  wrote:
> > On Wed, 4 May 2016 07:54:48 +0100 (BST)
> > Julian Bradfield  wrote:
> >
> > > See
> > >
> > > (making sure to look at the mouse-over text)
> >
> > I thought kashida (TATWEEL) was a precedent not to be followed. The
> > issue, of course, is that chained snakes do not reflow well, just as
> > filler text doesn't.
> On Wed, 4 May 2016 13:15:08 +0200, Philippe Verdy  wrote:
> > Those "snakes" do exist in Arabic for justification purposes (they are
> > formatting controls insertable between pairs of joined letters and
> > possibly used as base holders for diacritics).
> >
> > […]
> On Wed, 4 May 2016 09:59:04 -0300, Leonardo Boiko  wrote:
> > 2016-05-04 4:14 GMT-03:00 Shriramana Sharma :
> > > Isn't there some Japanese orthography feature that already does
> > > something like this?
> >
> > […] In fact, most kinds of Japanese calligraphy prize
> > variation in line length, not uniformity. […]
> On Wed, 04 May 2016 07:29:20 -0700, Doug Ewell  wrote:
> > 1F40D FE0F
> >
> > The VS just makes extra, extra sure that it’s emoji.
> Hmm… I guess the principle of diversity should then
> allow for other long animals too: various caterpillars,
> squirrel running on a branch…
> More seriously, if animal pictographs are downgraded
> to mere line-fillers, Iʼm not sure whether the text style
> variation selector U+FE0E would not be a good choice.
> Why not tackle it the other way around: standardize
> sequences of U+2012..U+2015, U+2E3A with some of
> the other ~250 variation selectors to make them look
> like extensible vegetal or animal ornaments. Or simply
> chain the VSes with repeated U+002D.
> If there were a vote, Iʼd prefer word-break in scripts
> that allow for it, in case justification is really required
> (to make a hieratic look); or, in scripts that cannot break
> words, such as Hebrew, using the letter extension mechanisms.
> As for letter spacing, abusing it for justification purposes
> is common in some languages but, as TUS recalls, is not
> semantically neutral in others that may be very close geographically.
> What helps make a proper layout on one side of the Rhine
> is yelling on the other.
> So yes, then abusing emoji is the lesser evil   :)
> Marcel
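For reference, the presentation sequences mentioned in the quoted messages (U+1F40D SNAKE followed by VS16 U+FE0F for emoji style, or by VS15 U+FE0E for text style) can be built and inspected with Python's standard `unicodedata` module. This is only a small illustrative sketch; the `describe` helper is a hypothetical name introduced here.

```python
import unicodedata

SNAKE = "\U0001F40D"
VS15 = "\uFE0E"  # text presentation selector
VS16 = "\uFE0F"  # emoji presentation selector

emoji_snake = SNAKE + VS16
text_snake = SNAKE + VS15

def describe(seq: str) -> list[str]:
    """Return the 'U+XXXX NAME' form of every code point in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]

print(describe(emoji_snake))
print(describe(text_snake))
```

Whether a given font or platform honors the text-style selector for this character is a separate matter; the sequence only expresses the preference.
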

More information about the Unicode mailing list