proposal for new character 'soft/preferred line break'

Jukka K. Korpela jkorpela at cs.tut.fi
Mon Feb 10 01:53:37 CST 2014


2014-02-10 9:13, Philippe Verdy wrote:

> The <wbr> is enough for this purpose,

No, since the purpose was clearly to specify a line break point that is 
preferred over other possible line break points, or even the only 
allowed line break point within a string.

The <wbr> tag (an old nonstandard tag, now being standardized in HTML5) 
would not have been needed if browsers had supported U+200B. It is 
nowadays debatable which one should be used (U+200B has the disadvantage 
of not being supported by IE 6, a still somewhat significant point). But 
in any case, they are for allowing direct line break points, nothing more.

> A browser could even use them to give higher priority to break lines,

That would be rather arbitrary and won’t happen; there is no good reason 
for that.

> What you want is just to hint the line breaker in the renderer on where
> the linebreaks are the most beneficial. This is really something that
> does not belong to plain text, but to the presentation layer, and HTML
> for example is rich enough about such a presentation layer

In rendering software, the choice between line break opportunities is 
usually either a very simple one (put as many characters on a line as 
possible) or a complicated layout decision that tries to optimize the 
spacing between words at a paragraph level. I don’t think there is much 
room for any layout instructions at any layer, beyond interactive fine 
tuning where a human user instructs the program to break at a specific 
point and sees what happens, or prevents a specific break. 
Theoretically, it is an interesting idea to consider control characters 
or markup for line break opportunities with different priorities, but 
in practice, it would be too complicated compared with the possible gain.
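
As an illustration of the "very simple" strategy mentioned above, here is a 
minimal sketch in Python. It is not taken from any real renderer: the name 
greedy_wrap is made up for this example, width is counted in code points 
rather than measured glyph widths, and only spaces and U+200B are treated 
as break opportunities.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: a break opportunity with no visible width


def greedy_wrap(text, width):
    """First-fit line filling: put as many characters on a line as fit.

    Hypothetical helper for illustration only. Every break opportunity
    is treated alike; there is no notion of a preferred one.
    """
    # Split the text into unbreakable chunks, remembering which separator
    # (" " or ZWSP, or "" at the very end) followed each chunk.
    chunks = []
    current = ""
    for ch in text:
        if ch == " " or ch == ZWSP:
            chunks.append((current, ch))
            current = ""
        else:
            current += ch
    chunks.append((current, ""))

    lines = []
    line = ""
    pending = ""  # separator that preceded the chunk being placed
    for chunk, sep in chunks:
        # A space takes up room if the chunk stays on the line; ZWSP does not.
        candidate = line + (" " if pending == " " else "") + chunk
        if line and len(candidate) > width:
            lines.append(line)  # line is full: break at the opportunity
            line = chunk
        else:
            line = candidate
        pending = sep
    lines.append(line)
    return lines
```

For instance, greedy_wrap("aaa bbb ccc", 7) gives ["aaa bbb", "ccc"], and 
"abc" + ZWSP + "def" at width 4 gives ["abc", "def"]. A break is taken 
wherever the line happens to fill up, which is why a priority among 
opportunities has no natural place in this scheme.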

> In my opinion the encoded SHY character is there only for legacy reasons
> (compatibility with older encodings when renderers had no good option to
> break words). But in HTML SHY is not needed and <wbr> will work better.

They are completely different things. You might be confusing <wbr> with 
&shy; (which is just a named reference for SHY, useful when you want it 
to be visible in source code).
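
To make the difference concrete, here is a rough Python sketch of what 
typically happens when a break is actually taken at each kind of 
opportunity. The function break_at is invented for this example, and the 
behavior is simplified: it assumes SHY is rendered as a plain hyphen at the 
end of the line, which is the common case but not the only possible 
rendering.

```python
SHY = "\u00ad"   # SOFT HYPHEN (&shy; in HTML): a hyphenation opportunity
ZWSP = "\u200b"  # ZERO WIDTH SPACE: a plain break opportunity, like <wbr>


def break_at(text, index):
    """Break text at the opportunity character found at text[index].

    Hypothetical helper for illustration only. Returns the two resulting
    lines as they would typically be displayed.
    """
    mark = text[index]
    if mark not in (SHY, ZWSP):
        raise ValueError("no break opportunity at this index")
    # Opportunities that are not used stay invisible, so drop them.
    head = text[:index].replace(SHY, "").replace(ZWSP, "")
    tail = text[index + 1:].replace(SHY, "").replace(ZWSP, "")
    if mark == SHY:
        head += "-"  # assumed rendering: a visible hyphen at the break
    return head, tail
```

With this sketch, break_at("hyphen" + SHY + "ation", 6) gives 
("hyphen-", "ation"), whereas break_at("path/" + ZWSP + "name", 5) gives 
("path/", "name") with nothing added at the break.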

Yucca





