proposal for new character 'soft/preferred line break'

Jukka K. Korpela jkorpela at cs.tut.fi
Wed Feb 5 12:35:59 CST 2014


2014-02-05 18:22, Markus Scherer wrote:

> On Tue, Feb 4, 2014 at 2:25 PM, Rhavin Grobert <rhavin at shadowtec.de>
> wrote:
>
>     Parallel to soft hyphen, a hyphen that is just inserted if the word
>     was broken, it would be practical to have some way to tell browser:
>     if you need to break the line, try here first. This would be really
>     useful for poems, music lines, addresses,…
>
>
> That would be HTML <wbr> <http://dev.w3.org/html5/markup/wbr.html> or
> U+200B ZERO WIDTH SPACE

As a way to suggest a line break point, both work fine, though each has
a few caveats, which makes it a bit difficult to decide which one is
better; see my treatise
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest

In plain text, of course, U+200B is the way. The main problem with it is
that some software, including some old browsers like IE 6, does not
recognize it but tries to render it as a graphic character, possibly
using a font that has no glyph for it. Adding a new character would not
help here at all, of course.

>     And it would be really easy to implement: there is no visual
>     representation needed and if the right code-point is chosen, it
>     would be downward-compatible to all systems not knowing of the new
>     character.
>
> Unlikely.

Indeed, there is no reason to expect old software to silently ignore
characters that it does not recognize. Whatever the Unicode Standard
might say, old software just does what it has been programmed to do, and 
this may well be “here’s a character for which I have no special rule, 
so I’ll use whatever is available in the font(s) I’m using”, typically 
resulting in a small rectangle that represents a character for which no 
glyph is available.

But I’m not quite sure what the idea of the suggestion is. If the idea
is to provide an optional break point, in a position where none would
normally be present, then U+200B is the way. Not 100% reliable, but
better than anything else (in plain text).
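
As a rough illustration of this first reading, here is a minimal Python
sketch of a greedy wrapper that treats U+200B as an invisible break
opportunity, alongside ordinary spaces. The function name wrap_with_zwsp
and the simplified rules (break only after a space or at U+200B, measure
width in code points) are assumptions made for the example; this is not
the Unicode line breaking algorithm, nor what any particular program does.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def wrap_with_zwsp(text, width):
    """Greedy wrap that may break only after a space or at U+200B.

    Hypothetical helper for illustration; width counts code points.
    """
    # Cut the text into chunks, each ending at a break opportunity.
    chunks = []
    current = ""
    for ch in text:
        if ch == ZWSP:
            # Break allowed here; the character itself is not displayed.
            chunks.append(current)
            current = ""
        elif ch == " ":
            chunks.append(current + " ")
            current = ""
        else:
            current += ch
    chunks.append(current)

    # Fill lines greedily; trailing spaces do not count toward the width.
    lines = []
    line = ""
    for chunk in chunks:
        if line and len((line + chunk).rstrip()) > width:
            lines.append(line.rstrip())
            line = chunk
        else:
            line += chunk
    if line:
        lines.append(line.rstrip())
    return lines


if __name__ == "__main__":
    marked = "http://example.com/" + ZWSP + "very/" + ZWSP + "long/" + ZWSP + "path"
    for out in wrap_with_zwsp(marked, 20):
        print(out)
```

With the U+200B characters removed, the same string has no break
opportunity at all in this sketch and comes back as one overlong line.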

But if the idea is to suggest that among permissible line break points, 
this one is preferable, then it’s a different issue. Theoretically 
interesting, but in practical terms, things don’t work that way. In 
practice, there are permissible line break points (either by implicit 
rules that e.g. normally allow a break after a space, or by explicit 
indication by U+200B). Programs will take it from there, and if they do 
some optimization, like good publishing software does, they typically 
optimize the division of an entire paragraph into lines, applying 
several criteria.
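
To make the point about paragraph-level optimization concrete, here is a
minimal Python sketch that picks among the permissible break points
(here only the spaces between words) by minimizing a cost over the
whole paragraph. The function name wrap_optimal and the cost function
(squared unused width on every line but the last) are assumptions
chosen for illustration; real publishing software applies more, and
different, criteria.

```python
def wrap_optimal(words, width):
    """Choose line breaks minimizing total squared slack.

    Hypothetical helper for illustration. The last line costs nothing,
    and a single word longer than width is accepted on its own line.
    """
    n = len(words)
    # best[i]: minimal cost of laying out words[i:];
    # nxt[i]: index of the first word on the line after the one
    # that starts at word i.
    best = [float("inf")] * (n + 1)
    nxt = [n] * (n + 1)
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1  # no space before the first word of a line
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break  # words[i..j] do not fit; longer runs will not either
            slack = max(width - length, 0)
            cost = 0 if j == n - 1 else slack ** 2
            total = cost + best[j + 1]
            if total < best[i]:
                best[i], nxt[i] = total, j + 1

    # Follow the recorded choices to build the lines.
    lines = []
    i = 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


if __name__ == "__main__":
    for out in wrap_optimal("aaa bb cc ddddd".split(), 6):
        print(out)
```

For the words "aaa bb cc ddddd" and a width of 6, a greedy first-fit
wrapper gives "aaa bb" / "cc" / "ddddd", while this sketch gives
"aaa" / "bb cc" / "ddddd". Every space is equally permissible as a
break point in both cases; the choice among them comes from the cost
over the whole paragraph, not from a per-position mark in the text.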

Yucca





