proposal for new character 'soft/preferred line break'
Jukka K. Korpela
jkorpela at cs.tut.fi
Wed Feb 5 12:35:59 CST 2014
2014-02-05 18:22, Markus Scherer wrote:
> On Tue, Feb 4, 2014 at 2:25 PM, Rhavin Grobert <rhavin at shadowtec.de
> <mailto:rhavin at shadowtec.de>> wrote:
>
> Parallel to soft hyphen, a hyphen that is just inserted if the word
> was broken, it would be practical to have some way to tell browser:
> if you need to break the line, try here first. This would be really
> useful for poems, music lines, addresses,…
>
>
> That would be HTML <wbr> <http://dev.w3.org/html5/markup/wbr.html> or
> U+200B ZERO WIDTH SPACE
As a suggested direct line break point, they both work fine, though each
has a few caveats, which makes it a bit difficult to decide which one is
better; see my treatise
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest
In plain text, of course, U+200B is the way. The main problem with it is
that some software, including some old browsers like IE 6, does not
recognize it but tries to render it as a graphic character, possibly using
a font that has no glyph for it. Adding a new character would not help
here at all, of course.
> And it would be really easy to implement: there is no visual
> representation needed and if the right code-point is chosen, it
> would be downward-compatible to all systems not knowing of the new
> character.
>
> Unlikely.
Indeed, there is no reason to expect old software to silently ignore
characters that it does not recognize. Whatever the Unicode Standard
might say, old software just does what it has been programmed to do, and
this may well be “here’s a character for which I have no special rule,
so I’ll use whatever is available in the font(s) I’m using”, typically
resulting in a small rectangle that represents a character for which no
glyph is available.
But I’m not quite sure what the idea of the suggestion is. If the idea is
to provide an optional break point, in a position where none would
normally be present, then U+200B is the way. Not 100% reliable, but better
than anything else (in plain text).
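
To make the "optional break point" reading concrete, here is a minimal
Python sketch of a greedy line wrapper that allows a break after a space
or at U+200B, and nowhere else. The function names and the greedy strategy
are assumptions made for illustration only; real line breaking follows far
more rules than these two.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def break_opportunities(text):
    """Split text into chunks; a line may end after any chunk."""
    chunks, current = [], ""
    for ch in text:
        if ch == ZWSP:
            # Invisible break point: close the chunk, emit nothing.
            if current:
                chunks.append(current)
            current = ""
        elif ch == " ":
            # A space allows a break after it; keep it with the chunk.
            chunks.append(current + ch)
            current = ""
        else:
            current += ch
    if current:
        chunks.append(current)
    return chunks


def wrap(text, width):
    """Greedy wrap: put as many chunks on each line as will fit."""
    lines, line = [], ""
    for chunk in break_opportunities(text):
        # Trailing spaces do not count against the width.
        if line and len((line + chunk).rstrip()) > width:
            lines.append(line.rstrip())
            line = ""
        line += chunk
    if line:
        lines.append(line.rstrip())
    return lines
```

With U+200B inserted after each slash, a long path such as
"www.example.com/very/long/path" can be broken at those points; without
it, the sketch has no permissible break and lets the line overflow.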
But if the idea is to suggest that among permissible line break points,
this one is preferable, then it’s a different issue. Theoretically
interesting, but in practical terms, things don’t work that way. In
practice, there are permissible line break points (either by implicit
rules that e.g. normally allow a break after a space, or by explicit
indication by U+200B). Programs will take it from there, and if they do
some optimization, like good publishing software does, they typically
optimize the division of an entire paragraph into lines, applying
several criteria.
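
As a rough illustration of such paragraph-level optimization, the Python
sketch below picks breaks among the permissible points (here simply the
spaces between words) so that the total squared trailing space over all
lines but the last is minimal. The cost function and the name
optimal_breaks are assumptions chosen for brevity, not a description of
any particular program. Note that no single break point is "preferred"
on its own; the choice depends on the whole paragraph.

```python
def optimal_breaks(words, width):
    """Choose line breaks minimizing total squared trailing space.

    The last line is not penalized. Assumes every word fits in width.
    """
    n = len(words)
    inf = float("inf")
    # best[i]: minimal cost of laying out words[i:].
    # nxt[i]: index of the first word on the line after the one starting at i.
    best = [inf] * (n + 1)
    nxt = [n] * (n + 1)
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    # Follow the recorded choices to build the lines.
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

For the words "aaa bb cc ddddd" and a width of 6, a greedy wrapper would
fill the first line with "aaa bb", whereas this sketch breaks after "aaa"
because that leaves the lines more even overall.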
Yucca