proposal for new character 'soft/preferred line break'

Philippe Verdy verdy_p at wanadoo.fr
Mon Feb 10 14:30:41 CST 2014


2014-02-10 8:53 GMT+01:00 Jukka K. Korpela <jkorpela at cs.tut.fi>:

> They are completely different things. You might be confusing <wbr> with
> &shy; (which is just a named reference for SHY, useful when you want it to
> be visible in source code).
>

No, I am not confusing them: <wbr> is an HTML formatting element, while SHY
(or &shy; in HTML syntax, as the defined entity) is a character. Both play
equivalent roles in HTML, except that &shy; has a defined behavior, inserting
a hyphen at the end of a broken line, whereas <wbr> would adopt a
language-dependent behavior (not all languages use hyphens at the end of
lines to mark words that have been split by line breaking).
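
To illustrate the difference in what appears at the end of a broken line,
here is a minimal sketch in Python. It is not any browser's line-breaking
algorithm. The names break_word, strip_marks and wbr_mark are hypothetical,
and U+200B ZERO WIDTH SPACE is only assumed as a plain-text stand-in for
<wbr>. The function also assumes the caller already knows that the word does
not fit on the line.

```python
SHY = "\u00ad"   # SOFT HYPHEN
ZWSP = "\u200b"  # ZERO WIDTH SPACE, assumed here as a plain-text stand-in for <wbr>


def strip_marks(text):
    """Remove the invisible break-opportunity characters."""
    return text.replace(SHY, "").replace(ZWSP, "")


def break_word(word, width, wbr_mark=""):
    """Break word at the last opportunity that fits within width columns.

    Returns (first_line, remainder). A break taken at SHY shows "-";
    a break taken at ZWSP shows wbr_mark (nothing by default).
    """
    visible = 0   # count of visible characters seen so far
    best = None   # (index, mark) of the last opportunity that fits
    for i, ch in enumerate(word):
        if ch == SHY or ch == ZWSP:
            mark = "-" if ch == SHY else wbr_mark
            if visible + len(mark) <= width:
                best = (i, mark)
        else:
            visible += 1
    if best is None:
        return strip_marks(word), ""   # no usable opportunity: leave unbroken
    i, mark = best
    return strip_marks(word[:i]) + mark, strip_marks(word[i + 1:])


shy_break = break_word("hyper" + SHY + "text", 7)
wbr_break = break_word("hyper" + ZWSP + "text", 7)
print(shy_break, wbr_break)
```

The wbr_mark parameter is there only so that a language-dependent mark could
be supplied for the <wbr> case; the default of showing nothing is an
assumption of this sketch.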

I am well aware that &shy; and SHY are synonyms in this context, but <wbr>
is a bit different and is not part of the plain text (notably, it will be
filtered out of $(element).innerText, but &shy; will not).
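
The element-versus-character distinction can be shown outside a browser. The
following is a minimal sketch using Python's standard html.parser module,
not any browser's innerText implementation; the names TextOnly and
plain_text are hypothetical. It collects character data only, so the <wbr>
element leaves no trace while the &shy; reference is decoded to U+00AD and
stays in the text.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data only; tags such as <wbr> contribute nothing."""

    def __init__(self):
        # convert_charrefs=True decodes references like &shy; before handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


with_wbr = plain_text("<p>hyper<wbr>text</p>")
with_shy = plain_text("<p>hyper&shy;text</p>")

print(repr(with_wbr))  # the element is gone from the extracted text
print(repr(with_shy))  # the SHY character (U+00AD) is still present
```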

Note that some browsers resolve the "innerText" property of HTML DOM
elements by taking CSS properties into account, so this property does not
reflect only the plain-text content of the document. Chrome, for example,
does this to remove spans of text that are hidden, either by display:none,
or visibility:hidden, or color:transparent; it transforms <br> elements
into newlines, and it detects the boundaries of block elements (e.g. with
"display:block" or "display:table-cell") to generate newline characters,
or sometimes tabs. Chrome also injects text added by the CSS ":before" and
":after" pseudo-elements.
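
A rough imitation of such style-aware extraction, as a sketch only: it is
not Chrome's algorithm. It looks only at an inline style attribute for
display:none, treats just <p> and <div> as block elements, and assumes
well-formed markup. The names RenderedText, rendered_text, BLOCK_TAGS and
VOID_TAGS are hypothetical.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div"}          # assumed block-level elements for this sketch
VOID_TAGS = {"br", "wbr", "img"}   # elements that never have a closing tag


class RenderedText(HTMLParser):
    """Very rough imitation of a style-aware text extraction."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.hidden_depth = 0  # > 0 while inside a display:none subtree

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            if not self.hidden_depth:
                self.out.append("\n")   # <br> becomes a newline
            return
        if tag in VOID_TAGS:
            return                       # <wbr> adds nothing to the text
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1       # track nesting inside hidden content
        elif tag in BLOCK_TAGS:
            self.out.append("\n")        # block boundary

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1
        elif tag in BLOCK_TAGS:
            self.out.append("\n")

    def handle_data(self, data):
        if not self.hidden_depth:
            self.out.append(data)


def rendered_text(markup):
    parser = RenderedText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out).strip("\n")


sample = (
    '<div>one<br>two</div>'
    '<div style="display: none">secret</div>'
    '<div>three<wbr>four</div>'
)
result = rendered_text(sample)
print(repr(result))
```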

The effect of all this is that a browser still uses the HTML DOM to infer
some plain text to return for the innerText property of an element, and
<wbr> may become a SHY format control there (should it?).