Characters that should be displayed?
ritt.ks at gmail.com
Mon Jun 30 10:59:54 CDT 2014
2014-06-29 22:24 GMT+03:00 Asmus Freytag <asmusf at ix.netcom.com>:
> but things get harder the more I think:
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values; those are referred to as surrogate code points.
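To make the distinction in the quoted answer concrete, here is a minimal Python sketch. A supplementary code point (above U+FFFF) is a real character that UTF-16 stores as a *pair* of surrogate code units. A surrogate code point (U+D800..U+DFFF) is one of those halves, and it is never a character on its own. The helper names `to_utf16_units` and `classify` are hypothetical, chosen only for this illustration.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are not scalar values, so they cannot be encoded
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]                    # BMP character: a single code unit
    v = cp - 0x10000                   # 20 bits left to split across two units
    return [0xD800 | (v >> 10),        # high (lead) surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]      # low (trail) surrogate: bottom 10 bits


def classify(cp: int) -> str:
    """Rough classification of a code point in the range 0..0x10FFFF."""
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"
    if cp > 0xFFFF:
        return "supplementary"
    return "BMP (non-surrogate)"


# U+1F600 is supplementary; in UTF-16 it becomes a surrogate pair
print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

So "outside the BMP" and "surrogate" are different sets: the former are characters, the latter are the reserved values used to encode them.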
IIRC, after HTML parsing, validation, and DOM construction, no unpaired
surrogate code point should remain, since the presence of any ill-formed
data in Unicode text makes the whole text ill-formed.
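The "whole text becomes ill-formed" point can be seen with Python's own strings, which (unlike a sanitized DOM) are allowed to hold a lone surrogate. Encoding with the default `"strict"` error handler rejects the entire string, not just the bad character. This is a minimal sketch; `is_well_formed` is a hypothetical helper, not part of any standard library.

```python
def is_well_formed(s: str) -> bool:
    """Return True if s contains no surrogate code points."""
    try:
        # "strict" (the default) raises UnicodeEncodeError on any surrogate,
        # so one bad code point makes the whole string fail to encode
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


print(is_well_formed("abc"))            # True
print(is_well_formed("abc\ud800def"))   # False: one lone surrogate
```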
It's a security recommendation for decoders to replace any
unpaired surrogate code point with U+FFFD instead, thus making the text
well-formed. As a side effect, the unpaired surrogate code point becomes
visible (usually as a square box fallback glyph).
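A minimal sketch of the replacement described above, assuming the input is already a list of UTF-16 code units as integers. Well-formed surrogate pairs are combined into one code point; every unpaired high or low surrogate becomes U+FFFD, so the output is always well-formed. The function name `decode_utf16_units` is hypothetical.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units: list[int]) -> str:
    """Decode UTF-16 code units, replacing each unpaired surrogate with U+FFFD."""
    out: list[str] = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: needs a low one next
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                lo = units[i + 1]
                out.append(chr(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)))
                i += 2
                continue
            out.append(REPLACEMENT)  # high surrogate with no low partner
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # low surrogate with no preceding high
        else:
            out.append(chr(u))       # ordinary BMP code unit
        i += 1
    return "".join(out)


# 'a', a valid pair for U+1F600, a lone high, 'b', a lone low
print(repr(decode_utf16_units([0x61, 0xD83D, 0xDE00, 0xD800, 0x62, 0xDC00])))
```

Each lone surrogate yields exactly one U+FFFD here, which is what makes the former error visible when the text is rendered.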
What is the consideration regarding U+FFFD in CSS?