Characters that should be displayed?

Konstantin Ritt ritt.ks at gmail.com
Mon Jun 30 10:59:54 CDT 2014


2014-06-29 22:24 GMT+03:00 Asmus Freytag <asmusf at ix.netcom.com>:

> but things get harder the more I think:
>>
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
>>
>
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values, those are referred to as Surrogate code points.
>
>
IIRC, after HTML parsing, validation, and DOM construction, no lone
surrogate code point should remain, since the presence of any ill-formed
data in a Unicode text makes the whole text ill-formed.
It's a security recommendation to decoders to replace any
unpaired surrogate code point with U+FFFD instead, thus making the text
well-formed. As a side effect, the unpaired surrogate code point becomes
visible (usually as a square box fallback glyph).
What is the consideration regarding U+FFFD in CSS?
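
To illustrate the U+FFFD replacement described above, here is a minimal
Python sketch of a decoder that walks a sequence of 16-bit UTF-16 code
units, combines well-formed surrogate pairs into supplementary code points,
and substitutes U+FFFD for any unpaired surrogate. The function name
decode_utf16_units is hypothetical and only serves the example; real HTML
and CSS decoders may differ in detail.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units):
    """Decode 16-bit code units, replacing unpaired surrogates with U+FFFD.

    Hypothetical helper for illustration; `units` is a sequence of ints
    in the range 0x0000..0xFFFF.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                cp = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
                out.append(chr(cp))  # supplementary code point
                i += 2
                continue
            out.append(REPLACEMENT)  # unpaired high surrogate
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # unpaired low surrogate
        else:
            out.append(chr(u))  # ordinary BMP code point
        i += 1
    return "".join(out)
```

With this sketch, the units 0x0041, 0xD800, 0x0042 decode to "A", U+FFFD,
"B", while the pair 0xD83D, 0xDE00 decodes to the single supplementary code
point U+1F600.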


Konstantin

