Characters that should be displayed?

Asmus Freytag asmusf at ix.netcom.com
Sun Jun 29 14:24:05 CDT 2014


On 6/29/2014 11:44 AM, Koji Ishii wrote:
>> Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph
> By looking at this, my questions are as follows:
>
> 1. Should control characters that browsers do not interpret be displayed in fallback rendering?
> 2. Should private-use characters (U+E000-F8FF, 0F0000-0FFFFD, 100000-10FFFD) without glyphs be displayed in fallback rendering?
>
> The answer to both questions is probably yes, from what I understand of the text quoted above,

Displaying a fallback rendering alerts the user that something is 
present that would normally not be visible.

However, these are not the only invisible characters, and many should 
not (must not) be rendered, ever (except in diagnostic modes). So, it is 
a bit unclear to me what precisely this recommendation buys you, as it 
is incomplete.

The recommendation is prefixed with "To avoid security problems,...". If 
this is taken to mean that it applies in contexts that require 
strict attention to security issues, then it probably defines a minimum 
of what should be done, and other measures need to be taken in addition.
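
For illustration only, the quoted recommendation can be sketched in Python 
as below. The name `fallback_candidate`, the `has_glyph` flag, and the set 
of "interpreted" controls are assumptions made up for this example; they 
are not defined by the Unicode Standard. The General_Category values Cc, Co, 
and Cs are the standard's categories for control, private-use, and 
surrogate code points.

```python
import unicodedata

# Hypothetical: controls that this imaginary renderer interprets itself.
# A real renderer's list will differ.
INTERPRETED_CONTROLS = {"\t", "\n", "\r"}

# General_Category values named in the quoted recommendation:
# Cc = control, Co = private use, Cs = surrogate.
FALLBACK_CATEGORIES = {"Cc", "Co", "Cs"}


def fallback_candidate(ch: str, has_glyph: bool = False) -> bool:
    """Sketch: should ch be shown with a visible fallback glyph?

    has_glyph is an assumed input saying whether the current font
    (or a private agreement) supplies a glyph for ch.
    """
    if ch in INTERPRETED_CONTROLS or has_glyph:
        return False
    return unicodedata.category(ch) in FALLBACK_CATEGORIES
```

Note that U+200B ZERO WIDTH SPACE (category Cf) is not flagged by this 
check, which is the kind of incompleteness mentioned above: the 
recommendation covers only some of the invisible characters.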

> but things get harder the more I think:
>
> 3. When the above text says “surrogate code points”, does that mean everything outside the BMP? It reads so to me, but I’m surprised that characters in the BMP and outside the BMP have such differences, so I’m doubting my English skill.

No, those would be supplementary code points. Surrogates are values 
(U+D800..U+DFFF) that are intended to be used in pairs as code units in 
UTF-16. Ill-formed data may contain unpaired values; those are referred 
to as surrogate code points.
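
The distinction can be made concrete with a small sketch. The helper names 
below are hypothetical and chosen only for this example; the numeric ranges 
are the ones fixed by the standard.

```python
def is_surrogate(cp: int) -> bool:
    """True for surrogate code points, U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_supplementary(cp: int) -> bool:
    """True for code points outside the BMP, U+10000..U+10FFFF."""
    return 0x10000 <= cp <= 0x10FFFF


def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as a list of UTF-16 code units."""
    if is_surrogate(cp):
        # A lone surrogate cannot appear in well-formed UTF-16.
        raise ValueError(f"U+{cp:04X} is a surrogate code point")
    if not is_supplementary(cp):
        return [cp]
    v = cp - 0x10000
    # High (lead) surrogate carries the top 10 bits, low (trail) the rest.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
```

A supplementary code point such as U+1F600 is therefore represented by a 
pair of surrogate code units, while a surrogate value on its own is not a 
character at all.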

> 4. Should every code point that is not given the Default_Ignorable_Code_Point property, and that has neither an interpretation nor a glyph, be displayed in fallback rendering? I could not find such a statement in the Unicode spec, but there are some people who believe so.
> 5. Is there anything else Unicode recommends to display in fallback rendering, or not to display? This must be RTFM, but pointing out where to read would be appreciated.


