Characters that should be displayed?

Shawn Steele Shawn.Steele at microsoft.com
Sun Jun 29 13:59:01 CDT 2014


If the concern is security, I cannot imagine why CSS would even want something like BELL to be legal at all.  

I'm not sure that replacement glyphs would help much.  I mean, would someone think that �Shawn was something spoofing Shawn, or just assume their browser/computer had a rendering glitch?  I think most people would just ignore the unexpected character and assume something was quirky with the web page.

-Shawn

-----Original Message-----
From: Unicode [mailto:unicode-bounces at unicode.org] On Behalf Of Koji Ishii
Sent: Sunday, June 29, 2014 11:44 AM
To: Unicode Mailing List
Subject: Characters that should be displayed?

Hello Unicoders,

I’m a co-editor of CSS Text Level 3[1], and I would appreciate your support in defining rendering behavior in CSS.

The spec currently has the following text[2]:

> Control characters (Unicode class Cc) other than tab (U+0009), line feed (U+000A), and carriage return (U+000D) are ignored for the purpose of rendering. (As required by [UNICODE], unsupported Default_ignorable characters must also be ignored for rendering.)
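
As an illustration of the quoted rule, here is a minimal sketch in Python. The function name `strip_ignored_controls` is hypothetical, and the sketch covers only the Cc part of the rule (not the Default_Ignorable handling); it is not how any browser implements it.

```python
import unicodedata

# The three controls that the quoted spec text keeps meaningful.
KEPT_CONTROLS = {"\t", "\n", "\r"}

def strip_ignored_controls(text: str) -> str:
    """Drop General_Category=Cc characters other than tab, LF, and CR.

    Hypothetical helper: models "ignored for the purpose of rendering"
    as simple removal from the string.
    """
    return "".join(
        ch for ch in text
        if ch in KEPT_CONTROLS or unicodedata.category(ch) != "Cc"
    )

# BELL (U+0007) disappears; the tab and line feed survive.
print(repr(strip_ignored_controls("Sh\x07awn\tSteele\n")))  # 'Shawn\tSteele\n'
```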

and there is feedback saying that CSS should display visible glyphs for these control characters. Since no major browser displays them today, this would be a breaking change, and the CSS WG needs to discuss the feedback. Before doing so, the WG would like to understand what Unicode recommends.

I found the following text in Unicode 6.3, p. 185, “5.21 Ignoring Characters in Processing”[3]:

> Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph

By looking at this, my questions are as follows:

1. Should control characters that browsers do not interpret be displayed in fallback rendering?
2. Should private-use characters (U+E000-U+F8FF, U+F0000-U+FFFFD, U+100000-U+10FFFD) without glyphs be displayed in fallback rendering?
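
For reference, the three private-use ranges in question 2 can be checked mechanically. The sketch below is only an illustration: `PRIVATE_USE_RANGES` and `is_private_use` are hypothetical names, and the cross-check relies on Python's `unicodedata` module reporting General_Category `Co` for private-use code points.

```python
import unicodedata

# Private-use ranges as listed in question 2 (inclusive bounds).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Plane 15
    (0x100000, 0x10FFFD),  # Plane 16
]

def is_private_use(cp: int) -> bool:
    """Return True if the code point falls in one of the listed ranges."""
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)

# Cross-check the range boundaries against the Unicode Character Database:
# code points just inside a range are Co, those just outside are not.
for lo, hi in PRIVATE_USE_RANGES:
    for cp in (lo - 1, lo, hi, hi + 1):
        assert is_private_use(cp) == (unicodedata.category(chr(cp)) == "Co")
```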

The answer to these two questions is probably yes, from what I understand of the text quoted above, but things get harder the more I think about it:

3. When the above text says “surrogate code points”, does that mean everything outside the BMP? It reads that way to me, but I’m surprised that characters inside and outside the BMP would be treated so differently, so I’m doubting my English skills.
4. Should every code point that is not given the Default_Ignorable_Code_Point property, and that has neither an interpretation nor a glyph, be displayed in fallback rendering? I could not find such a statement in the Unicode spec, but there are some people who believe so.
5. Is there anything else that Unicode recommends displaying, or not displaying, in fallback rendering? This is probably an RTFM question, but a pointer to where to read would be appreciated.
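
The three classes named in the quoted Unicode text correspond to the General_Category values Cs (surrogate), Co (private use), and Cc (control), so a renderer could test for them directly. The following is a rough sketch under that assumption: `describe` and `fallback_text` are hypothetical names, and the bracketed `[U+XXXX]` placeholder is just one possible stand-in for a fallback glyph, not something the Unicode text prescribes.

```python
import unicodedata

# General_Category values for the three classes named in the quoted text.
FALLBACK_CLASSES = {"Cs": "surrogate", "Co": "private-use", "Cc": "control"}

def describe(cp: int) -> str:
    """Classify a code point as surrogate, private-use, control, or other."""
    return FALLBACK_CLASSES.get(unicodedata.category(chr(cp)), "other")

def fallback_text(cp: int) -> str:
    """Hypothetical textual stand-in for a fallback glyph."""
    return f"[U+{cp:04X}]"

for cp in (0x0007, 0xD800, 0xE000, 0x1F600):
    print(fallback_text(cp), describe(cp))
```

In the Unicode Character Database, Cs covers only U+D800 through U+DFFF, so a supplementary-plane character such as U+1F600 is classified as "other" here rather than as a surrogate code point.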

Thank you for your support in advance.

[1] http://dev.w3.org/csswg/css-text/
[2] http://dev.w3.org/csswg/css-text/#white-space-processing
[3] http://www.unicode.org/versions/Unicode6.3.0/ch05.pdf

/koji

