Tag characters and in-line graphics (from Tag characters)

Fri Jun 5 05:46:10 CDT 2015

On 6/4/2015 17:03 , "Chris" wrote:
> This whole discussion is about the fact that it would be technically 
> possible to have private character sets and private agreements that 
> your OS downloads without the user being aware of it. 

The sticky issues are not the questions of how to make available fonts 
or images for use by the OS.

Instead, they concern the fact that any such model violates some 
pretty basic guarantees of plain text that the entire net infrastructure 
relies on.

There are very obvious security issues. They start with tracking; every 
time you access a custom code point, that fact potentially results in a 
trackable interaction. This problem affects even the "sticker" solution 
that people are hoping for in the case of emoji. (On my system, no 
external resources are displayed when I first open any message, and 
there is a reason for that.)

Beyond tracking, and beyond stickers (that is, pictures that look like 
pictures), a generalized custom character set would allow "text" that is 
no longer really stable. You would be able to deliver identical e-mails 
that display differently for different recipients, because when you 
serve the custom fonts, you would be able to customize what you deliver 
under the same custom character set designator.

While this would be a wonderful way to circumvent censorship (other than 
the "man in the middle" version), you would likewise seriously undermine 
the ability to filter unwanted or undesirable texts, because the custom 
character set engine might recognize when a request comes from a filter 
and not the end user. (Just the other day, I came across a hacked 
website that responded differently to search engines than to live users, 
making the hack effective for one and invisible to the other. Custom 
character sets would seem to just add to the hackers' arsenal here.)

Finally, custom character sets sound like a great idea when thinking of 
an extension of an existing character set. But that's not where the 
issues are. The issues come in when you use the same technology to 
provide aliases for existing code points or for other custom characters.

Aliasing undermines the ability to do search (or any other 
content-focused processing, from sorting to spell-check).
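
To make the search problem concrete, here is a minimal sketch in Python. 
The alias table is purely hypothetical (it assumes a private agreement 
that maps two Private Use Area code points onto "a" and "e"); nothing 
like it is defined by Unicode. A plain substring search misses the 
aliased word unless the software also happens to have the table.

```python
# Hypothetical private agreement: two Private Use Area code points
# are declared to be aliases for ordinary Latin letters.
ALIASES = {"\uE000": "a", "\uE001": "e"}


def resolve_aliases(text, aliases):
    """Replace every aliased code point by the character it stands for."""
    return "".join(aliases.get(ch, ch) for ch in text)


# The same word, once in plain code points and once using the aliases.
plain = "search"
aliased = "s\uE001\uE000rch"
message = "please " + aliased + " this"

# Without the alias table, the word is invisible to a plain search.
found_without_table = plain in message

# With the table (which a filter or indexer may never receive), it is found.
found_with_table = plain in resolve_aliases(message, ALIASES)

print(found_without_table, found_with_table)
```

The same applies to sorting or spell-checking: each would have to 
resolve the aliases first, and could only do so if the table were 
available and trustworthy.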

At that point, the circle closes.

When Unicode was created, the alternative then was ISO 2022, which was a 
standard that addressed the issue of how to switch among (albeit 
pre-defined) character sets to achieve, in principle, coverage equal to 
the union of these character sets.
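
As an illustration of that switching model, the following sketch uses 
Python's standard "iso2022_jp" codec (ISO-2022-JP is one profile built 
on ISO 2022; it is used here only as a convenient example). It shows 
that the bytes of the Japanese text are meaningful only in light of the 
escape sequence that precedes them.

```python
# ASCII followed by three kanji, encoded with the stdlib ISO-2022-JP codec.
kanji = "\u65e5\u672c\u8a9e"
text = "abc" + kanji
encoded = text.encode("iso2022_jp")

# Escape sequences used by ISO-2022-JP to switch character sets.
TO_JIS_X_0208 = b"\x1b$B"   # switch to the two-byte kanji set
TO_ASCII = b"\x1b(B"        # switch back to ASCII

# Strip the ASCII prefix and both escape sequences to get the raw payload.
prefix = b"abc" + TO_JIS_X_0208
payload = encoded[len(prefix):-len(TO_ASCII)]

# The payload bytes are all in the ASCII range, so without the escape
# sequence they decode to something else entirely.
misread = payload.decode("ascii")

print(encoded)
print(repr(misread), "versus", repr(kanji))
```

Software that searches or sorts such data has to track the current 
character set at every position before it can interpret a single byte.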

Unicode was created to address two main deficiencies of that situation. 
Unification addressed the aliasing issue, so that code points were no 
longer "opaque" but could be interpreted by software (other than 
display), which was the second big drawback of the patchwork of 
character sets. A processing model for opaque code points is possible to 
define, but it isn't very practical, and in the late eighties people had 
had enough and were glad to be quit of it.

Seen from this perspective, the discussion about custom character sets 
presents itself as a giant step backward, undermining the very advances 
that underlie the rapid acceptance and spread of Unicode.

A./