Custom characters (was: Re: Private Use Area in Use)

Chris idou747 at
Thu Jun 4 02:43:48 CDT 2015

> Well, that's the rub, isn't it?
> We (in IT) are still working pretty dang hard on the simpler problem, to wit:
> There should be a way to put standard characters anywhere that characters belong
> and have things "just work".
> And even *that* is a hard problem that has taken over 25 years -- and is still a work in
> progress.

Unicode is two things: (1) a binary format, the technology bit; and (2) the social part, agreeing on what the characters should be.

(1) is, relatively speaking, super easy: roughly, a row of unique numbers (code points, originally 16-bit and now running up to U+10FFFF). (2) is hard because reaching agreement is hard.
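To make (1) concrete, here is a small Python illustration (Python is just a convenient choice, not something the thread uses): a Unicode string really is a sequence of numbers. It also shows that some characters, like this emoji, no longer fit in 16 bits, which is why UTF-16 needs two code units for them.

```python
text = "A€😀"

# Each character is just a number (its code point).
code_points = [ord(ch) for ch in text]
print(code_points)                      # [65, 8364, 128512]
print([hex(cp) for cp in code_points])  # ['0x41', '0x20ac', '0x1f600']

# U+1F600 is above 0xFFFF, so UTF-16 encodes it as a surrogate pair:
# 3 characters become 4 sixteen-bit code units, i.e. 8 bytes.
print(len(text.encode("utf-16-le")))    # 8
```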

What I’m saying is that we could totally bypass (2) for many use cases if people had the power to make their own characters. Yes, it is hard to meet in committee and agree on stuff, so don’t force people to do that. You avoid it by putting more work into (1) and less hand-wringing into (2).

> See, the first barrier to getting anywhere with this goal is to get everybody concerned
> with text in IT (or perhaps even worse, all the hundreds of millions of people who
> *use* characters in their devices) to agree what a "custom character" is.

There is no need for such a thing. Everybody knows roughly what a custom character is. What is needed is the technology to make it work so that everyone can seamlessly enjoy it.

> And if
> the rollicking "discussions" underway about emoji have taught us much of anything,
> it includes the fact that people do *not* all agree about what characters are or
> what should be a candidate for "just working" -- or even what "just work" might
> mean for them, in any case.

That’s because you’re immersed in (2), which is a different kind of problem. You don’t have to agree on details if everybody has the power to create new characters.

> So before declaring that your position is self-evidently correct about how things
> should just work, it might be a good idea to put some real thought into how
> one would define and standardize the concept of a "custom character" sufficiently
> precisely that there would be a snowball's chance in hell that all the implementations
> of text out there would a) know what it was, b) know how it should display and
> render, c) know how it should be input, stored, and transmitted and d) know how it 
> should be interpreted universally.

I already gave several possible implementation suggestions. I’ll repeat one of them, merely to illustrate that it is possible.

Characters are 64 bits wide. 32 of those bits are stripped off as the “character set provider ID”. That ID is sent to one of many canonical servers, akin to DNS servers, to find the URL of the owner of those characters. At that location you’d find several representations of each character: TrueType, vector graphics, bitmaps or whatever. The rendering engine would download a representation and display it to the user, all without the user having to know anything about character sets, custom fonts or whatever.
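A minimal Python sketch of the bit layout, assuming (the proposal doesn't pin this down) that the provider ID is the *top* 32 bits and the low 32 bits pick a character within that provider's set. `make_char` and `split_char` are hypothetical names for illustration only.

```python
# Hypothetical layout: [ provider ID : 32 bits | local index : 32 bits ]
PROVIDER_SHIFT = 32
LOCAL_MASK = (1 << 32) - 1


def make_char(provider_id: int, local_index: int) -> int:
    """Pack a provider ID and a per-provider index into one 64-bit value."""
    if not (0 <= provider_id <= LOCAL_MASK and 0 <= local_index <= LOCAL_MASK):
        raise ValueError("both parts must fit in 32 bits")
    return (provider_id << PROVIDER_SHIFT) | local_index


def split_char(char: int) -> tuple[int, int]:
    """Strip off the 32-bit provider ID; return (provider_id, local_index)."""
    if not 0 <= char < (1 << 64):
        raise ValueError("character must fit in 64 bits")
    return char >> PROVIDER_SHIFT, char & LOCAL_MASK


c = make_char(0x1234, 0x17)
print(hex(c))         # 0x123400000017
print(split_char(c))  # (4660, 23), i.e. provider 0x1234, index 0x17
```

With a fixed split like this, finding the provider is a shift and a mask. No table lookup is needed until the provider's URL has to be resolved.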

So you come across character 12340000000017. The OS asks a charset server who owns charset 1234. It replies “”. The OS then asks that location for a representation.

All this happens invisibly to the user. And of course, if the representation is already cached on their machine, none of it needs to happen at all.
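The lookup-then-fetch flow with a local cache could be sketched as below. Everything here is a hypothetical stand-in: a plain dict plays the DNS-like registry, `fake_fetch` plays the download from the owner's URL, and `charsets.example.com` is a placeholder domain. As in the bit-layout assumption above, the provider ID is taken from the top 32 bits.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the proposal's moving parts.
Registry = dict[int, str]              # provider ID -> owner URL
Fetcher = Callable[[str, int], bytes]  # (owner URL, local index) -> glyph data


@dataclass
class GlyphResolver:
    registry: Registry
    fetch: Fetcher
    cache: dict[int, bytes] = field(default_factory=dict)
    lookups: int = 0  # number of uncached resolutions (registry query + fetch)

    def glyph_for(self, char: int) -> bytes:
        if char in self.cache:  # already cached: nothing goes over the network
            return self.cache[char]
        provider_id, local_index = char >> 32, char & 0xFFFFFFFF
        owner_url = self.registry.get(provider_id)  # "who owns charset X?"
        if owner_url is None:
            raise KeyError(f"unknown character set provider {provider_id:#x}")
        self.lookups += 1
        data = self.fetch(owner_url, local_index)   # "send me a representation"
        self.cache[char] = data
        return data


def fake_fetch(url: str, index: int) -> bytes:
    # Pretend glyph data; a real system would download TrueType/SVG/bitmap data.
    return f"{url}#glyph-{index}".encode()


resolver = GlyphResolver({0x1234: "https://charsets.example.com/1234"}, fake_fetch)
c = (0x1234 << 32) | 0x17
resolver.glyph_for(c)
resolver.glyph_for(c)     # second call is served from the cache
print(resolver.lookups)   # 1
```

Keeping the cache keyed on the full 64-bit value means a character seen once never triggers another lookup. A real design would also need expiry and trust rules, which this sketch ignores.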

More information about the Unicode mailing list