Is the binaryness/textness of a data format a property?

Eli Zaretskii via Unicode unicode at
Sat Mar 21 15:26:24 CDT 2020

> From: "Doug Ewell" <doug at>
> Cc: <unicode at>
> Date: Sat, 21 Mar 2020 13:33:18 -0600
> > Emacs uses some of that for supporting charsets that cannot be mapped
> > into Unicode.  GB18030 is one example of such charsets.  The internal
> > representation of characters in Emacs is UTF-8, so it uses 5-byte
> > UTF-8 like sequences to represent such characters.
> When 137,468 private-use characters aren't enough?

Why is that relevant to the issue at hand?

> I thought the whole premise of GB18030 was that it was Unicode mapped into a GB2312 framework. What characters exist in GB18030 that don't exist in Unicode, and have they been proposed for Unicode yet

I don't remember offhand, but the last time I looked at GB18030, there
were a lot of characters in it that were not in Unicode.

> and why was none of the PUA space considered appropriate for that in the meantime?

Because many fonts already use them?  I don't really know why it was
decided to use codepoints above 0x1FFFFF; this is just how Emacs has
worked for quite some time.  You asked for examples of usage, and
I provided one.
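
For readers unfamiliar with the 5-byte form mentioned above, here is a
rough sketch of the generic 5-byte pattern from the original UTF-8
design (111110xx followed by four 10xxxxxx continuation bytes), which
is the shape needed for codepoints above 0x1FFFFF.  This is not taken
from the Emacs sources: the function names are made up for
illustration, and the exact range Emacs accepts may well be narrower
than what this sketch allows.

```python
# Illustrative only: generic 5-byte UTF-8-like encoding, NOT Emacs code.
# encode_5byte / decode_5byte are hypothetical names for this sketch.

MIN_5BYTE = 0x200000    # first value that no longer fits in 4 bytes
MAX_5BYTE = 0x3FFFFFF   # 26 payload bits: 2 in the lead byte + 4 * 6


def encode_5byte(cp: int) -> bytes:
    """Encode cp as 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx."""
    if not MIN_5BYTE <= cp <= MAX_5BYTE:
        raise ValueError("codepoint does not need/fit the 5-byte form")
    return bytes([
        0xF8 | (cp >> 24),            # lead byte carries the top 2 bits
        0x80 | ((cp >> 18) & 0x3F),   # then four 6-bit continuation bytes
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def decode_5byte(data: bytes) -> int:
    """Inverse of encode_5byte; rejects malformed or overlong input."""
    if len(data) != 5 or (data[0] & 0xFC) != 0xF8:
        raise ValueError("not a 5-byte sequence")
    cp = data[0] & 0x03
    for b in data[1:]:
        if (b & 0xC0) != 0x80:
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    if cp < MIN_5BYTE:
        raise ValueError("overlong encoding")
    return cp
```

With this sketch, encode_5byte(0x200000) gives the bytes F8 88 80 80 80,
and decode_5byte() of those bytes gives 0x200000 back.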
