Is the binaryness/textness of a data format a property?

Richard Wordingham via Unicode unicode at
Sat Mar 21 19:31:31 CDT 2020

On Sat, 21 Mar 2020 13:33:18 -0600
Doug Ewell via Unicode <unicode at> wrote:

> Eli Zaretskii wrote:

> > Emacs uses some of that for supporting charsets that cannot be
> > mapped into Unicode.  GB18030 is one example of such charsets.  The
> > internal representation of characters in Emacs is UTF-8, so it uses
> > 5-byte UTF-8 like sequences to represent such characters.  

> When 137,468 private-use characters aren't enough?

But they aren't private use!  I haven't made any agreement with anyone
about using them.
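
For concreteness, here is a minimal Python sketch of the two numbers in
play: the 137,468 private-use count and what a 5-byte UTF-8-like
sequence looks like.  It assumes the 5-byte form follows the original
UTF-8 bit layout (lead byte 111110xx followed by four continuation
bytes); that is an assumption for illustration, not something checked
against the Emacs sources.  The names PUA_TOTAL and encode_utf8_like
are hypothetical.

```python
# BMP PUA U+E000..U+F8FF plus the plane 15 and plane 16 PUAs
# (U+F0000..U+FFFFD and U+100000..U+10FFFD).
PUA_TOTAL = (0xF8FF - 0xE000 + 1) + 2 * (0xFFFFD - 0xF0000 + 1)

# (exclusive upper bound, lead-byte marker, continuation-byte count)
_FORMS = (
    (0x80, 0x00, 0),
    (0x800, 0xC0, 1),
    (0x10000, 0xE0, 2),
    (0x200000, 0xF0, 3),
    (0x4000000, 0xF8, 4),  # 5-byte form, beyond the Unicode range (assumed layout)
)


def encode_utf8_like(value: int) -> bytes:
    """Encode an integer with the UTF-8 bit layout, extended to 5 bytes.

    Unlike a conforming UTF-8 encoder, this does not reject surrogates
    or values above U+10FFFF.
    """
    if value < 0:
        raise ValueError("negative value")
    for limit, lead, ncont in _FORMS:
        if value < limit:
            # Lead byte carries the top bits; each continuation byte carries 6.
            out = [lead | (value >> (6 * ncont))]
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((value >> (6 * i)) & 0x3F))
            return bytes(out)
    raise ValueError("value needs more than 5 bytes")
```

Under that assumption, every value up to 0x1FFFFF still fits in four
bytes, and the first value that needs the 5-byte form is 0x200000.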

Additionally, just as some people seem to think that stray UTF-16 code
units should be supported (and occasionally declare UTF-8
implementations of Unicode standard algorithms to be automatically
non-compliant), there is a case for supporting stray UTF-8 code units.
Emacs supports the full range of 8-bit byte values: the 128 unified
with ASCII and the other 128 with the high bit set.
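
As an illustration of what supporting stray UTF-8 code units can mean
in practice, here is a minimal Python sketch using Python's own
"surrogateescape" error handler.  This is a different mechanism from
whatever Emacs does internally; it is shown only as one existing way
to carry undecodable bytes through a text type and get them back
unchanged.  The function names are hypothetical.

```python
def bytes_to_text(raw: bytes) -> str:
    """Decode UTF-8, keeping each undecodable byte as a lone surrogate.

    With errors="surrogateescape", a stray byte 0xNN (0x80..0xFF)
    becomes the code point U+DCNN instead of raising an error.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    """Inverse of bytes_to_text: U+DCNN is written back as the byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


# 0xFF and a lone continuation byte 0x80 are not valid UTF-8 here.
raw = b"abc\xff\x80def"
text = bytes_to_text(raw)
roundtrip = text_to_bytes(text)
```

The round trip is lossless for arbitrary byte strings, at the cost of
strings that contain lone surrogates and so are not valid Unicode text.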

> What characters exist in GB18030 that don't
> exist in Unicode, and have they been proposed for Unicode yet, and
> why was none of the PUA space considered appropriate for that in the
> meantime?

Doesn't GB18030 appropriate some of the PUA for Tibetan (and quite
possibly other complex scripts)?  I haven't looked up how Emacs
handles this. 

