Is the binaryness/textness of a data format a property?
Richard Wordingham via Unicode
unicode at unicode.org
Sat Mar 21 19:31:31 CDT 2020
On Sat, 21 Mar 2020 13:33:18 -0600
Doug Ewell via Unicode <unicode at unicode.org> wrote:
> Eli Zaretskii wrote:
> > Emacs uses some of that for supporting charsets that cannot be
> > mapped into Unicode. GB18030 is one example of such charsets. The
> > internal representation of characters in Emacs is UTF-8, so it uses
> > 5-byte UTF-8 like sequences to represent such characters.
> When 137,468 private-use characters aren't enough?
But they aren't private use! I haven't made any agreement with anyone
about using them.
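
To put numbers on the two options in the exchange above, here is a
minimal sketch. It counts the code points Unicode designates as private
use, and it encodes an integer with the original UTF-8 bit pattern
extended to 5 bytes. The names pua_count and encode_extended are
hypothetical, and the encoder only illustrates the general 5-byte
pattern; it is not taken from Emacs, whose exact layout I have not
checked.

```python
# Private-use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
]


def pua_count():
    """Total number of private-use code points."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES)


def encode_extended(value):
    """Encode a non-negative integer with the UTF-8 bit pattern,
    allowing 5-byte sequences (lead byte 111110xx) for values from
    0x200000 up to 0x3FFFFFF.

    Hypothetical helper for illustration: it does not reject
    surrogates and is not Emacs's actual implementation.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 0x80:
        return bytes([value])
    # (exclusive upper limit, lead-byte marker, continuation bytes)
    forms = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
    )
    for limit, lead, ncont in forms:
        if value < limit:
            out = [lead | (value >> (6 * ncont))]
            for i in range(ncont - 1, -1, -1):
                out.append(0x80 | ((value >> (6 * i)) & 0x3F))
            return bytes(out)
    raise ValueError("value too large for a 5-byte sequence")
```

For values up to U+10FFFF the output coincides with standard UTF-8;
anything from 0x200000 upwards needs the fifth byte, which standard
UTF-8 decoders will reject.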
Additionally, just as some people seem to think that stray UTF-16 code
units should be supported (and occasionally declare UTF-8
implementations of Unicode standard algorithms to be automatically
non-compliant), there is a case for supporting stray UTF-8 code units.
Emacs supports the full range of 8-bit byte values - 128 unified with
ASCII and the other 128 with the high bit set.
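
As an illustration of what supporting stray UTF-8 code units can look
like in practice, here is a minimal sketch using Python's
"surrogateescape" error handler. This is Python's mechanism, not
Emacs's: each undecodable byte 0xNN is mapped to the lone surrogate
U+DCNN on decoding and mapped back on encoding, so arbitrary byte
strings survive a round trip. The function names are hypothetical.

```python
def decode_lossless(data: bytes) -> str:
    """Decode as UTF-8, keeping each invalid byte 0xNN as U+DCNN."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: turn U+DCNN back into byte 0xNN."""
    return text.encode("utf-8", errors="surrogateescape")


# 0xE9 (Latin-1 e-acute) and 0xFF are not valid UTF-8 here.
raw = b"caf\xe9 \xff ok"
text = decode_lossless(raw)
restored = encode_lossless(text)
```

The stray bytes end up as code points outside the set of Unicode scalar
values, which is the same general idea as giving them their own slots in
an editor's internal character space.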
> What characters exist in GB18030 that don't
> exist in Unicode, and have they been proposed for Unicode yet, and
> why was none of the PUA space considered appropriate for that in the
> meantime?
Doesn't GB18030 appropriate some of the PUA for Tibetan (and quite
possibly other complex scripts)? I haven't looked up how Emacs
handles this.
Richard.