Is the binaryness/textness of a data format a property?

Eli Zaretskii via Unicode unicode at unicode.org
Sat Mar 21 14:23:45 CDT 2020


> Date: Sat, 21 Mar 2020 11:13:40 -0600
> From: Doug Ewell via Unicode <unicode at unicode.org>
> 
> Adam Borowski wrote:
> 
> > Also, UTF-8 can carry more than Unicode -- for example, U+D800..U+DFFF
> > or U+110000..U+7FFFFFFF (or possibly even up to 2³⁶ or 2⁴²), which has
> > its uses but is not well-formed Unicode.
> 
> I'd be interested in your elaboration on what these uses are.

Emacs uses some of that to support charsets that cannot be mapped
into Unicode.  GB18030 is one such charset.  The internal
representation of characters in Emacs is UTF-8, so it uses 5-byte
UTF-8-like sequences to represent such characters.
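
To illustrate what a 5-byte UTF-8-like sequence looks like, here is a
minimal sketch of the original 1-to-6-byte UTF-8 bit layout (the one
described in RFC 2279, before UTF-8 was restricted to U+10FFFF).  It is
not the Emacs implementation; the function name encode_extended_utf8 is
hypothetical, and the exact code point ranges Emacs assigns to
non-Unicode characters are not shown here.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode an integer with the original 1..6-byte UTF-8 bit layout.

    Illustrative only: values above U+10FFFF and surrogates are encoded
    too, so the output is not necessarily well-formed Unicode UTF-8.
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value does not fit in a 6-byte sequence")
    if cp < 0x80:
        return bytes([cp])  # ASCII is a single byte

    # (sequence length, exclusive upper limit, lead-byte marker bits)
    layouts = (
        (2, 0x800, 0xC0),
        (3, 0x10000, 0xE0),
        (4, 0x200000, 0xF0),
        (5, 0x4000000, 0xF8),
        (6, 0x80000000, 0xFC),
    )
    for length, limit, lead in layouts:
        if cp < limit:
            break

    out = bytearray(length)
    # Fill continuation bytes from the end, 6 payload bits each.
    for i in range(length - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    # Whatever bits remain go into the lead byte.
    out[0] = lead | cp
    return bytes(out)


if __name__ == "__main__":
    # 0x200000 is the first value that needs five bytes in this layout.
    print(encode_extended_utf8(0x200000).hex(" "))
```

For values up to U+10FFFF (surrogates aside) this agrees with standard
UTF-8; anything from 0x200000 upward needs five or more bytes, which a
strict UTF-8 decoder will reject.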

