Unicode String Models

Hans Åberg via Unicode unicode at unicode.org
Tue Sep 11 12:13:28 CDT 2018


> On 11 Sep 2018, at 13:13, Eli Zaretskii via Unicode <unicode at unicode.org> wrote:
> 
> In Emacs, each raw byte belonging
> to a byte sequence which is invalid under UTF-8 is represented as a
> special multibyte sequence.  IOW, Emacs's internal representation
> extends UTF-8 with multibyte sequences it uses to represent raw bytes.
> This allows mixing stray bytes and valid text in the same buffer,
> without risking lossy conversions (such as those one gets under model
> 2 above).
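The principle described in the quoted paragraph can be illustrated with a small Python sketch. It is not Emacs's actual buffer format, which the message does not spell out. It only shows the idea:

- Valid UTF-8 decodes to ordinary code points.
- Each stray byte becomes a distinct value outside the Unicode range (above U+10FFFF), so it can never collide with a real character.
- Decoding followed by re-encoding is therefore lossless.

The constant RAW_BYTE_BASE and the functions decode_lossless / encode_lossless are hypothetical names chosen for illustration. To locate undecodable bytes, the sketch relies on Python's documented "surrogateescape" error handler (PEP 383).

```python
# Hypothetical sketch, not Emacs's real format: text is held as a list of
# integer "characters". Valid UTF-8 becomes Unicode scalar values; every byte
# that is not part of a valid sequence becomes a value above U+10FFFF.

RAW_BYTE_BASE = 0x3FFF00  # assumption: raw byte b (0x80..0xFF) -> RAW_BYTE_BASE + b
MAX_UNICODE = 0x10FFFF


def decode_lossless(data: bytes) -> list[int]:
    """Decode bytes, keeping each invalid byte as a distinct 'raw byte' char."""
    # surrogateescape maps each undecodable byte b to the lone surrogate
    # U+DC00 + b (U+DC80..U+DCFF); valid UTF-8 never decodes to a surrogate.
    text = data.decode("utf-8", errors="surrogateescape")
    chars = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            chars.append(RAW_BYTE_BASE + (cp - 0xDC00))  # stray byte
        else:
            chars.append(cp)  # ordinary Unicode character
    return chars


def encode_lossless(chars: list[int]) -> bytes:
    """Inverse of decode_lossless: raw-byte chars are written back unchanged."""
    out = bytearray()
    for cp in chars:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            out.append(cp - RAW_BYTE_BASE)  # restore the original byte
        else:
            out += chr(cp).encode("utf-8")  # normal UTF-8 encoding
    return bytes(out)


if __name__ == "__main__":
    data = b"caf\xc3\xa9 \xff\xfe ok"  # valid text mixed with stray bytes
    chars = decode_lossless(data)
    print([hex(c) for c in chars])
    print(encode_lossless(chars) == data)
```

Because the raw-byte values sit outside U+0000..U+10FFFF, a buffer can mix them freely with text. No information is lost in a read/write round trip, which is the property contrasted with lossy conversion in the quoted message.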

Can you give a reference detailing this format?




