Unicode String Models
Eli Zaretskii via Unicode
unicode at unicode.org
Tue Sep 11 12:21:07 CDT 2018
> From: Hans Åberg <haberg-1 at telia.com>
> Date: Tue, 11 Sep 2018 19:13:28 +0200
> Cc: Henri Sivonen <hsivonen at hsivonen.fi>,
> unicode at unicode.org
> > In Emacs, each raw byte belonging
> > to a byte sequence which is invalid under UTF-8 is represented as a
> > special multibyte sequence. IOW, Emacs's internal representation
> > extends UTF-8 with multibyte sequences it uses to represent raw bytes.
> > This allows mixing stray bytes and valid text in the same buffer,
> > without risking lossy conversions (such as those one gets under model
> > 2 above).
> Can you give a reference detailing this format?
There's no formal description as English text, if that's what you
meant. The comments, macros and functions in the files
src/character.[ch] in the Emacs source tree tell most of that story,
albeit indirectly, and some additional info can be found in the
section "Text Representation" of the Emacs Lisp Reference manual.
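
To illustrate the idea described above, here is a minimal sketch in Python of a lossless decode/encode round trip in which each stray byte gets its own code point outside the Unicode range. It is not Emacs code. The offset 0x3FFF00 and the two-byte C0/C1 form reflect my reading of src/character.h and the Lisp Reference manual and should be treated as assumptions to check against the source; the function names are made up for this example.

```python
# Sketch only: mimics the idea of giving each invalid byte its own
# character code so that decoding and re-encoding is lossless.

# Assumption: raw bytes 0x80..0xFF map to codes 0x3FFF80..0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


def decode_lossless(data: bytes) -> list[int]:
    """Decode UTF-8, mapping each undecodable byte to RAW_BYTE_BASE + byte."""
    # Python's surrogateescape handler turns each bad byte 0xNN into
    # the lone surrogate U+DCNN; we remap those to the raw-byte range.
    text = data.decode("utf-8", errors="surrogateescape")
    codes = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            codes.append(RAW_BYTE_BASE + (cp - 0xDC00))
        else:
            codes.append(cp)
    return codes


def encode_lossless(codes: list[int]) -> bytes:
    """Inverse of decode_lossless: raw-byte codes become single bytes again."""
    out = bytearray()
    for cp in codes:
        if RAW_BYTE_BASE + 0x80 <= cp <= RAW_BYTE_BASE + 0xFF:
            out.append(cp - RAW_BYTE_BASE)
        else:
            out += chr(cp).encode("utf-8")
    return bytes(out)


def raw_byte_internal_form(b: int) -> bytes:
    """Two-byte in-buffer form of a raw byte (assumed: C0/C1 lead byte).

    These sequences are overlong in standard UTF-8, so they cannot
    collide with the encoding of any valid character.
    """
    if not 0x80 <= b <= 0xFF:
        raise ValueError("only bytes 0x80..0xFF are stored as raw bytes")
    return bytes([0xC0 | ((b >> 6) & 0x01), 0x80 | (b & 0x3F)])


data = b"caf\xc3\xa9 \xff\xfe ok"   # valid UTF-8 mixed with two stray bytes
codes = decode_lossless(data)
roundtrip = encode_lossless(codes)
```

The point of the sketch is that `roundtrip` equals `data` byte for byte, which a decoder that substitutes U+FFFD for invalid bytes cannot guarantee.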