Counting Codepoints
Richard Wordingham
richard.wordingham at ntlworld.com
Tue Oct 13 13:53:29 CDT 2015
On Tue, 13 Oct 2015 15:23:36 +0000
David Starner <prosfilaes at gmail.com> wrote:
> A UTF-16 string could delete one surrogate, or add a fractional
> character. A Unicode string (not a "UTF-16 string"), which could be
> stored internally in, say, a Python-like format which is Latin-1,
> UCS-2, or UTF-32, conversions made as needed and differences hidden
> from the user, can't.
Confusingly, the Unicode definitions are the other way round. A
UTF-16 string is a string of UTF-16 code units in which every
surrogate code unit is part of a surrogate pair. Any string of UTF-16
code units, including one with unpaired surrogates, is a Unicode
16-bit string.
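
As a rough illustration of that distinction, here is a minimal Python
sketch. The function name is_well_formed_utf16 is hypothetical, and it
assumes the code units arrive as a plain sequence of integers. Any
sequence of 16-bit values is accepted as input (a Unicode 16-bit
string); the function returns True only when every surrogate is paired.

```python
def is_well_formed_utf16(units):
    """Return True if every surrogate code unit in `units` is paired.

    `units` is a sequence of integers, each taken to be one 16-bit
    code unit. The name and interface are illustrative only.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError("not a 16-bit code unit: %r" % (u,))
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed at once by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


paired = [0x0041, 0xD83D, 0xDE00]   # 'A' followed by the pair for U+1F600
unpaired = [0x0041, 0xD83D]         # 'A' followed by a lone high surrogate

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(unpaired))
```

Both lists are Unicode 16-bit strings in the sense above; only the
first also qualifies as a UTF-16 string.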
Richard.