Why Work at Encoding Level?

Richard Wordingham richard.wordingham at ntlworld.com
Mon Oct 19 15:34:01 CDT 2015


On Mon, 19 Oct 2015 21:35:16 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:

> 2015-10-19 20:53 GMT+02:00 Richard Wordingham <richard.wordingham at ntlworld.com>:

> > The word
> > 'codepoint' is even worse, as a supplementary plane codepoint is
> > represented by two BMP codepoints.

> No! The "supplementary code points" (or "supplementary characters"
> when they are assigned to characters) are represented in UTF-16 as
> two **code units**, NOT as two "code points" (even if their binary
> values are related).

A code point is 'any value in the Unicode codespace' (TUS Section 3.4
D10). The 'Unicode codespace' is a range of integers from 0 to
0x10FFFF (TUS Section 3.4 D9).

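This works fine so long as one thinks of a 'code point' as just a
number: the surrogate values 0xD800 to 0xDFFF are then code points too,
even though they are not characters. The problem is that people rarely
use the term 'scalar value' (TUS Section 3.9 D76) when they mean a code
point that is not a surrogate.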
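
To make the distinction concrete, here is a minimal sketch in Python.
The function names are made up for illustration and are not taken from
TUS or from any library; the ranges follow D9 and D10 as quoted above,
plus the usual definition of a scalar value as a code point outside the
surrogate range.

```python
def is_code_point(n):
    # Any integer in the Unicode codespace, 0 to 0x10FFFF.
    return 0 <= n <= 0x10FFFF


def is_scalar_value(n):
    # A code point that is not a surrogate (0xD800 to 0xDFFF).
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


def utf16_code_units(n):
    # Hypothetical helper: the UTF-16 code units for one scalar value.
    if not is_scalar_value(n):
        raise ValueError("not a Unicode scalar value: %#x" % n)
    if n < 0x10000:
        return [n]                      # BMP: a single code unit
    v = n - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),         # high (lead) surrogate code unit
            0xDC00 + (v & 0x3FF)]       # low (trail) surrogate code unit


units = utf16_code_units(0x1F600)
```

For U+1F600 the two code units come out as 0xD83D and 0xDE00. Taken as
plain numbers, both pass is_code_point() but fail is_scalar_value(),
which is the sense in which a supplementary code point can be said to
be 'represented by two BMP code points'.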

Richard.

