Why Work at Encoding Level?
richard.wordingham at ntlworld.com
Mon Oct 19 15:34:01 CDT 2015
On Mon, 19 Oct 2015 21:35:16 +0200
Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> 2015-10-19 20:53 GMT+02:00 Richard Wordingham <
> richard.wordingham at ntlworld.com>:
> > The word
> > 'codepoint' is even worse, as a supplementary plane codepoint is
> > represented by two BMP codepoints.
> No! The "supplementary code points" (or "supplementary characters"
> when they are assigned to characters) are represented in UTF-16 as
> two **code units**, NOT as two "code points" (even if their binary
> values are related).
A code point is 'any value in the Unicode codespace' (TUS Section 3.4
D10). The 'Unicode codespace' is a range of integers from 0 to
0x10FFFF (TUS Section 3.4 D9).
This works fine so long as one thinks of a 'code point' as just a
number. The problem is that people rarely use the term 'scalar value',
which is the term that excludes the surrogate code points.
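
To make the distinction concrete, here is a minimal Python sketch, for
illustration only. The helper names (utf16_code_units, is_code_point,
is_scalar_value) are hypothetical and come from no library or from TUS.
It assumes only the codespace range quoted above and the standard UTF-16
surrogate arithmetic.

```python
# Illustrative sketch; all helper names here are hypothetical.

def is_code_point(n):
    """Any integer in the Unicode codespace, 0 to 0x10FFFF."""
    return 0 <= n <= 0x10FFFF

def is_scalar_value(n):
    """A code point that is not a surrogate (0xD800 to 0xDFFF)."""
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)

def utf16_code_units(cp):
    """Return the list of UTF-16 code units encoding scalar value cp."""
    if not is_scalar_value(cp):
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x10000:
        return [cp]                      # BMP: one code unit
    v = cp - 0x10000                     # 20 bits left to encode
    return [0xD800 + (v >> 10),          # high (lead) surrogate
            0xDC00 + (v & 0x3FF)]        # low (trail) surrogate

units = utf16_code_units(0x1F600)
print([hex(u) for u in units])           # ['0xd83d', '0xde00']

# Each unit's numeric value is also a code point, but not a scalar value.
print([is_code_point(u) for u in units])    # [True, True]
print([is_scalar_value(u) for u in units])  # [False, False]
```

Seen purely as numbers, 0xD83D and 0xDE00 do lie in the codespace, which
is why both readings of 'code point' can be defended; only 'scalar
value' rules them out.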