Counting Codepoints

Richard Wordingham richard.wordingham at ntlworld.com
Sun Oct 11 16:20:34 CDT 2015


Is the number of codepoints in a UTF-16 string well defined?

For example, which of the following two statements are true?

(a) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains two codepoints, U+DC00 and U+10020.

(b) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains three codepoints, U+DC00, U+D800 and U+DC20.
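The two readings can be made concrete as two counting policies. Below is a minimal sketch, assuming the string is given as a list of 16-bit code unit values. The function names `code_points_paired` and `code_points_per_unit` are hypothetical and do not come from the Unicode Standard. The first policy pairs a high surrogate with an immediately following low surrogate and otherwise passes each code unit through unchanged, which yields (a). The second treats every code unit as a code point of the same value, which yields (b).

```python
def code_points_paired(units):
    """Hypothetical policy matching (a): a high surrogate (D800..DBFF) followed
    immediately by a low surrogate (DC00..DFFF) becomes one supplementary code
    point; any other code unit, including a lone surrogate, is kept as is."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Standard UTF-16 surrogate-pair arithmetic.
            result.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            result.append(u)
            i += 1
    return result


def code_points_per_unit(units):
    """Hypothetical policy matching (b): every 16-bit code unit is read as a
    code point with the same scalar value, so no pairing takes place."""
    return list(units)


s = [0xDC00, 0xD800, 0xDC20]
print([hex(cp) for cp in code_points_paired(s)])    # ['0xdc00', '0x10020']
print([hex(cp) for cp in code_points_per_unit(s)])  # ['0xdc00', '0xd800', '0xdc20']
```

Both policies agree on strings without surrogates. They diverge only when surrogate code units are present, which is exactly where the question of a well-defined count arises.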

Statement (a) is probably more useful, but I couldn't find anything that
rules statement (b) to be false.

Richard.