richard.wordingham at ntlworld.com
Sun Oct 11 16:20:34 CDT 2015
Is the number of codepoints in a UTF-16 string well defined?
For example, which of the following two statements is true?
(a) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains two codepoints, U+DC00 and U+10020.
(b) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains three codepoints, U+DC00, U+D800 and U+DC20.
Statement (a) is probably more useful, but I couldn't find anything
that rules out statement (b).
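To make the two readings concrete, here is a minimal Python sketch. The function names `decode_pairing` and `decode_unitwise` are made up for illustration, and treating (b) as "every code unit is its own code point" is only my assumption about what that reading would mean in general; nothing here is taken from the standard's text.

```python
def is_high(u):
    """True if u is a high (leading) surrogate code unit."""
    return 0xD800 <= u <= 0xDBFF


def is_low(u):
    """True if u is a low (trailing) surrogate code unit."""
    return 0xDC00 <= u <= 0xDFFF


def decode_pairing(units):
    """Reading (a): a high surrogate immediately followed by a low
    surrogate forms one supplementary code point; any other code unit,
    including an unpaired surrogate, is a code point on its own."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if is_high(u) and i + 1 < len(units) and is_low(units[i + 1]):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


def decode_unitwise(units):
    """Reading (b), as assumed here: no pairing is attempted, so each
    code unit is reported as its own code point."""
    return list(units)


s = [0xDC00, 0xD800, 0xDC20]
print([hex(c) for c in decode_pairing(s)])   # two code points
print([hex(c) for c in decode_unitwise(s)])  # three code points
```

For the string in question, the first function reports U+DC00 and U+10020, and the second reports U+DC00, U+D800 and U+DC20, so the two readings disagree on the count (2 versus 3).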