Counting Codepoints

Richard Wordingham richard.wordingham at
Mon Oct 12 15:23:09 CDT 2015

On Mon, 12 Oct 2015 17:29:13 +0200
Philippe Verdy <verdy_p at> wrote:

> But between two implementations
> the result of the scanner could still be different because the
> replacement character is not specified. If that result "sanitized"
> string is then used to generate an URI, the URI is also unpredictable
> and will vary between implementations, as well as its effective
> length. If it is used to generate an identifier granting some new
> access, such as a user name, several new user names could be
> generated from the same input.

TUS 8.0 Section 3 Requirement C10 has the following wise words in its
final paragraph:

"However, such repair of mangled data is a special case, and it must
not be used in circumstances where it would cause security problems."
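
A minimal sketch of the concern, in Python. It assumes two hypothetical
implementations, "impl_a" and "impl_b", that differ only in the
replacement character they substitute for an ill-formed UTF-8 sequence;
the handler names, the sample bytes and strict_user_name are invented for
illustration and are not taken from any real system. The same input then
yields two different "sanitized" names and two URIs of different length,
whereas a strict decoder simply refuses the input:

```python
import codecs
from urllib.parse import quote


def _replace_with(replacement: str):
    """Build a decode-error handler that substitutes `replacement`
    for each ill-formed sequence reported by the decoder."""
    def handler(err: UnicodeError):
        if not isinstance(err, UnicodeDecodeError):
            raise err
        # Resume decoding just past the offending bytes.
        return replacement, err.end
    return handler


# Two hypothetical implementations that repair mangled data differently.
codecs.register_error("impl_a", _replace_with("\ufffd"))
codecs.register_error("impl_b", _replace_with("?"))


def sanitize(raw: bytes, impl: str) -> str:
    """Lenient repair: the result depends on the chosen implementation."""
    return raw.decode("utf-8", errors=impl)


def strict_user_name(raw: bytes) -> str:
    """Security-sensitive path: treat ill-formed input as an error."""
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError("ill-formed UTF-8 in user name") from exc


# 0xC3 is a UTF-8 lead byte, but 'i' (0x69) is not a continuation byte.
raw = b"al\xc3ice"

name_a = sanitize(raw, "impl_a")   # 'al\ufffdice'
name_b = sanitize(raw, "impl_b")   # 'al?ice'
uri_a = quote(name_a)              # 'al%EF%BF%BDice'
uri_b = quote(name_b)              # 'al%3Fice'
```

Where the decoded string becomes an identifier, calling strict_user_name
and rejecting the request sidesteps the question of which replacement
character was used.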

