Unicode Regular Expressions, Surrogate Points and UTF-8
Markus Scherer
markus.icu at gmail.com
Fri May 30 18:15:12 CDT 2014
If you use Unicode 16-bit strings, it's easy to "pass through" unpaired
surrogates and treat them like code points. It's often neither productive
nor necessary to check for them at every step, that is, to be strict about
well-formed UTF-16.
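
As a rough illustration of the pass-through idea, here is a minimal Python
sketch. It is not taken from any particular library: the 16-bit string is
modeled as a plain list of integer code units, and the function name
code_points is made up for this example. Well-formed surrogate pairs are
combined, and an unpaired surrogate simply comes out as its own code point.

```python
def code_points(units):
    """Yield code points from a sequence of 16-bit code units (hypothetical helper).

    A lead surrogate followed by a trail surrogate is combined into one
    supplementary code point. Any other surrogate is unpaired and is
    passed through unchanged instead of raising an error.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        i += 1
        if 0xD800 <= u <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
            # Lead + trail: compute the supplementary code point.
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        yield u


# 'a', a well-formed pair for U+1F600, an unpaired lead surrogate, 'b'
sample = [0x0061, 0xD83D, 0xDE00, 0xD800, 0x0062]
print([hex(cp) for cp in code_points(sample)])
```

The loop never needs a separate validation pass; the unpaired U+D800 just
flows through to whatever consumes the code points.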
On the other hand, I don't think anyone expects you to support invalid
UTF-8, and especially not to support any and all Unicode 8-bit strings (see
Section 3.9, Unicode Encoding Forms, of the Unicode Standard for what I
mean here).
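
For contrast, being strict about UTF-8 is usually cheap, because most
decoders reject ill-formed input by default. The sketch below leans on
Python's built-in strict decoder as one example; the helper name
is_well_formed_utf8 is made up here, and other environments will have
their own ways of doing this.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_well_formed_utf8("caf\u00e9".encode("utf-8")))  # valid two-byte sequence
print(is_well_formed_utf8(b"\xc0\xaf"))      # non-shortest form
print(is_well_formed_utf8(b"\xed\xa0\x80"))  # surrogate U+D800 encoded as three bytes
```

Note that the third input is the 8-bit analogue of an unpaired surrogate,
and a strict UTF-8 decoder does not pass it through.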
If you find UTS #18 unclear or misleading, I suggest you submit feedback
pointing out specific text issues.
markus