Unicode Regular Expressions, Surrogate Points and UTF-8
Markus Scherer
markus.icu at gmail.com
Sat May 31 21:28:27 CDT 2014
On Sat, May 31, 2014 at 1:59 AM, Richard Wordingham <
richard.wordingham at ntlworld.com> wrote:
> Bear in mind that a pattern \uD808 shall not match anything in a
> well-formed Unicode string.
Depends. See the definitions of Unicode strings vs. UTF strings.
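To make the distinction concrete: as I read the standard's definitions, a Unicode 16-bit string is any sequence of 16-bit code units, unpaired surrogates included, while a UTF-16 string must also be well-formed. A minimal Java sketch of that difference follows. The class name and the helper isWellFormedUtf16 are hypothetical, not JDK API.

```java
class StringKindsDemo {
    // Hypothetical helper (not part of the JDK): returns true only if every
    // surrogate code unit in s belongs to a high+low surrogate pair.
    static boolean isWellFormedUtf16(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String paired = "\uD808\uDF45"; // one supplementary code point, U+12345
        String lone = "\uD808";         // a legal Java String, but an unpaired surrogate
        System.out.println(isWellFormedUtf16(paired));
        System.out.println(isWellFormedUtf16(lone));
    }
}
```

Both values are ordinary Java strings. Only the first is also a well-formed UTF-16 string, so whether a lone-surrogate pattern can ever match depends on which kind of string the implementation accepts.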
> \uD808\uDF45 specifies a sequence of two
> codepoints.
Implementations that use Unicode 16-bit strings will usually treat this as
one supplementary code point.
In Java, there is no other way to escape one.
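A rough sketch of that behavior with java.util.regex follows. The class and method names are only illustrative, and the last line prints whatever the running JDK reports for a lone-surrogate pattern rather than assuming a result.

```java
import java.util.regex.Pattern;

class SurrogateEscapeDemo {
    // U+12345 written as a surrogate pair of escapes in a string literal.
    static final String TEXT = "\uD808\uDF45";

    // Number of code points (not 16-bit chars) in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Does the regex (given as pattern source text) match all of s?
    static boolean matchesAll(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    public static void main(String[] args) {
        // Two 16-bit code units, one code point.
        System.out.println(TEXT.length());
        System.out.println(codePoints(TEXT));

        // The two escapes in the pattern are read together as one
        // supplementary code point, which matches the whole string.
        System.out.println(matchesAll("\\uD808\\uDF45", TEXT));

        // A single dot consumes the whole pair, since matching
        // proceeds by code point rather than by 16-bit unit.
        System.out.println(matchesAll(".", TEXT));

        // Searching the paired string for the high surrogate alone.
        System.out.println(Pattern.compile("\\uD808").matcher(TEXT).find());
    }
}
```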
markus