Surrogates and noncharacters

Philippe Verdy verdy_p at
Sat May 9 08:11:51 CDT 2015

2015-05-09 11:59 GMT+02:00 Richard Wordingham <
richard.wordingham at>:

> No, D82 merely requires that each 16-bit value be a valid UTF-16 code
> unit.  Unicode strings, and Unicode 16-bit strings in particular, need
> not be well-formed.  For x = 8, 16, 32, a 'UTF-x string', equivalently a
> 'valid UTF-x string', is one that is well-formed in UTF-x.
> > I was right, You and Richard were wrong.
> I stand by my explanation.  I wrote it with TUS open at the definitions
> by my side.

Except that you are explaining something else. You are speaking about
"Unicode strings", which are bound to a given UTF; I was speaking ONLY about
"16-bit strings", which were NOT bound to Unicode (and did not have to be). So
TUS is completely irrelevant here. I did NOT write "Unicode 16-bit
strings", only "16-bit strings", and I clearly opposed the two DISTINCT
concepts in the SAME sentence so that no confusion was possible.
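
To make the distinction discussed above concrete, here is a minimal sketch in
Python. It assumes a string is given as a list of integers, one per 16-bit
code unit; the function name is_well_formed_utf16 is hypothetical and not
taken from TUS. Any sequence of 16-bit values passes the range check, but only
sequences without unpaired surrogates count as well-formed UTF-16.
Noncharacters such as 0xFFFF are accepted, since they do not affect
well-formedness.

```python
def is_well_formed_utf16(units):
    """Return True if the 16-bit code units form well-formed UTF-16.

    `units` is a list of ints. This representation is an assumption made
    for illustration only.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError("not a 16-bit code unit: %r" % (u,))
        if 0xD800 <= u <= 0xDBFF:
            # A high (lead) surrogate must be followed by a low (trail) one.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is unpaired.
            return False
        i += 1
    return True
```

For example, [0xD83D, 0xDE00] is a surrogate pair and is well-formed, while
[0xD800] alone is a sequence of 16-bit values that is not well-formed UTF-16.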
