"A Programmer's Introduction to Unicode"

Tue Mar 14 11:15:38 CDT 2017

Philippe Verdy wrote:

>>> Well, you do have eleven bits for flags per codepoint, for example.
>>
>> That's not UCS-4; that's a custom encoding.
>>
>> (any UCS-4 code unit) & 0xFFE00000 == 0

(changing to "UTF-32" per Ken's observation)

> Per definition yes, but UTC-4 is not Unicode.

I guess it's not. What is UTC-4, anyway? Another name for a UWG meeting
held in 1989?

> As well (any UCS-4 code unit) & 0xFFE00000 == 0 (i.e. 21 bits) is not
> Unicode, UTF-32 is Unicode (more restrictive than just 21 bits which
> would allow 32 planes instead of just the 17 first ones).

I used bitwise arithmetic strictly to address Steffen's premise that the
11 "unused bits" in a UTF-32 code unit were available to store metadata
about the code point. Of course UTF-32 does not allow 0x110000 through
0x1FFFFF either.
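
To make the distinction concrete, here is a minimal Python sketch of the two tests: the 21-bit mask check quoted above, and a stricter check for a real UTF-32 code unit. The function and constant names are illustrative assumptions, not part of any standard API. The stricter check also rejects the surrogate range 0xD800 through 0xDFFF, since a UTF-32 code unit must be a Unicode scalar value.

```python
# Illustrative names only; nothing here comes from a standard library API.
HIGH_11_BITS = 0xFFE00000   # the mask used in the thread: top 11 bits of a 32-bit unit
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def fits_in_21_bits(unit: int) -> bool:
    """True if the 32-bit value has its top 11 bits clear (necessary, not sufficient)."""
    return 0 <= unit <= 0xFFFFFFFF and (unit & HIGH_11_BITS) == 0


def is_valid_utf32_unit(unit: int) -> bool:
    """True if the value is a Unicode scalar value, i.e. a legal UTF-32 code unit."""
    if not 0 <= unit <= MAX_CODE_POINT:
        return False                        # excludes 0x110000 through 0x1FFFFF and above
    return not (0xD800 <= unit <= 0xDFFF)   # surrogates are not scalar values


if __name__ == "__main__":
    for value in (0x41, 0xD800, 0x10FFFF, 0x110000, 0x1FFFFF, 0x200000):
        print(f"{value:#010x}  21-bit: {fits_in_21_bits(value)!s:5}  "
              f"UTF-32: {is_valid_utf32_unit(value)}")
```

Every value that passes the second test passes the first, but not the other way around: 0x110000 through 0x1FFFFF fit in 21 bits yet are not UTF-32.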

> I suppose he meant 21 bits, not 11 bits which covers only a small part
> of the BMP.

No, his comment "you do have eleven bits for flags per codepoint" pretty
clearly referred to using the "extra" 11 bits beyond what is needed to
hold the Unicode scalar value.
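
For illustration only, a scheme of that kind might look like the Python sketch below. The layout (flags in the top 11 bits, scalar value in the low 21) and all names are assumptions made for the example, not something anyone in the thread specified. It shows why the result is a custom encoding: as soon as any flag bit is set, the 32-bit value no longer satisfies the mask test and is no longer a UTF-32 code unit.

```python
# Hypothetical layout: [ 11 flag bits | 21 bits holding the scalar value ]
SCALAR_BITS = 21
SCALAR_MASK = (1 << SCALAR_BITS) - 1    # 0x001FFFFF
FLAG_MASK = 0x7FF                       # 11 bits
HIGH_11_BITS = 0xFFE00000               # the mask from the thread


def pack(scalar: int, flags: int) -> int:
    """Combine a code point and 11 flag bits into one 32-bit value (not UTF-32)."""
    if not 0 <= scalar <= 0x10FFFF:
        raise ValueError("scalar value out of range")
    if not 0 <= flags <= FLAG_MASK:
        raise ValueError("flags must fit in 11 bits")
    return (flags << SCALAR_BITS) | scalar


def unpack(packed: int) -> tuple[int, int]:
    """Split a packed value back into (scalar, flags)."""
    return packed & SCALAR_MASK, (packed >> SCALAR_BITS) & FLAG_MASK


if __name__ == "__main__":
    packed = pack(0x41, 0x001)
    print(hex(packed))                       # 0x200041
    print((packed & HIGH_11_BITS) == 0)      # False: fails the UTF-32 mask test
    print(unpack(packed))                    # (65, 1)
```

A consumer expecting UTF-32 would have to strip the flags before treating the value as a code point, which is exactly what makes this a private format rather than UTF-32.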

--
Doug Ewell | Thornton, CO, US | ewellic.org