table 3-6. UTF-8 bit distribution

Jim DeLaHunt list+unicode at jdlh.com
Thu Sep 11 15:49:04 CDT 2025


On 2025-09-11 12:21, yitin--- via Unicode wrote:

> https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G27288 
>
>
> What is the significance of using different letters (x,y,z,u)
> for different bits?  I don't see any consistent pattern in
> the naming.  https://www.rfc-editor.org/rfc/rfc3629 just
> uses x for all of them.

What I like about Table 3-6's notation is that it shows how the bits in 
the various code units (labelled x, y, z, u) correspond to the bits in 
the scalar value.  See, for example, the scalar value in the final row:

> 000uuuuu zzzzyyyy yyxxxxxx

The right-hand part of that row shows that the 'u' bits are encoded in 
the first and second bytes, the 'z' bits are encoded in the second byte, 
the 'y' bits are encoded in the third byte, and the 'x' bits are encoded 
in the fourth byte.
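
To make that correspondence concrete, here is a minimal Python sketch of 
the four-byte row only. The function name encode_4byte and the variable 
names u, z, y, x are hypothetical, chosen to mirror the table's letters; 
they do not come from the Core Spec. The byte patterns in the comments 
(11110uuu 10uuzzzz 10yyyyyy 10xxxxxx) are the right-hand side of that 
final row as I read it.

```python
def encode_4byte(scalar: int) -> bytes:
    """Encode a scalar value in U+10000..U+10FFFF as four UTF-8 bytes.

    Illustrative sketch of one table row, not a full UTF-8 encoder.
    """
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("this sketch covers only the four-byte row")

    # Split the scalar value 000uuuuu zzzzyyyy yyxxxxxx into its fields.
    u = (scalar >> 16) & 0x1F  # uuuuu  (5 bits)
    z = (scalar >> 12) & 0x0F  # zzzz   (4 bits)
    y = (scalar >> 6) & 0x3F   # yyyyyy (6 bits)
    x = scalar & 0x3F          # xxxxxx (6 bits)

    return bytes([
        0xF0 | (u >> 2),               # 11110uuu: top three 'u' bits
        0x80 | ((u & 0x03) << 4) | z,  # 10uuzzzz: low two 'u' bits, then 'z'
        0x80 | y,                      # 10yyyyyy
        0x80 | x,                      # 10xxxxxx
    ])


if __name__ == "__main__":
    # Compare against Python's built-in encoder for one sample character.
    sample = 0x1F600
    print(encode_4byte(sample).hex(" "))
    print(chr(sample).encode("utf-8").hex(" "))
```

The point of the sketch is that each letter in the table becomes one 
named field, and the 'u' field is the only one split across two bytes.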

The table in section 3 of RFC 3629 shows only the ranges of scalar 
values, next to byte patterns that use 'x' for every payload bit. It 
does not show the bit patterns within the scalar values themselves, so 
it illustrates less than the Core Spec's table does.

That's just my humble opinion as a reader. This being the Unicode list, 
the table's original author might be present and could give their 
rationale.

Best regards,
        —Jim DeLaHunt

-- 
.   --Jim DeLaHunt, jdlh at jdlh.com     http://blog.jdlh.com/ (http://jdlh.com/)
       multilingual websites consultant, Vancouver, B.C., Canada


