Table 3-6. UTF-8 bit distribution

Giacomo Catenazzi cate at cateee.net
Fri Sep 12 01:59:46 CDT 2025


On 2025-09-11 22:58, Dominikus Dittes Scherkl via Unicode wrote:
> Am 11.09.25 um 21:21 schrieb yitin--- via Unicode:
>> https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-3/#G27288 
>>
>>
>> What is the significance of using different letters (x,y,z,u)
>> for different bits?
>
> Significance? None.
> This is simply to enhance readability: it shows which bits of the 
> encoding represent which bits of the scalar value.
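
To make the letter groups concrete, here is a minimal Python sketch that
encodes one scalar value by cutting it into the x, y, z and u groups of
Table 3-6 and dropping each group into its byte. The function name
utf8_encode is my own illustrative choice, and the sketch assumes the
input is a single integer code point; it is not taken from any library.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the bit groups of Table 3-6."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 00000000 0xxxxxxx -> 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 00000yyy yyxxxxxx -> 110yyyyy 10xxxxxx
        y = cp >> 6
        x = cp & 0x3F
        return bytes([0xC0 | y, 0x80 | x])
    if cp < 0x10000:
        # zzzzyyyy yyxxxxxx -> 1110zzzz 10yyyyyy 10xxxxxx
        z = cp >> 12
        y = (cp >> 6) & 0x3F
        x = cp & 0x3F
        return bytes([0xE0 | z, 0x80 | y, 0x80 | x])
    # 000uuuuu zzzzyyyy yyxxxxxx -> 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    u = cp >> 16            # 5 bits, split 3 + 2 across the first two bytes
    z = (cp >> 12) & 0x0F
    y = (cp >> 6) & 0x3F
    x = cp & 0x3F
    return bytes([0xF0 | (u >> 2),
                  0x80 | ((u & 0x03) << 4) | z,
                  0x80 | y,
                  0x80 | x])
```

For example, utf8_encode(0x20AC).hex() gives 'e282ac', the same bytes as
"\u20ac".encode("utf-8").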

Wikipedia uses a similar table, but with code points instead (so it 
does not show the "distribution"; on the other hand, it is more 
consistent with the next table).

About the significance: check the table above (UTF-16) and you will see 
that there we do need to name the groups, because 1 is subtracted from 
one of them (wwww = uuuuu - 1). So, for consistency, it is good to use 
the same notation for UTF-8 as well.
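
The UTF-16 case is where the named groups really earn their keep, since
one group is transformed rather than just moved. A minimal Python sketch,
assuming a supplementary code point as input; the name utf16_surrogates
is hypothetical and only for illustration:

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point per the UTF-16 bit distribution.

    000uuuuu xxxxxxxx xxxxxxxx -> 110110wwwwxxxxxx 110111xxxxxxxxxx
    where wwww = uuuuu - 1.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: U+{cp:04X}")
    uuuuu = cp >> 16                 # plane number, 1..16 (5 bits)
    wwww = uuuuu - 1                 # the group that gets 1 subtracted (4 bits)
    x_high = (cp >> 10) & 0x3F       # 6 x bits carried by the high surrogate
    x_low = cp & 0x3FF               # 10 x bits carried by the low surrogate
    high = 0xD800 | (wwww << 6) | x_high
    low = 0xDC00 | x_low
    return high, low
```

For instance, U+1F600 has uuuuu = 1, so wwww = 0, and the function
returns (0xD83D, 0xDE00).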


About readability: I'm impressed that UTF-8 is fully described in two 
lines and two tables. Considering the complexities
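
To illustrate how little is needed, here is a minimal Python decoder that
uses only the two tables: Table 3-6 (run in reverse) to reassemble the
bits, and, assuming that is the second table meant, Table 3-7
(Well-Formed UTF-8 Byte Sequences) for the allowed byte ranges. The names
WELL_FORMED and decode_utf8 are my own; this is a sketch, not a
production decoder (no error recovery or replacement characters).

```python
# Rows of Table 3-7: one (low, high) byte range per position in the sequence.
WELL_FORMED = [
    [(0x00, 0x7F)],
    [(0xC2, 0xDF), (0x80, 0xBF)],
    [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)],
]

# Payload mask for the first byte, by sequence length (Table 3-6 lead bytes).
LEAD_MASK = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}


def decode_utf8(data: bytes) -> list[int]:
    """Decode well-formed UTF-8 into scalar values; raise on anything else."""
    out = []
    i = 0
    while i < len(data):
        # The lead-byte ranges are disjoint, so at most one row matches.
        for row in WELL_FORMED:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            raise ValueError(f"ill-formed lead byte at offset {i}")
        seq = data[i:i + len(row)]
        if len(seq) < len(row) or not all(
            lo <= b <= hi for b, (lo, hi) in zip(seq, row)
        ):
            raise ValueError(f"ill-formed sequence at offset {i}")
        # Table 3-6 in reverse: strip marker bits, concatenate payload bits.
        cp = seq[0] & LEAD_MASK[len(row)]
        for b in seq[1:]:
            cp = (cp << 6) | (b & 0x3F)
        out.append(cp)
        i += len(row)
    return out
```

Overlong forms (e.g. C0 80), encoded surrogates (ED A0 80) and values
above U+10FFFF (F4 90 80 80) are all rejected purely by the ranges in
Table 3-7.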

Note: nowhere do we say anything about bit ordering, so the yyy xxx 
labels may help. If we want to be precise (and with the new format and 
technologies this may now be easier; in real print it may look ugly, 
either too small or too messy), we could put subscripts on the bits 
(0 to 20, since a scalar value has up to 21 bits).
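
As a rough idea of what the subscript notation would pin down, here is a
Python sketch that spells out the four-byte row of Table 3-6 with an
explicit scalar-bit index (b20 ... b0, most significant first) in every
payload position, plus a helper that fills such a template from a code
point. Both four_byte_template and apply_template are hypothetical names,
and the bN labels are my stand-in for printed subscripts.

```python
def four_byte_template() -> list[list[str]]:
    """Four-byte UTF-8 row of Table 3-6 with explicit bit indices.

    Scalar value bits: u = b20..b16, z = b15..b12, y = b11..b6, x = b5..b0.
    Literal marker bits are "0"/"1"; payload bits are "bN".
    """
    bits = [f"b{i}" for i in range(20, -1, -1)]  # b20 first, b0 last
    u, z, y, x = bits[0:5], bits[5:9], bits[9:15], bits[15:21]
    return [
        ["1", "1", "1", "1", "0"] + u[:3],  # 11110uuu
        ["1", "0"] + u[3:] + z,             # 10uuzzzz
        ["1", "0"] + y,                     # 10yyyyyy
        ["1", "0"] + x,                     # 10xxxxxx
    ]


def apply_template(template: list[list[str]], cp: int) -> bytes:
    """Build bytes by reading each labelled bit out of the code point."""
    out = bytearray()
    for byte in template:
        value = 0
        for token in byte:
            if token in ("0", "1"):
                bit = int(token)                   # fixed marker bit
            else:
                bit = (cp >> int(token[1:])) & 1   # scalar bit bN
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

With this, apply_template(four_byte_template(), 0x1F600) should produce
the same four bytes as chr(0x1F600).encode("utf-8"), and the template
itself states exactly which scalar bit lands in which position.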

My comment is just about "distribution". Is it really the best term to use?

cate
