get the sourcecode [of UTF-8]

Jim Breen jimbreen at gmail.com
Fri Nov 8 00:04:37 CST 2024


On Fri, 8 Nov 2024 at 11:37, Markus Scherer <markus.icu at gmail.com> wrote:
> On Thu, Nov 7, 2024 at 3:03 PM Jim Breen via Unicode <unicode at corp.unicode.org> wrote:
>>
>> On rare occasions, I need to dig into UTF-8 at the bit level. I have a
>> note pinned near my desk as an aide memoire. It has 3 lines:
>>
>> UTF-8
>> zzzzyyyyyyxxxxxx
>> 1110zzzz 10yyyyyy 10xxxxxx
>
> 11110nnn 10zzzzzz 10yyyyyy 10xxxxxx
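
For anyone who wants to check those bit layouts by hand, here is a
minimal Python sketch. The function name utf8_encode is made up for
illustration, and the one- and two-byte patterns in the comments are
the standard ones rather than anything on my note. It is a sketch
for poking at bits, not a replacement for a real encoder.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by hand, following the bit layouts above."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110zzzz 10yyyyyy 10xxxxxx
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points are not valid in UTF-8")
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # 11110nnn 10zzzzzz 10yyyyyy 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point is beyond U+10FFFF")


# U+6F22 fits in 16 bits, so it takes the three-byte form.
print(utf8_encode(0x6F22).hex(" "))    # e6 bc a2
# U+20B9F is a CJK Extension B ideograph, so it needs the four-byte form.
print(utf8_encode(0x20B9F).hex(" "))   # f0 a0 ae 9f
```

Comparing the result with chr(cp).encode("utf-8") is a quick way to
confirm that the shifts and masks match the layouts.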

I haven't had any occasion to poke around at 21-bit Unicode
codepoints. The JIS standards have only 303 kanji that need them, all
added in the JIS X 0213 standard introduced in 2000.

[As I wrote in my "A Brief History of Japanese Character Set
Standards" (https://www.edrdg.org/~jwb/paperdir/kanjicomp.html) "the
main lasting impact of the JIS X 0213 standard will probably be the
additional 303 kanji it contributed to Unicode."]

Jim

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University
http://www.jimbreen.org/
http://nihongo.monash.edu/


