get the sourcecode [of UTF-8]
Julian Bradfield
junicode at jcbradfield.org
Tue Nov 5 04:28:46 CST 2024
On 2024-11-05, A bughunter via Unicode <unicode at corp.unicode.org> wrote:
> Originating Question
>
> Where to get the sourcecode of relevant (version) UTF-8?: in order to checksum
> text against the specific encoding map (codepage).
As people keep telling you, this is a nonsense question.
UTF-8 does not have sourcecode. UTF-8 is a function from streams of
octets to streams of codepoints and vice versa. It is specified very
simply, and there are many reference implementations (including the
one posted here).
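
To illustrate how small that function is, here is a minimal sketch in
Python. It is not the implementation posted to the list, and the names
encode_codepoint and decode_utf8 are made up for this example. It
assumes the current rules: at most four octets per codepoint, no
surrogates, nothing above U+10FFFF, and shortest-form encoding only.

```python
def encode_codepoint(cp):
    """Encode one codepoint as UTF-8 octets (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not encodable in UTF-8: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def decode_utf8(data):
    """Decode a complete octet string into a list of codepoints."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # n = number of continuation octets, lowest = smallest codepoint
        # that legitimately needs a sequence of this length.
        if b < 0x80:
            n, cp, lowest = 0, b, 0
        elif 0xC2 <= b <= 0xDF:
            n, cp, lowest = 1, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            n, cp, lowest = 2, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF4:
            n, cp, lowest = 3, b & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead octet {b:#04x} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            c = data[i + k]
            if c & 0xC0 != 0x80:
                raise ValueError(f"bad continuation octet at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        if cp < lowest:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"forbidden codepoint {cp:#x} at offset {i}")
        out.append(cp)
        i += n + 1
    return out
```

For example, decode_utf8(encode_codepoint(0x20AC)) gives back [0x20AC].
There is no table anywhere in this: the mapping is pure arithmetic on
the bits of the codepoint.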
There are no codepages in Unicode. (Or I suppose there is exactly
one.)
There have been a couple of specification changes in the history of
UTF-8; the last one was in 2003. So it is unlikely that you will ever
need to consider previous versions, in which certain now-forbidden
codepoints were allowed to appear.
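
If you want to see what "now forbidden" means in practice, the sketch
below asks Python's built-in strict UTF-8 codec about a few octet
sequences. The helper name is_current_utf8 is made up for this example,
and the sample sequences are my own choice of illustration (a surrogate,
a codepoint above U+10FFFF, and a five-octet form), not a complete list
of what earlier definitions permitted.

```python
def is_current_utf8(data):
    """Return True if data is well-formed under the current UTF-8 rules,
    as judged by Python's strict built-in codec (illustrative helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "plain ASCII": b"hello",
    "euro sign U+20AC": b"\xe2\x82\xac",
    "surrogate U+D800": b"\xed\xa0\x80",
    "above U+10FFFF (U+110000)": b"\xf4\x90\x80\x80",
    "five-octet form": b"\xf8\x88\x80\x80\x80",
}

for label, octets in samples.items():
    print(label, "->", is_current_utf8(octets))
```

The first two samples are accepted and the last three are rejected. Text
that passes a check like this has exactly one UTF-8 encoding, so a
checksum of the octets needs no knowledge of any UTF-8 "version".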