get the sourcecode [of UTF-8]

A bughunter A_bughunter at proton.me
Sun Nov 3 23:43:29 CST 2024


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

No, it does not answer my question.

Yes, 1 byte is 8 bits, and UTF-8 is Unicode Transformation Format, 8-bit. Then you point me to a specification page that is clearly for Unicode version 16. When I say "relevant version", whether you call it core or not, I expected you to ask which implementation of UTF-8 I mean. The answer: the relevant implementation is Bionic (bionic C), the C library of Android 13, which uses UTF-8.
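The "8" in UTF-8 names the size of its code unit, not a fixed size per character: one character occupies one to four 8-bit bytes. A minimal Python sketch shows this. Python is used only for illustration and is not Bionic's implementation. The function name `utf8_lengths` is hypothetical.

```python
def utf8_lengths(text: str) -> list[int]:
    """Return how many UTF-8 bytes each character of `text` occupies."""
    return [len(ch.encode("utf-8")) for ch in text]

sample = "A\u00e9\u20ac\U0001F600"  # A, e-acute, euro sign, grinning face

for ch in sample:
    # bytes.hex(" ") (Python 3.8+) prints the bytes separated by spaces
    print(f"U+{ord(ch):04X}: {ch.encode('utf-8').hex(' ')}")

print(utf8_lengths(sample))  # [1, 2, 3, 4]
```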
Without the source code, you could only guess which Unicode version Bionic uses. With a slight assumption (Android 13 is open source as AOSP), it would be possible to point out the exact Unicode version used in it. However, this assumes my runtime matches a generic AOSP Android 13 build. So my question really asks whether there is any way to display the Unicode version that a UTF-8 implementation was compiled with. Sometimes there are --version options.
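Whether a runtime can report the Unicode version it was built with depends on that runtime. As one concrete, hedged example, Python exposes the version of the Unicode Character Database compiled into its interpreter as `unicodedata.unidata_version`. This reports Python's own tables only. It says nothing about Bionic or any other C library on the same device.

```python
import sys
import unicodedata

# Version of the Unicode Character Database built into *this* interpreter.
# It is not Bionic's version: each library ships its own Unicode tables.
print("Python:", sys.version.split()[0])
print("Unicode database:", unicodedata.unidata_version)
```

The UTF-8 byte format itself is the same across these versions. What differs between Unicode versions is the set of assigned characters and their properties.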
The part you do not seem to understand is the full chain of authentication for a checksummed text. To fully authenticate it, the codepage (the character-to-glyph map) must be known. Anything further on this checksumming process would be off topic for this mailing list, and you may ask me off-list. Still, the use case is worth stating.
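A checksum is computed over bytes. The encoding therefore has to be fixed before hashing, and text that looks identical on screen can still hash differently. The sketch below uses a hypothetical helper, `text_digest`, built on SHA-512 from Python's `hashlib` (SHA-512 is the hash named in this message's PGP header). It shows two cases: the same string encoded as UTF-8 and as UTF-16, and "café" in precomposed (NFC) and decomposed (NFD) form, which render with the same glyphs.

```python
import hashlib
import unicodedata

def text_digest(text: str, encoding: str = "utf-8") -> str:
    """SHA-512 hex digest of `text` after encoding it to bytes."""
    return hashlib.sha512(text.encode(encoding)).hexdigest()

word = "caf\u00e9"                                # precomposed e-acute (NFC)
decomposed = unicodedata.normalize("NFD", word)   # "e" + COMBINING ACUTE ACCENT

# Different encodings of the same string give different bytes, hence digests.
print(text_digest(word) == text_digest(word, "utf-16-le"))  # False
# Same glyphs on screen, different code points, hence different digests.
print(text_digest(word) == text_digest(decomposed))         # False
```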

Jim says: "there are many things which could be called a 'specific encoding map (codepage)'. I don't know which of those you are referring to." I called this a "character-to-glyph map". It covers not only the "bytecode", the 8-bit byte in which the integer (or character, if we are speaking C) is stored, but also the glyph that should be displayed for that bytecode (if we are to speak of C bitcode). All UTF-8 is bytecode because it stores the same sort of information as the 7-bit ASCII codepage, though in 8 bits. This is what I mean by bytecode: the actual text file is bytecode, while the C code itself is represented precisely in bits before it produces UTF-8 bytecode. I have now declared my definitions, so please do not swap the meanings.
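The relationship to 7-bit ASCII mentioned above can be checked directly. UTF-8 encodes every ASCII character as the same single byte, and every byte of a multi-byte sequence has its high bit set, so it can never be mistaken for ASCII. A short Python sketch follows. The function names `is_ascii_identical` and `all_high_bit` are hypothetical.

```python
def is_ascii_identical(text: str) -> bool:
    """True if `text` encodes to the same bytes in ASCII and in UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        return False  # text contains characters outside 7-bit ASCII

def all_high_bit(ch: str) -> bool:
    """True if every UTF-8 byte of `ch` is >= 0x80 (true for any non-ASCII character)."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))

print(is_ascii_identical("Hello, world!"))  # True
print(all_high_bit("\u20ac"))               # True: bytes e2 82 ac
```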
I need the bytecode-to-glyph map of UTF-8 as used by my runtime software.
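Decoding UTF-8 yields code points, not glyphs. The glyph actually drawn is chosen by the font and text renderer in use. From the bytes alone, a runtime can report at most each code point and its Unicode character name, as in the Python sketch below. The function `describe_utf8` is hypothetical. The names come from the Python interpreter's own Unicode database, not Bionic's.

```python
import unicodedata

def describe_utf8(data: bytes) -> list[tuple[str, str]]:
    """Decode UTF-8 bytes and pair each code point with its Unicode name."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    ]

for code_point, name in describe_utf8(b"A\xe2\x82\xac"):
    print(code_point, name)
# U+0041 LATIN CAPITAL LETTER A
# U+20AC EURO SIGN
```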


from A_bughunter at proton.me

------- Forwarded Message -------
From: Jim DeLaHunt <list+unicode at jdlh.com>
Date: On Sunday, November 3rd, 2024 at 22:42
Subject: Re: get the sourcecode [of UTF-8]
To: A bughunter <A_bughunter at proton.me>
CC: unicode at corp.unicode.org <unicode at corp.unicode.org>


> Hello, anonymous person:
> 
> On 2024-11-02 17:42, A bughunter via Unicode wrote:
> 
> > Where can I get the source code of the relevant (version of) UTF-8, in
> > order to checksum text against the specific encoding map (codepage)?
> > 
> > from A_bughunter at proton.me
> 
> 
> I'm afraid I don't really understand what you are asking here.
> 
> UTF-8 is a data format, a way of representing 21-bit Unicode scalar
> integers in 1, 2, 3, or 4 bytes (octets). It is defined in section
> 2.5.3, "UTF-8"
> https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-2/#G11165,
> of the Core Specification of the Unicode Standard. It has not changed
> over time, so it doesn't really have versions.
> 
> If by "source code" you refer to an implementation of the UTF-8 format,
> then there is no single answer. There are multiple implementations of UTF-8,
> and so multiple independent bodies of "source code".
> 
> And there are many things which could be called a "specific encoding map
> (codepage)". I don't know which of those you are referring to.
> 
> Does that answer your question?
> 
> --
> --Jim DeLaHunt, jdlh at jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
> multilingual websites consultant, Vancouver, B.C., Canada
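Jim's description of UTF-8 (a 21-bit scalar value in 1, 2, 3, or 4 bytes) can be written out directly. The following is a hand-written Python sketch of the encoding rules, not the code of Bionic or any other library. Checking it against Python's built-in codec illustrates that independent implementations of UTF-8 must produce identical bytes. The names `utf8_encode` and `matches_builtin` are hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hand-written sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def matches_builtin() -> bool:
    """Compare against Python's codec for every scalar value (surrogates skipped)."""
    return all(
        utf8_encode(cp) == chr(cp).encode("utf-8")
        for cp in range(0x110000)
        if not 0xD800 <= cp <= 0xDFFF
    )

print(utf8_encode(0x20AC).hex(" "))  # e2 82 ac
print(matches_builtin())             # True
```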
-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wnUEARYKACcFgmcoXv4JkKkWZTlQrvKZFiEEZlQIBcAycZ2lO9z2qRZlOVCu
8pkAAKqgAP9ihEKSErUj2C84cQmoiaOLBARhlb7SCox8cmuxsncu/gEA6BKd
RJURnHblQXsCwmdootUwwujhY3YXxnh+kEMhPgc=
=R8rX
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: publickey - A_bughunter at proton.me - 0x66540805.asc
Type: application/pgp-keys
Size: 653 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20241104/b56c9a59/attachment.key>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: publickey - A_bughunter at proton.me - 0x66540805.asc.sig
Type: application/pgp-signature
Size: 119 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20241104/b56c9a59/attachment.sig>
