get the sourcecode [of UTF-8]

A bughunter A_bughunter at proton.me
Tue Nov 5 03:28:25 CST 2024


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


My reply to Sławomir is interspersed below.


from A_bughunter at proton.me


On Monday, November 4th, 2024 at 21:05, Sławomir Osipiuk via Unicode <unicode at corp.unicode.org> wrote:

> 
> On Monday, 04 November 2024, 00:43:29 (-05:00), A bughunter via Unicode wrote:
> 

Originating Question

Where to get the sourcecode of relevant (version) UTF-8? In order to checksum text against the specific encoding map (codepage).

From now on, keep the originating question pinned at the top of each reply, as above, and let every reply focus on the originating question, because as you can see I was dumped on with over 5 pages of unrelated and off-topic nonsense in reply to my single-line question.
> > No, it does not answer my question.
> 

I didn't post to hold a free seminar on computer science. By my grace I will expound: UTF-8 is a text format of Unicode. Unicode is a standard. In order to get anything to produce Unicode UTF-8, it must be compiled. Time is a sequence of events: you have compile time and runtime. Before something is compiled, it is sourcecode. Wherever the UTF-8 is input into the sourcecode, it is then compiled into a runtime. As for your gripe about my strange use of "bytecode", I have already defined it absolutely. You may go back and re-read.
> I don't think I'm alone in saying that your question is very unclear, in major part by your very strange use of certain terms. I don't think I've ever encountered "bytecode" outside of Java implementations, and never does it refer to textual (prose) data as you seem to do. I still don't know what "compile time UTF-8" is supposed to be, and I've read both your messages multiple times.
> 

Your question is off-topic. The only part you need to focus on to answer the originating question is: "the character to glyph map must be known."
> > In order to fully authenticate: the codepage of the character to glyph map must be known.
> 
> 
> To authenticate what? At the end of the day, you're always just authenticating a stream of bits.
You are wrong that "At the end of the day, you're always just authenticating a stream of bits," but I will not argue or correct you, because it neither answers my question nor is specific to Unicode.

> > I need the bytecode to glyph map of UTF-8 as it is used by my runtime software.

No, I do not want to map. I need the bytecode/character to glyph map in the sourcecode of whatever is being used to produce UTF-8. This absolutely must be contained in the runtime software in order for anything to produce UTF-8. Yet you have failed to ask for the information required to answer my concise, complete, simple, one-line, relevant, on-topic question.
> You want to map UTF8-encoded code points to characters? (Glyphs are the visual representations of characters, determined by the font.) In that case the "map" is the Unicode database. Each code point (encoded as one or more bytes in UTF8) maps to a character. Versions of the database are freely accessible online.
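
The code-point-to-character mapping described in the quoted paragraph can be illustrated with a minimal Python sketch. It assumes Python's standard-library `unicodedata` module, which bundles one specific version of the Unicode Character Database; the helper name `describe_utf8` is hypothetical and does not come from this thread. It is only an illustration of the lookup, not a statement about what any particular runtime does internally.

```python
import unicodedata

def describe_utf8(data: bytes) -> list[tuple[str, str]]:
    """Decode UTF-8 bytes and pair each code point with its character name."""
    # UTF-8 decoding is purely algorithmic: bytes -> code points.
    text = data.decode("utf-8")
    # The name lookup uses the Unicode Character Database bundled with Python.
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]

if __name__ == "__main__":
    # Version of the Unicode Character Database this interpreter ships with.
    print("UCD version:", unicodedata.unidata_version)
    # b"S\xc5\x82" is the UTF-8 encoding of "Sł".
    for code_point, name in describe_utf8(b"S\xc5\x82"):
        print(code_point, name)
```

The byte-to-code-point step needs no table at all; only the second step (code point to character identity) consults versioned data, which is the database the quoted paragraph refers to.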

Again, the question has been pinned at the top: "Where to get the sourcecode of relevant (version) UTF-8?"
> But I am still very unsure of what you're asking for. Are you concerned that code points may be reassigned in the future? That, for example, writing "Smith" in version 16 may appear as "Smite" in a future version, and this affects the apparent content of a checksummed text file? If so, that is prevented by the Unicode Stability Policy; assigned code points cannot have their character identity changed. I don't see any practical way of exploiting differences between Unicode versions to alter the apparent content of text.
> 
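
For readers following the checksum point in the quoted paragraph, here is a minimal Python sketch of checksumming text as UTF-8. The choice of SHA-256 and the helper name `utf8_sha256` are assumptions made for illustration; nothing in this thread specifies a hash function. The digest is computed over the encoded bytes, so it changes only if the byte sequence changes.

```python
import hashlib

def utf8_sha256(text: str) -> str:
    """Return the SHA-256 hex digest of the text's UTF-8 encoding."""
    # The digest covers the encoded byte sequence, nothing else.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

if __name__ == "__main__":
    # Different text produces different bytes, hence a different digest.
    print(utf8_sha256("Smith"))
    print(utf8_sha256("Smite"))
```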

The rest of Sławomir's reply was so far removed from my question, "Where to get the sourcecode of relevant (version) UTF-8?", that it is not worth replying to.
> Sławomir
-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wnUEARYKACcFgmcp5TUJkKkWZTlQrvKZFiEEZlQIBcAycZ2lO9z2qRZlOVCu
8pkAAHMlAQCPiej7TKEDdxDH6FanPS58lxX0oG6fcKfGTc3MrBFG2wEAkr6T
I0LKU7dw8XkzdAajfeWLZfhcaQtQD36uFInTKgA=
=B5NX
-----END PGP SIGNATURE-----

