get the sourcecode [of UTF-8]

Alexis flexibeast at gmail.com
Tue Nov 5 07:23:14 CST 2024


A bughunter via Unicode <unicode at corp.unicode.org> writes:

> Generally we put the standard into a computer language such as C.
> Therefore the Unicode V.16 standard of UTF-8 should also be the
> sourcecode of the implementation these converge making them
> synonymous at the convergence.

UTF-8 is an _encoding_ of Unicode, a specification of how to 
represent Unicode at the bit level. An _encoding_ is something 
different from _source code_. Source code is programming language 
text that gets translated - interpreted or compiled - into machine 
language. UTF-8 is not a programming language. It's a way of 
saying "This Unicode code point is encoded in UTF-8 with the 
following bit pattern." If you'd like an introduction to how 
Unicode code points - like code point 65 (U+0041) for 'A' - are 
encoded by UTF-8, you might find this section of the relevant 
Wikipedia page helpful:

  https://en.wikipedia.org/wiki/UTF-8#Description
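
To make the "code point in, bit pattern out" idea concrete, here is a 
minimal sketch in Python of the mapping that section describes. The 
function name utf8_encode is my own, purely for illustration; it is 
not taken from any standard or library, and it is in no sense a 
'reference implementation'. It handles one code point at a time and 
rejects surrogates and out-of-range values.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes (illustrative only)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")

    if code_point < 0x80:
        # 1 byte:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(65))      # 'A' is the single byte 0x41
print(utf8_encode(0x20AC))  # the euro sign takes three bytes
```

Note that the function is just one way of writing down the mapping; 
the mapping itself is what UTF-8 defines, independently of any code.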

There is no piece of software that's the 'reference 
implementation' of UTF-8, because UTF-8 is not a specification for 
e.g. a software library providing certain functionality: again, 
UTF-8 is an algorithm for representing Unicode code points at the 
bit level. Programming languages provide functionality for 
converting to and from UTF-8.
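
As one example of that built-in functionality, here is roughly what 
the round trip looks like in Python, using its standard str.encode 
and bytes.decode methods. The variable names are just for 
illustration; other languages offer equivalent facilities under 
different names.

```python
# 'A' (U+0041) followed by the euro sign (U+20AC)
text = "A\u20ac"

# Convert the string of code points to UTF-8 bytes...
encoded = text.encode("utf-8")

# ...and convert those bytes back to a string of code points.
decoded = encoded.decode("utf-8")

print(encoded)          # b'A\xe2\x82\xac'
print(decoded == text)  # True
```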

It's _Unicode_ that has versions; UTF-8 basically does not.


Alexis.

