get the sourcecode [of UTF-8]

Jim DeLaHunt list+unicode at jdlh.com
Thu Nov 7 15:43:30 CST 2024


On 2024-11-07 06:32, A bughunter via Unicode wrote:
> … Not sure why you opened like the narrator and assessor of Antiques 
> Roadshow.
>> What I see in the repo are various representations of historical 
>> documents from the 18th century, which were originally produced as 
>> English language text hand-written on parchment with pen and ink. You 
>> have images of the text on physical pages, and the character content 
>> of the texts in UTF-8.…
>
> Do you like the full continuity definition of Libre-sourcecode? I 
> enjoy the plug but the moderator might not. None of this was necesarry 
> to answer the question on this mailing list thread.…
> Again showing you a photograph of The Unanimous Declaration of the 
> thirteen united States of America shouldn't make anything click in 
> your head to be able to answer my originating question any better than 
> without having seen the photograph.…

I summarised what I understand of your project as a courtesy to my 
fellow unicode-list subscribers. I see in the replies that some of them 
are, like me, baffled at your attempts to convey what it is you want. I 
hope to ease their minds by offering a comprehensible interpretation of 
what might be your meaning.


> [does] Unicode have any reference model? When saying both I mean the 
> standard's source and the machine's source.…
>
> …It is disappointing of Unicode consortium that it does not know how 
> android or Linux impliment's it's standard.… 

Well, I hate to confirm your disappointment, but it is not the business 
of the Unicode Consortium to track every use of its standards, or to 
archive the source code of all software which implements them.  I 
don't understand why you would be so confident that this is its duty.

If you want to know how Android code behaves, ask Android and the 
Android source code. If you want to understand how Linux behaves, ask 
Linux and the Linux source code. If you want to ask how bughunter uses 
the English language, ask bughunter and not the Oxford English Dictionary.

Yes, Unicode has a reference model. It takes the form of human-readable 
text, and lots of data tables, linked to from 
<https://www.unicode.org/versions/Unicode16.0.0/#Components>. It has 
over 30 years of past versions linked to from 
<https://unicode.org/versions/enumeratedversions.html>.
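
To make "data tables" a little more concrete, here is a minimal sketch 
of how one implementation exposes them. It assumes Python 3, whose 
standard unicodedata module provides access to the Unicode Character 
Database. The helper name describe() is my own invention for 
illustration, and the Unicode version you get depends on which Python 
build you run. This is one consumer of the published data, not a 
reference implementation blessed by the Consortium.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Look up a few Unicode Character Database properties for one code point.

    `describe` is a hypothetical helper for this example, not a Unicode API.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        # The second argument is a fallback for code points with no name.
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": unicodedata.category(ch),
    }


if __name__ == "__main__":
    # Which version of the data tables this Python build was generated from.
    print("UCD version bundled with this Python:", unicodedata.unidata_version)
    for ch in "A\u00e9\u0416":
        print(describe(ch))
```

Android, Linux distributions, and other platforms make their own 
choices about how to consume the same data, which is why their source 
code is the place to look for their behaviour.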

This reference model has been fantastically successful. Every major 
current operating system, most major web sites, and most major 
applications have adopted it.  All this without publishing the sort of 
reference software implementation which you seem to feel is necessary. 
Even though this disappoints you, it is the reality.


> Ye have some misconception about what text is. Text is text.…
> I don't get why you think Unicode stops short of glyphs when while 
> this discussion goes on a guy will send a glyph to the mailing list 
> calling it an english character and ask it to be added.…
> You guys have these fragmented thoughts and misconceptions. There is a 
> full continuity where all of these be synonymous. data which is 
> unicode-data which is bytecode which is an C integer which is an 
> character may be a letter which is a glyph which is text. …This is to 
> say when Mr. Reader is reading the screen he sees TEXT. That text is 
> consituted and comprised of all of these synonyms. Which are synonyms 
> where they converge (instance) to comprise TEXT on Mr. Reader's screen.…

Hahaha. "Text is text." "all of these be synonymous". Lol. The entire 
thrust of the Unicode community's work for over three decades is that 
text is not "just text"; that is, implementing human cultural traditions 
of written text in software is not simple and not easy. There is a huge 
difference between what Unicode calls "plain text" and "markup" and 
"presentation". There is a huge difference between a "code unit" and a 
"keypress" and a "character" and a "glyph" and a "glyph image".
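
A small sketch may make the difference between "code unit" and 
"character" tangible. It assumes Python 3 and uses only the standard 
library. The function name counts() and the sample strings are my own 
choices for illustration. It says nothing about glyphs, which depend on 
the font and the rendering engine rather than on the encoded text.

```python
import unicodedata


def counts(text: str) -> dict:
    """Count the same text three ways (hypothetical helper for illustration)."""
    return {
        # Python's len() on a str counts Unicode code points.
        "code_points": len(text),
        # UTF-8 code units are 8 bits each, so one per encoded byte.
        "utf8_code_units": len(text.encode("utf-8")),
        # UTF-16 code units are 16 bits each, so two bytes per unit.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


if __name__ == "__main__":
    decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
    precomposed = unicodedata.normalize("NFC", decomposed)  # U+00E9
    emoji = "\U0001F600"  # a code point outside the Basic Multilingual Plane

    for label, sample in [
        ("decomposed e-acute", decomposed),
        ("precomposed e-acute", precomposed),
        ("emoji U+1F600", emoji),
    ]:
        print(label, counts(sample))
```

The two e-acute strings would normally look identical to a reader, yet 
they differ in code points and in code units, and the emoji is one code 
point but two UTF-16 code units.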

When you say that it is the rest of us on this list, and not you, who 
have "these fragmented thoughts and misconceptions", you are 
establishing yourself as someone who does not understand this complexity 
and does not care to.  That will reduce the amount of help you get from 
the genuine experts, who are genuinely helpful, on this list.

I encourage you to take your passion and energy, and apply it to 
learning about this fascinating area where human culture and technology 
overlap. Watch some of the tutorial videos from the Unicode Consortium 
at <https://www.youtube.com/@unicode>. Sign up for the Unicode Events 
newsletter <https://www.unicode.org/events/recent-events.html>, and 
attend some of the tutorials and Unicode Technical Workshops.

Good luck with your project! (In the sense of, you are going to need 
it.) Best regards,

  —Jim DeLaHunt

-- 
.   --Jim DeLaHunt, jdlh at jdlh.com     http://blog.jdlh.com/ 
(http://jdlh.com/)
       multilingual websites consultant, Vancouver, B.C., Canada


