[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

James Kass jameskasskrv at gmail.com
Mon Oct 12 19:39:33 CDT 2020


On double-checking, it turns out that these aren't (upper-) ASCII strings 
after all.  They're just ANSI strings.  Please see the attached graphic.  
My bad; sorry for the confusion (and the two typos) in my earlier post.

I was trying to point out the similarity between the hex byte strings 
used in UTF-8 and hex byte strings used in HTML NCRs to point to a 
character's USV.  That similarity exists no matter how poorly my 
assertion was phrased.
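
To make the comparison concrete, here is a minimal sketch in Python.  The 
character (U+20AC EURO SIGN) and the variable names are chosen only for 
illustration and do not come from the earlier post.  One difference shows 
up: the hex digits in the NCR are the USV itself, while the UTF-8 bytes 
encode it.  Both still resolve to the same value.

```python
import html

# U+20AC EURO SIGN, picked only as an illustration.
ncr = "&#x20AC;"                         # HTML numeric character reference
utf8_bytes = bytes.fromhex("E2 82 AC")   # the same character encoded as UTF-8

# Each form is resolved to a character, and ord() gives its scalar value.
from_ncr = ord(html.unescape(ncr))
from_utf8 = ord(utf8_bytes.decode("utf-8"))

print(f"NCR   -> U+{from_ncr:04X}")
print(f"UTF-8 -> U+{from_utf8:04X}")
```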

Perhaps it would have been better to point out the similarity between 
surrogate pairs and UTF-8.  Think of UTF-8 as being surrogate pairs (or 
trios or quadrupeds or whatever) which point to a Unicode character.

A system substitutes a Unicode scalar value for the UTF-8 byte sequence 
before further processing can occur.
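
A rough sketch of that substitution, in Python, alongside the equivalent 
step for a UTF-16 surrogate pair.  The function names are hypothetical, 
the example character (U+1F600) is my own choice, and the UTF-8 routine 
assumes well-formed input: it does not reject overlong forms, encoded 
surrogates, or bad continuation bytes, which a real decoder must do.

```python
def utf8_to_scalar(seq: bytes) -> int:
    """Turn one well-formed UTF-8 sequence into its scalar value (no validation)."""
    first = seq[0]
    if first < 0x80:              # 0xxxxxxx: single byte, ASCII range
        return first
    if first >> 5 == 0b110:       # 110xxxxx: two-byte sequence
        value = first & 0x1F
    elif first >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        value = first & 0x0F
    elif first >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        value = first & 0x07
    else:
        raise ValueError("not a UTF-8 lead byte")
    for b in seq[1:]:             # each trailing byte contributes six bits
        value = (value << 6) | (b & 0x3F)
    return value


def surrogates_to_scalar(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into its scalar value."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The same character reached by two different routes.
via_utf8 = utf8_to_scalar(bytes.fromhex("F0 9F 98 80"))
via_utf16 = surrogates_to_scalar(0xD83D, 0xDE00)
print(f"U+{via_utf8:X}", f"U+{via_utf16:X}")
```

In both cases the code units are only a carrier; what later processing 
sees is the single scalar value.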
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20201012_Capture.jpg
Type: image/jpeg
Size: 98051 bytes
Desc: not available
URL: <https://corp.unicode.org/pipermail/unicode/attachments/20201013/7a8a5341/attachment-0001.jpg>

