[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature
jameskasskrv at gmail.com
Mon Oct 12 19:39:33 CDT 2020
On double-checking, it turns out that these aren't (upper-) ASCII strings
after all. They're just ANSI strings. Please see attached graphic. My
bad, sorry for the confusion (and the two typos) in my earlier post.
I was trying to point out the similarity between the hex byte strings
used in UTF-8 and hex byte strings used in HTML NCRs to point to a
character's USV. That similarity exists no matter how poorly my
assertion was phrased.
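
To make the comparison concrete, here is a minimal Python sketch. The helper name describe() is made up for illustration and is not from any earlier post. It shows the UTF-8 byte string and the hexadecimal NCR for the same character side by side. Note that the two hex strings differ in general (the NCR spells out the USV directly, while UTF-8 spreads its bits across several bytes), but both resolve to the same character.

```python
import html

def describe(ch: str) -> dict:
    """Return the USV, UTF-8 bytes (as hex), and HTML NCR for one character.

    Hypothetical helper for illustration only.
    """
    usv = ord(ch)  # the Unicode scalar value
    return {
        "usv": f"U+{usv:04X}",
        "utf8_hex": ch.encode("utf-8").hex(" ").upper(),
        "ncr": f"&#x{usv:X};",
    }

info = describe("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE

# Both representations lead back to the same character.
from_utf8 = bytes.fromhex(info["utf8_hex"]).decode("utf-8")
from_ncr = html.unescape(info["ncr"])
same = from_utf8 == from_ncr == "\u00e9"
```

For U+00E9 the UTF-8 bytes are C3 A9 and the NCR is &#xE9;, so the hex digits are related but not identical.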
Perhaps it would have been better to point out the similarity between
surrogate pairs and UTF-8. Think of UTF-8 as being like surrogate pairs (or
trios or quartets or whatever), each of which points to a Unicode character.
A system substitutes a Unicode value for the UTF-8 hex byte string
before further processing can occur.
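
As a rough illustration of that analogy, the Python sketch below decodes one UTF-8 byte sequence and one UTF-16 surrogate pair by hand. Both function names are made up for this example, and the UTF-8 decoder is deliberately simplified: it checks sequence length and continuation bytes but does not reject overlong forms or encoded surrogates, which a conforming decoder must do.

```python
def decode_utf8_sequence(data: bytes) -> int:
    """Decode a single UTF-8 sequence (1 to 4 bytes) to a scalar value.

    Simplified sketch: no overlong or surrogate-range checks.
    """
    lead = data[0]
    if lead < 0x80:                      # 0xxxxxxx: one byte
        trailing, value = 0, lead
    elif lead >> 5 == 0b110:             # 110xxxxx: two bytes
        trailing, value = 1, lead & 0x1F
    elif lead >> 4 == 0b1110:            # 1110xxxx: three bytes
        trailing, value = 2, lead & 0x0F
    elif lead >> 3 == 0b11110:           # 11110xxx: four bytes
        trailing, value = 3, lead & 0x07
    else:
        raise ValueError("not a UTF-8 lead byte")
    if len(data) != trailing + 1:
        raise ValueError("wrong number of bytes for this lead byte")
    for b in data[1:]:
        if b >> 6 != 0b10:               # continuation bytes are 10xxxxxx
            raise ValueError("bad continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value

def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into a scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# U+1F600 expressed both ways; each decodes to the same scalar value.
from_utf8 = decode_utf8_sequence(bytes([0xF0, 0x9F, 0x98, 0x80]))
from_utf16 = decode_surrogate_pair(0xD83D, 0xDE00)
```

In both cases several code units are combined by shifting and masking into a single value, which is the similarity I was getting at.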
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 98051 bytes
Desc: not available