What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Eli Zaretskii eliz at gnu.org
Sat Jun 6 02:12:50 CDT 2020


> CC: "tom at honermann.net" <tom at honermann.net>,
>         "alisdairm at me.com"
>  <alisdairm at me.com>,
>         "unicode at unicode.org" <unicode at unicode.org>
> Date: Sat, 6 Jun 2020 06:58:55 +0000
> From: Shawn Steele via Unicode <unicode at unicode.org>
> 
> I mentioned that later....  But there is a lot of content for interchange that are single/double byte (8 bit) rather than requiring escape sequences.  The 2022 encodings seem rarer, though it may depend on your data source.

I agree that ISO 2022 is rare these days, but rarity doesn't help when
you need to be accurate in decoding, because mistaking one encoding
for another produces horribly incorrect results, and users complain
vociferously when that happens.
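
A minimal sketch of that failure mode, assuming Python's standard
iso2022_jp codec. The sample string and variable names are illustrative
and not taken from this thread. ISO 2022 streams consist of 7-bit bytes
plus escape sequences, so a decoder that guesses ASCII or UTF-8 does not
fail; it silently returns garbage:

```python
# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# Encode with ISO-2022-JP: the result is escape sequences plus 7-bit bytes.
raw = text.encode("iso2022_jp")

# Every byte is below 0x80, so the data is also well-formed ASCII and UTF-8.
all_seven_bit = all(b < 0x80 for b in raw)

# Guessing UTF-8 raises no error, but the result is not the original text.
wrong = raw.decode("utf-8")

# Only the correct codec recovers what the sender wrote.
right = raw.decode("iso2022_jp")

print(repr(wrong))
print(right)
```

Because the wrong guess raises no exception, the mistake only shows up
when a user looks at the output, which is why a detector cannot ignore
an encoding merely because it is rare.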

