What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Eli Zaretskii eliz at gnu.org
Sat Jun 6 01:39:44 CDT 2020


> CC: Alisdair Meredith <alisdairm at me.com>,
>         Unicode Mail List
>  <unicode at unicode.org>
> Date: Fri, 5 Jun 2020 22:33:23 +0000
> From: Shawn Steele via Unicode <unicode at unicode.org>
> 
> I’ve been recommending that people assume documents are UTF-8.  If the UTF-8 decoding fails, then
> consider falling back to some other codepage.

That strategy would fail with 7-bit ISO 2022-based encodings, no?
They look like plain 7-bit ASCII (so UTF-8 decoding will not fail), but
actually represent non-ASCII text.
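
To make the concern concrete, here is a minimal Python sketch of a
"try UTF-8 first, then fall back" decoder. The function name
decode_utf8_first, the sample strings, and the choice of fallback codecs
are my own assumptions for illustration, not anything from Shawn's
message; "iso2022_jp" and "latin-1" are standard Python codec names.

```python
def decode_utf8_first(data: bytes, fallback: str):
    """Hypothetical helper: try UTF-8, fall back only if decoding fails."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback), fallback


# Japanese text encoded as ISO-2022-JP: escape sequences plus 7-bit bytes.
raw_jp = "日本語".encode("iso2022_jp")
seven_bit_only = all(b < 0x80 for b in raw_jp)

# UTF-8 decoding succeeds, so the fallback is never consulted,
# and the result is escape-sequence gibberish rather than the text.
text_jp, used_jp = decode_utf8_first(raw_jp, "iso2022_jp")
print(seven_bit_only, used_jp, text_jp == "日本語")

# By contrast, an 8-bit encoding such as Latin-1 does trip the fallback,
# because the lone byte 0xE9 is not valid UTF-8.
raw_l1 = "café".encode("latin-1")
text_l1, used_l1 = decode_utf8_first(raw_l1, "latin-1")
print(used_l1, text_l1 == "café")
```

The Latin-1 case is detected because its high-bit byte is malformed as
UTF-8, whereas the ISO-2022-JP bytes pass as valid UTF-8 and are
silently misread.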

