What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Shawn Steele Shawn.Steele at microsoft.com
Sat Jun 6 01:58:55 CDT 2020


I mentioned that later... But a lot of interchange content is in single- or double-byte (8-bit) encodings that don't require escape sequences.  The ISO 2022 encodings seem rarer, though that may depend on your data source.

-----Original Message-----
From: Eli Zaretskii <eliz at gnu.org> 
Sent: Friday, June 5, 2020 11:40 PM
To: Shawn Steele <Shawn.Steele at microsoft.com>
Cc: tom at honermann.net; alisdairm at me.com; unicode at unicode.org
Subject: Re: What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

> CC: Alisdair Meredith <alisdairm at me.com>,
>         Unicode Mail List
>  <unicode at unicode.org>
> Date: Fri, 5 Jun 2020 22:33:23 +0000
> From: Shawn Steele via Unicode <unicode at unicode.org>
> 
> I’ve been recommending that people assume documents are UTF-8.  If the 
> UTF-8 decoding fails, then consider falling back to some other codepage.

That strategy would fail with 7-bit ISO 2022 based encodings, no?
They look like plain 7-bit ASCII (which will not fail UTF-8), but actually represent non-ASCII text.
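
To make the exchange above concrete, here is a minimal Python sketch of the "assume UTF-8, fall back on failure" strategy and of the ISO 2022 case where it gives the wrong answer. The function names, the cp1252 fallback, and the ESC-byte check are illustrative assumptions, not something either poster specified; the ESC check is only a rough heuristic, since an ESC byte can also appear in ordinary ASCII text (terminal control sequences, for example).

```python
from typing import Tuple

ESC = b"\x1b"  # ISO 2022 encodings switch character sets with ESC sequences


def decode_utf8_first(data: bytes, fallback: str = "cp1252") -> Tuple[str, str]:
    """Try strict UTF-8; on failure decode with a fallback code page.

    The fallback code page (cp1252 by default) is an assumption made for
    illustration only. Returns (text, name_of_encoding_used).
    """
    try:
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback


def might_be_iso2022(data: bytes) -> bool:
    """Hypothetical heuristic: pure 7-bit data that contains an ESC byte."""
    return ESC in data and all(b < 0x80 for b in data)


# An 8-bit legacy encoding: the 0xE9 byte is not valid UTF-8 here,
# so the strict decode fails and the fallback is used.
legacy = "café".encode("cp1252")
text, used = decode_utf8_first(legacy)

# A 7-bit ISO 2022 encoding: every byte is below 0x80, so the strict
# UTF-8 decode succeeds, but the result is escape sequences and ASCII
# characters rather than the original Japanese text.
original = "こんにちは"
jis = original.encode("iso2022_jp")
wrong_text, wrong_used = decode_utf8_first(jis)
```

With these inputs the cp1252 bytes are recovered through the fallback, while the ISO-2022-JP bytes are accepted as UTF-8 and silently misread. A check along the lines of might_be_iso2022 would have to run before the UTF-8 attempt to catch that case.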


