What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Eli Zaretskii eliz at gnu.org
Sat Jun 6 07:53:30 CDT 2020


> From: Harriet Riddle <harjitmoe at outlook.com>
> CC: "Shawn.Steele at microsoft.com" <Shawn.Steele at microsoft.com>,
> 	"tom at honermann.net" <tom at honermann.net>, "alisdairm at me.com"
> 	<alisdairm at me.com>, "unicode at unicode.org" <unicode at unicode.org>
> Date: Sat, 6 Jun 2020 12:20:40 +0000
> 
> So it is true that detecting ESC on its own will not identify 7-bit ISO 2022, but the specific sequence ESC $ B
> (ESC 0x24 0x42) has only one ANSI/ISO compliant meaning, which is to switch the G0 set to JIS X 0208. In
> UTF-8, there is no such thing as a G0 set (due to it not being fully ISO 2022 based), so it is meaningless.

If you are saying that "ESC $ B" or similar sequences can be
considered evidence that the text is not in UTF-8, then I might
concur.  Whether that amounts to "proof" sufficient to reject UTF-8,
I'm not sure.
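
To illustrate the difference between evidence and proof, here is a
minimal Python sketch.  The function name, the returned labels, and the
choice of designation sequences are assumptions made for illustration,
not something proposed in this thread.  It treats a G0 designation such
as ESC $ B in otherwise 7-bit data as a hint of ISO 2022 text.  Such
data still decodes as valid UTF-8, which is why the hint cannot serve
as proof on its own.

```python
# Hypothetical heuristic: the names and labels below are illustrative only.
ESC = b"\x1b"

# A few G0 designation sequences used by ISO-2022-JP (RFC 1468).
G0_DESIGNATIONS = (
    ESC + b"$B",  # ESC $ B: switch G0 to JIS X 0208-1983
    ESC + b"$@",  # ESC $ @: switch G0 to JIS C 6226-1978
    ESC + b"(J",  # ESC ( J: switch G0 to JIS X 0201 Roman
)


def encoding_evidence(data: bytes) -> str:
    """Return a coarse label describing what the bytes suggest."""
    has_designation = any(seq in data for seq in G0_DESIGNATIONS)
    is_7bit = all(byte < 0x80 for byte in data)

    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # 7-bit data is always valid UTF-8, so a designation sequence is only
    # a hint that the text is ISO 2022; it does not make the bytes
    # ill-formed as UTF-8.
    if has_designation and is_7bit:
        return "likely-iso-2022"
    if valid_utf8:
        return "utf-8-compatible"
    return "unknown"
```

A bare ESC, for example inside a terminal control sequence such as
ESC [ 0 m, does not trigger the ISO 2022 label here; only the specific
designation sequences do.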

