What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?
Shawn Steele
Shawn.Steele at microsoft.com
Sat Jun 6 01:58:55 CDT 2020
I mentioned that later. But a lot of interchange content is single- or double-byte (8-bit) rather than requiring escape sequences. The ISO 2022 encodings seem rarer, though that may depend on your data source.
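The "assume UTF-8, fall back on failure" strategy works for 8-bit legacy data because such text rarely forms valid UTF-8 multi-byte sequences by accident. Here is a minimal Python sketch of the idea. The function name `decode_utf8_first` and the default fallback of `cp1252` are illustrative assumptions, not a recommendation from this thread. It decodes with Python's `utf-8-sig` codec, which also drops a leading UTF-8 BOM if one is present.

```python
from __future__ import annotations


def decode_utf8_first(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Hypothetical helper: try strict UTF-8 first, then a legacy codepage.

    The fallback encoding is an assumption supplied by the caller; cp1252 is
    only an illustrative default.
    """
    try:
        # utf-8-sig accepts input with or without a leading BOM and strips it.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so decode as the legacy codepage. Undefined bytes
        # become U+FFFD instead of raising.
        return data.decode(fallback, errors="replace"), fallback


if __name__ == "__main__":
    print(decode_utf8_first("café".encode("utf-8")))    # valid UTF-8
    print(decode_utf8_first("café".encode("cp1252")))   # 0xE9 is not valid UTF-8 here
    print(decode_utf8_first("日本".encode("shift_jis"), "shift_jis"))
```

Pure ASCII input always "succeeds" as UTF-8. That is correct for real ASCII, but it is also why 7-bit escape-based encodings slip through this check.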
-----Original Message-----
From: Eli Zaretskii <eliz at gnu.org>
Sent: Friday, June 5, 2020 11:40 PM
To: Shawn Steele <Shawn.Steele at microsoft.com>
Cc: tom at honermann.net; alisdairm at me.com; unicode at unicode.org
Subject: Re: What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?
> CC: Alisdair Meredith <alisdairm at me.com>, Unicode Mail List <unicode at unicode.org>
> Date: Fri, 5 Jun 2020 22:33:23 +0000
> From: Shawn Steele via Unicode <unicode at unicode.org>
>
> I’ve been recommending that people assume documents are UTF-8. If the
> UTF-8 decoding fails, then consider falling back to some other codepage.
That strategy would fail with 7-bit ISO 2022-based encodings, no?
They look like plain 7-bit ASCII (which will not fail UTF-8), but actually represent non-ASCII text.
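The failure mode described here is easy to reproduce. ISO-2022-JP text consists entirely of 7-bit bytes, so strict UTF-8 decoding succeeds and returns escape codes and ASCII letters instead of Japanese. The sketch below shows this and adds `looks_like_iso2022`, a hypothetical heuristic (not a standard API) that flags 7-bit data containing an ISO 2022 designation escape (ESC followed by `$`, `(` or `)`). Treat it as an assumption-laden illustration rather than a reliable detector: any plain text that happens to contain those byte pairs would also be flagged.

```python
def looks_like_iso2022(data: bytes) -> bool:
    """Hypothetical heuristic: 7-bit data containing an ISO 2022 designation
    escape (ESC followed by '$', '(' or ')') may not be plain ASCII."""
    if any(b >= 0x80 for b in data):
        return False  # 8-bit bytes present, so this is not 7-bit ISO 2022
    return any(esc in data for esc in (b"\x1b$", b"\x1b(", b"\x1b)"))


if __name__ == "__main__":
    jp = "日本語".encode("iso2022_jp")
    print(all(b < 0x80 for b in jp))       # True: every byte is 7-bit
    print(jp.decode("utf-8") == "日本語")   # False: UTF-8 decoding "succeeds" but yields the wrong text
    print(looks_like_iso2022(jp))          # True: ESC $ B designation is present
```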