What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?
Martin J. Dürst
duerst at it.aoyama.ac.jp
Sat Jun 6 19:48:37 CDT 2020
On 06/06/2020 08:04, Markus Scherer via Unicode wrote:
> The BOM -- or for UTF-8 where "byte order" is meaningless, the Unicode
> signature byte sequence -- was popular when Unicode was gaining ground but
> legacy charsets were still widely used.
> Especially on Windows, which had settled on UTF-16 much earlier, lots of
> tools and editors started writing or expecting UTF-8 signatures.
> Other tools (especially in the Linux/Unix world) were never modified to
> expect or even cope with the signature, so ignored it or choked on it.
> There has never been uniform practice on this.
> For the most part, all new and recent text is now UTF-8, and the signature
> byte sequence has fallen out of favor again even where it had been used.
I'm really glad to hear this, and I very much hope it is true. But I
know of a case where the BOM on UTF-8 is necessary. It's to get Excel
to recognize a CSV file as UTF-8.
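
A minimal sketch, in Python, of how such a file might be produced. It
assumes Python's standard "utf-8-sig" codec, which writes the signature
bytes EF BB BF at the start of the file and strips them again on read.
The function names, file name, and sample rows are made up for
illustration, and the sketch only shows how the signature is written
and removed, not how any particular Excel version behaves.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_signature(path, rows):
    # "utf-8-sig" prepends the UTF-8 signature (EF BB BF) on write.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv_tolerant(path):
    # On read, "utf-8-sig" drops a leading signature if one is present
    # and otherwise decodes the file as plain UTF-8.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f))


# Hypothetical sample data with non-ASCII content.
rows = [["name", "city"], ["Dürst", "東京"]]
path = os.path.join(tempfile.mkdtemp(), "example.csv")
write_csv_with_signature(path, rows)

# Check that the file really starts with the signature bytes.
with open(path, "rb") as f:
    has_signature = f.read(3) == codecs.BOM_UTF8

print(has_signature)
print(read_csv_tolerant(path))
```

Reading with "utf-8-sig" rather than "utf-8" is one way for a tool to
cope with files from both camps, since it accepts input with or without
the signature.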
Regards, Martin.
> Having said that, I think the statement is right: "neither required nor
> recommended for UTF-8"
>
> We might want to review chapter 23 and the FAQ and see if they should be
> updated.
>
> Thanks,
> markus
>