What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Martin J. Dürst duerst at it.aoyama.ac.jp
Sat Jun 6 19:48:37 CDT 2020


On 06/06/2020 08:04, Markus Scherer via Unicode wrote:
> The BOM -- or for UTF-8 where "byte order" is meaningless, the Unicode
> signature byte sequence -- was popular when Unicode was gaining ground but
> legacy charsets were still widely used.
> Especially on Windows, which had settled on UTF-16 much earlier, lots of
> tools and editors started writing or expecting UTF-8 signatures.
> Other tools (especially in the Linux/Unix world) were never modified to
> expect or even cope with the signature, so ignored it or choked on it.
> There has never been uniform practice on this.
> For the most part, all new and recent text is now UTF-8, and the signature
> byte sequence has fallen out of favor again even where it had been used.

I'm really glad to hear this, and I very much hope it is true. But I 
know of a case where the BOM on UTF-8 is necessary: to get Excel to
recognize a CSV file as UTF-8.
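
For anyone generating such files from a script, here is a minimal sketch
in Python of writing a CSV with the UTF-8 signature and of reading one
back whether or not the signature is present. The function names and the
UTF8_SIGNATURE constant are made up for illustration; only the standard
library "utf-8-sig" codec and the csv module are assumed.

```python
import csv
import io

# U+FEFF encoded as UTF-8: the three-byte "signature" discussed above.
UTF8_SIGNATURE = b"\xef\xbb\xbf"


def csv_bytes_for_excel(rows):
    """Serialize rows to CSV bytes that start with the UTF-8 signature.

    Hypothetical helper: the name is illustrative, not from any library.
    """
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # The "utf-8-sig" codec prepends EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")


def read_csv_bytes(data):
    """Parse CSV bytes, accepting input with or without the signature."""
    # When decoding, "utf-8-sig" drops a leading signature if one is
    # present and otherwise behaves like plain UTF-8.
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    rows = [["name", "city"], ["Dürst", "東京"]]
    data = csv_bytes_for_excel(rows)
    print(data[:3] == UTF8_SIGNATURE)
    print(read_csv_bytes(data) == rows)
```

Decoding with "utf-8-sig" on the reading side is one way to avoid the
"choked on it" problem mentioned above, since the same code path copes
with files from tools that write the signature and tools that do not.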

Regards,   Martin.

> Having said that, I think the statement is right: "neither required nor
> recommended for UTF-8"
> 
> We might want to review chapter 23 and the FAQ and see if they should be
> updated.
> 
> Thanks,
> markus
> 


