What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Markus Scherer markus.icu at gmail.com
Fri Jun 5 18:04:12 CDT 2020


The BOM -- or for UTF-8 where "byte order" is meaningless, the Unicode
signature byte sequence -- was popular when Unicode was gaining ground but
legacy charsets were still widely used.
Especially on Windows, which had settled on UTF-16 much earlier, lots of
tools and editors started writing or expecting UTF-8 signatures.
Other tools (especially in the Linux/Unix world) were never modified to
expect or even cope with the signature, so they either ignored it or
choked on it.
There has never been uniform practice on this.
For the most part, all new and recent text is now UTF-8, and the signature
byte sequence has fallen out of favor again even where it had been used.
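
As a rough illustration of how a tool can cope with both practices, here is
a minimal Python sketch that accepts input with or without the signature
(the bytes EF BB BF) and writes output without it. The function names are
hypothetical and chosen only for this example; this is one possible
approach, not something prescribed by the Unicode Standard.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_utf8_tolerant(data: bytes) -> str:
    """Decode UTF-8 bytes, accepting input with or without a signature."""
    # The "utf-8-sig" codec drops one leading signature when decoding and
    # otherwise behaves like plain "utf-8".
    return data.decode("utf-8-sig")


def encode_utf8_plain(text: str) -> bytes:
    """Encode text as UTF-8 without writing a signature."""
    return text.encode("utf-8")
```

With this, decode_utf8_tolerant(b"\xef\xbb\xbfabc") and
decode_utf8_tolerant(b"abc") both yield "abc", and the re-encoded output
carries no signature.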

Having said that, I think the statement is right: "neither required nor
recommended for UTF-8".

We might want to review chapter 23 and the FAQ and see if they should be
updated.

Thanks,
markus