What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Markus Scherer markus.icu at gmail.com
Fri Jun 5 22:25:59 CDT 2020


On Fri, Jun 5, 2020 at 5:36 PM Tom Honermann via Unicode <
unicode at unicode.org> wrote:

> On 6/5/20 5:47 PM, Shawn Steele via Unicode wrote:
>
> > Are you asking because you’re interested in differentiating UTF-8 from
> > UTF-16?  Or UTF-8 from some other legacy non-Unicode encoding?
>
> The latter.  In particular, as a differentiator between shiny new UTF-8
> encoded source code files and long-in-the-tooth legacy encoded source code
> files coexisting (perhaps via transitive package dependencies) within a
> single project.
>
I would not use a BOM/signature on source code files. It will confuse or
break various tools.

I would take any non-ASCII/non-UTF-8 source code file and convert it to
UTF-8, and be done with it.
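
As a rough illustration of that conversion, here is a minimal Python sketch.
It is only one way to do it, not a reference implementation. The function
name to_utf8_no_bom and the default legacy encoding (cp1252) are assumptions
made for the example; a real project would need to know or detect the actual
legacy encoding of each file. The sketch drops a leading UTF-8 signature,
leaves bytes that already decode as UTF-8 untouched, and otherwise transcodes
from the assumed legacy encoding.

```python
import codecs


def to_utf8_no_bom(data: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Return `data` as UTF-8 bytes without a BOM/signature.

    `legacy_encoding` is an assumption for this sketch; pass the encoding
    the file is actually known to use.
    """
    # Drop a leading UTF-8 signature (bytes EF BB BF) if one is present.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    try:
        # Valid UTF-8 (which includes pure ASCII) is returned unchanged.
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Not UTF-8: decode with the assumed legacy encoding, re-encode as UTF-8.
        return data.decode(legacy_encoding).encode("utf-8")
```

To convert a file in place, one could read it with pathlib.Path.read_bytes(),
pass the bytes through this function, and write the result back with
write_bytes(). Note that validity as UTF-8 is only a heuristic here: a short
legacy-encoded file can occasionally happen to be valid UTF-8 as well, and
bytes that are undefined in the assumed legacy encoding will raise an error.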

markus