[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

David Starner prosfilaes at gmail.com
Wed Oct 14 17:11:11 CDT 2020


On Wed, Oct 14, 2020, 1:52 AM Andrew West via Unicode <unicode at unicode.org>
wrote:

> It is just as good a way to identify UTF-8 data as a BOM in UTF-16
> data is for identifying UTF-16BE and UTF-16LE data.
>

No, it's not. UTF-16 and UTF-32 are basically the only encodings that use
more than 8 bits to encode every character. It's expected that a
general-purpose signature reader will be used to identify UTF-16. UTF-8, on
the other hand, was designed for, and is used in, a world of ASCII
extensions, where it's often expected that the encoding can be named near
the start of the file with no non-ASCII characters before the encoding
declaration. A UTF-8 BOM breaks that assumption.
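To make the contrast concrete, here is a minimal Python sketch under stated assumptions. `sniff_bom` stands in for a generic signature reader of the kind used to identify UTF-16/32, and `declared_encoding` stands in for a hypothetical reader of an ASCII-extension format that expects an XML-style `encoding="..."` declaration on a pure-ASCII first line. Both function names and the declaration format are illustrative only; they are not taken from any particular parser (real XML parsers, for instance, do handle a leading BOM).

```python
import codecs
import re
from typing import Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Generic signature reader: return the encoding implied by a leading BOM."""
    # Check UTF-32 before UTF-16: the UTF-32LE BOM (FF FE 00 00) begins with
    # the UTF-16LE BOM (FF FE).
    signatures = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in signatures:
        if data.startswith(bom):
            return name
    return None


def declared_encoding(data: bytes) -> Optional[str]:
    """Hypothetical ASCII-extension reader: the first line must be pure ASCII
    and carry an XML-style encoding="..." declaration."""
    first_line = data.split(b"\n", 1)[0]
    try:
        text = first_line.decode("ascii")
    except UnicodeDecodeError:
        # Any non-ASCII byte ahead of the declaration (such as EF BB BF)
        # defeats this reader.
        return None
    match = re.search(r'encoding="([A-Za-z0-9._-]+)"', text)
    return match.group(1) if match else None


doc = b'<?xml version="1.0" encoding="UTF-8"?>\n<a/>'
print(declared_encoding(doc))                    # UTF-8
print(declared_encoding(codecs.BOM_UTF8 + doc))  # None: the BOM hides the declaration
print(sniff_bom(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))  # utf-16-be
```

The signature reader copes with any BOM, but the declaration reader, which never expects non-ASCII bytes before its declaration, no longer finds the declaration once a UTF-8 BOM is prepended.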


