[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

Andrew West andrewcwest at gmail.com
Wed Oct 14 03:46:06 CDT 2020


On Wed, 14 Oct 2020 at 06:22, Tom Honermann via Unicode
<unicode at unicode.org> wrote:
>
> Use of a BOM would be one way to get to that desired end state but, as you mentioned, a BOM isn't a great way to identify UTF-8 data.

It is just as good a way to identify UTF-8 data as a BOM in UTF-16
data is for identifying UTF-16BE and UTF-16LE data.
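
To illustrate the comparison, here is a minimal sketch in Python of
BOM-based detection. The function name sniff_bom is hypothetical, and
the sketch deliberately ignores UTF-32, whose little-endian BOM begins
with the same two bytes as the UTF-16LE BOM. The point is only that
the check has the same shape for UTF-8 as for the UTF-16 forms.

```python
import codecs

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if absent.

    Hypothetical helper for illustration only. It does not handle UTF-32,
    so data starting with FF FE 00 00 is reported as UTF-16LE.
    """
    signatures = [
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    ]
    for bom, name in signatures:
        if data.startswith(bom):
            return name
    return None  # no BOM: the encoding must be determined some other way
```

In each case the absence of a BOM tells you nothing, so a caller still
needs a fallback such as a declared or default encoding.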

> The Unicode standard already admits this with the quoted "not recommended" text,

I'm sorry, where is the quoted "not recommended" text in the Unicode
Standard? Section 2.6 of the Unicode Standard
(https://www.unicode.org/versions/Unicode13.0.0/ch02.pdf#G9354)
states:

"Use of a BOM is neither required nor recommended for UTF-8"

My understanding of this poorly phrased statement is that the Unicode
Standard does not recommend using a BOM in UTF-8 text, but neither
does it recommend against using one, i.e. the standard is essentially
neutral on the use of a BOM in UTF-8 (I think the interpretation of
this statement has been discussed at least once previously on the
Unicode list).

> but it lacks the rationale to defend that recommendation or to explain when it may be appropriate to disregard that recommendation.

The Unicode Standard text is explicitly not a recommendation!

Andrew

