[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

Tom Honermann tom at honermann.net
Wed Oct 14 08:35:22 CDT 2020

On 10/14/20 4:46 AM, Andrew West wrote:
> On Wed, 14 Oct 2020 at 06:22, Tom Honermann via Unicode
> <unicode at unicode.org> wrote:
>> Use of a BOM would be one way to get to that desired end state but, as you mentioned, a BOM isn't a great way to identify UTF-8 data.
> It is just as good a way to identify UTF-8 data as a BOM in UTF-16
> data is for identifying UTF-16BE and UTF-16LE data.
>> The Unicode standard already admits this with the quoted "not recommended" text,
> I'm sorry, where is "the quoted "not recommended" text" in the Unicode
> Standard? The Unicode Standard section 2.6
> (https://www.unicode.org/versions/Unicode13.0.0/ch02.pdf#G9354)
> states:
> "Use of a BOM is neither required nor recommended for UTF-8"
> My understanding of this poorly-phrased statement is that the Unicode
> Standard does not have a recommendation to use a BOM in UTF-8 text,
> but neither does it recommend not to use a BOM in UTF-8 text, i.e. the
> standard is essentially neutral on the position of BOM in UTF-8 (I
> think the interpretation of this statement has been discussed at least
> once previously on the Unicode list).

This has been discussed before; one such discussion is linked from the 
paper (https://corp.unicode.org/pipermail/unicode/2020-June/008713.html).

Your interpretation of that phrase does not match my interpretation nor 
that of anyone else that I've discussed this with.  If the intent had 
been to be neutral, then "Use of a BOM is not required for UTF-8" would 
have sufficed.  If the intent had been to be explicitly neutral, then 
something like "Use of a BOM in UTF-8 is not required and this standard 
makes no recommendations regarding its use or non-use" would have been 
clearer.

>> but it lacks the rationale to defend that recommendation or to explain when it may be appropriate to disregard that recommendation.
> The Unicode Standard text is explicitly not a recommendation!

If so, then the first suggested resolution in the paper would clarify that.
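
For anyone following along who is less familiar with the mechanics under 
discussion, here is a minimal Python sketch of treating a leading BOM as an 
encoding signature.  The function name sniff_bom is hypothetical and the 
sketch is not taken from the paper; it assumes only the UTF-8 and UTF-16 
signatures matter and ignores UTF-32 entirely.

```python
import codecs
from typing import Optional, Tuple


def sniff_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding name or None, data with any recognized BOM removed).

    Hypothetical helper for illustration. UTF-32 signatures are deliberately
    ignored, so UTF-32LE input (FF FE 00 00) would be misreported as UTF-16LE.
    """
    signatures = (
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    )
    for bom, name in signatures:
        if data.startswith(bom):
            return name, data[len(bom):]
    # No signature present: the encoding has to be determined some other way.
    return None, data
```

Note that the absence of a BOM says nothing about whether the data is 
UTF-8; the sketch only reports a signature when one is present.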


> Andrew
