What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?

Tom Honermann tom at honermann.net
Fri Jun 5 15:10:19 CDT 2020

Unicode 13, Section 2.6 (Encoding Schemes), when discussing byte order, 
states (emphasis mine):

> ... *Use of a BOM is neither required nor recommended for UTF-8*, but 
> may be encountered in contexts where UTF-8 data is converted from 
> other encoding forms that use a BOM or where the BOM is used as a
> UTF-8 signature. See the “Byte Order Mark” subsection in
> Section 23.8, Specials, for more information.

The emphasized statement is unconditional regarding the recommendation, 
but it isn't clear to me that this recommendation is intended to extend 
both to the presence of a BOM in contexts where the encoding is known to 
be UTF-8 (where the BOM provides no additional information) and to 
contexts where the BOM signifies the presence of UTF-8 encoded text 
(where the BOM does provide additional information).  Is the guidance 
intended to state that, when possible, use of a BOM as a UTF-8 encoding 
signature is to be avoided in favor of some other mechanism?
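
To make the two contexts concrete, here is a minimal Python sketch.  The 
function name classify_bom, its known_utf8 parameter, and the three 
labels it returns are hypothetical and exist only for illustration; they 
do not come from the Unicode Standard.  The sketch assumes the caller 
can say whether the encoding is already known by other means.

```python
import codecs


def classify_bom(data: bytes, known_utf8: bool) -> str:
    """Label the role a leading UTF-8 BOM plays in `data`.

    Hypothetical helper: the labels are illustrative only.
    """
    # codecs.BOM_UTF8 is the byte sequence EF BB BF.
    if not data.startswith(codecs.BOM_UTF8):
        return "absent"
    if known_utf8:
        # The encoding is already known, so the BOM adds no information.
        return "redundant"
    # The encoding is not otherwise known, so the BOM acts as a signature.
    return "signature"


sample = codecs.BOM_UTF8 + "text".encode("utf-8")
print(classify_bom(sample, known_utf8=True))
print(classify_bom(sample, known_utf8=False))
```

The question above is whether the "neither required nor recommended" 
wording is meant to cover only the first case or the second as well.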

The referenced "Byte Order Mark" subsection in Unicode 13, Section 23.8 
(Specials), contains no similar guidance; it is factual and details some 
possible consequences of use, but does not apply a judgement.  The 
discussion of use with other character sets could be read as an 
endorsement of the use of a BOM as an encoding signature.

Likewise, the "UTF-8, UTF-16, UTF-32 & BOM" section in the Unicode FAQ 
<https://www.unicode.org/faq/utf_bom.html> does not recommend for or 
against use of a BOM as an encoding signature.  It also can be read as 
endorsing such usage.

So, my question is, what exactly is the intent of the emphasized 
statement above?  Is the recommendation intended to be so broadly 
worded?  Or is it only intended to discourage BOM use in cases where the 
encoding is known by other means?
