What is the Unicode guidance regarding the use of a BOM as a UTF-8 encoding signature?
Tom Honermann
tom at honermann.net
Fri Jun 5 15:10:19 CDT 2020
Unicode 13 section 2.6 (Encoding Schemes), when discussing byte order,
states (emphasis mine):
> ... *Use of a BOM is neither required nor recommended for UTF-8*, but
> may be encountered in contexts where UTF-8 data is converted from
> other encoding forms that use a BOM or where the BOM is used
> as a UTF-8 signature. See the “Byte Order Mark” subsection in
> Section 23.8, Specials, for more information.
The emphasized statement is unconditional, but it isn't clear to me
whether the recommendation is intended to cover both of the following:
contexts where the encoding is already known to be UTF-8 (where the BOM
provides no additional information), and contexts where the BOM itself
signals that the text is UTF-8 encoded (where the BOM does provide
additional information). Is the guidance intended to state that, when
possible, use of a BOM as an encoding signature is to be avoided in
favor of some other mechanism?
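To make the two contexts concrete, here is a minimal Python sketch. It
is only an illustration and is not taken from the standard; the function
names are hypothetical. The first function covers the case where the
encoding is already known to be UTF-8, so a leading BOM carries no
information and is simply dropped. The second treats the BOM as a
signature, where its presence is the only evidence that the bytes are
UTF-8.

```python
import codecs

# The UTF-8 encoding of U+FEFF: the bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def decode_known_utf8(data: bytes) -> str:
    """Encoding is known by other means; a leading BOM adds nothing.

    The standard "utf-8-sig" codec strips one leading BOM if present
    and otherwise behaves like plain UTF-8 decoding.
    """
    return data.decode("utf-8-sig")


def has_utf8_signature(data: bytes) -> bool:
    """Encoding is not known; a leading BOM is used as a UTF-8 signature."""
    return data.startswith(UTF8_BOM)
```

In the first case the result is the same with or without the BOM; in
the second, removing the BOM removes the only signal the reader has.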
The referenced "Byte Order Mark" subsection in Unicode 13 section 23.8
(Specials) contains no similar guidance; it is factual and details some
possible consequences of use, but does not pass judgment. The
discussion of use with other character sets could be read as an
endorsement of the use of a BOM as an encoding signature.
Likewise, the "UTF-8, UTF-16, UTF-32 & BOM" section in the Unicode FAQ
<https://www.unicode.org/faq/utf_bom.html> does not recommend for or
against use of a BOM as an encoding signature. It also can be read as
endorsing such usage.
So, my question is, what exactly is the intent of the emphasized
statement above? Is the recommendation intended to be so broadly
worded? Or is it only intended to discourage BOM use in cases where the
encoding is known by other means?
Tom.