[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature

Tom Honermann tom at honermann.net
Sun Oct 11 22:37:04 CDT 2020


On 10/11/20 11:32 PM, JF Bastien wrote:
> It’s a bit odd: if you assume the default is ASCII then you don’t need 
> this. If you assume the default is UTF-8 then you don’t need this... so 
> when do you need the BOM? It seems like making bad prior choices more 
> acceptable... even though they were bad choices. I’m not sure it’s a 
> good idea.

A BOM would be needed when:

 1. The default encoding is ASCII-based (e.g., ISO-8859-1, Windows-1252)
    and the UTF-8 text to be produced contains non-ASCII characters; or
 2. The default encoding is not ASCII-based (e.g., EBCDIC).

Both of these cases presume that the default encoding can't be made 
UTF-8 for backward compatibility reasons.
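To make the two cases above concrete, here is a minimal Python sketch. It
assumes a producer that knows whether the consumer's default encoding is
ASCII-based, and a consumer that treats a leading BOM as the only signal
that the text is UTF-8. The helper names needs_utf8_bom,
encode_for_consumer, and decode_from_producer are hypothetical and are not
part of any proposed API.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def needs_utf8_bom(text: str, default_is_ascii_based: bool) -> bool:
    """Hypothetical helper: does UTF-8 output need a BOM signature?

    Assumes the consumer's default encoding cannot be changed to UTF-8.
    """
    if not default_is_ascii_based:
        # Case 2 (e.g. EBCDIC): even ASCII-only UTF-8 bytes would be
        # misread under the default, so always mark the text.
        return True
    # Case 1 (e.g. ISO-8859-1, Windows-1252): ASCII-only text is
    # byte-identical in UTF-8, so a BOM is needed only for non-ASCII text.
    return not text.isascii()


def encode_for_consumer(text: str, default_is_ascii_based: bool) -> bytes:
    """Encode as UTF-8, prefixing a BOM only when needs_utf8_bom says so."""
    data = text.encode("utf-8")
    if needs_utf8_bom(text, default_is_ascii_based):
        data = UTF8_BOM + data
    return data


def decode_from_producer(data: bytes, default_encoding: str) -> str:
    """Treat a leading BOM as a UTF-8 signature; otherwise use the default.

    Assumption: legacy text in the default encoding never begins with the
    bytes EF BB BF.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8")
    return data.decode(default_encoding)
```

Note that ASCII-only text written for an ASCII-based consumer stays a plain
ASCII file with no BOM. Python's built-in "utf-8-sig" codec performs the
same BOM stripping on decode; the explicit check here only makes the
fallback to the default encoding visible.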

Tom.

>
> On Sun, Oct 11, 2020 at 8:22 PM Tom Honermann via SG16 
> <sg16 at lists.isocpp.org> wrote:
>
>     On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:
>>     One concern I have, which might lead into the rationale for the
>>     current discouragement, is that I would hate to see a best
>>     practice that pushes a BOM into ASCII files. One of the nice
>>     properties of UTF-8 is that a valid ASCII file (still very
>>     common) is also a valid UTF-8 file. Changing best practice would
>>     encourage updating those files so that they are no longer ASCII.
>
>     Thanks, Alisdair.  I think that concern is implicitly addressed by
>     the suggested resolutions, but perhaps that can be made clearer.
>     One possibility would be to modify the "protocol designer"
>     guidelines to address the case where a protocol's default encoding
>     is ASCII-based and to specify that a BOM is only required for
>     UTF-8 text that contains non-ASCII characters.  Would that be helpful?
>
>
>     Tom.
>
>>
>>     AlisdairM
>>
>>>     On Oct 10, 2020, at 14:54, Tom Honermann via SG16
>>>     <sg16 at lists.isocpp.org> wrote:
>>>
>>>     Attached is a draft proposal for the Unicode standard that
>>>     intends to clarify the current recommendation regarding use of a
>>>     BOM in UTF-8 text.  This is a follow-up to a discussion on the
>>>     Unicode mailing list
>>>     <https://corp.unicode.org/pipermail/unicode/2020-June/008713.html>
>>>     back in June.
>>>
>>>     Feedback is welcome.  I plan to submit
>>>     <https://www.unicode.org/pending/docsubmit.html> this to the UTC
>>>     in a week or so pending review feedback.
>>>
>>>     Tom.
>>>
>>>     <Unicode-BOM-guidance.pdf>
>>>     --
>>>     SG16 mailing list
>>>     SG16 at lists.isocpp.org
>>>     https://lists.isocpp.org/mailman/listinfo.cgi/sg16
>>
>>
>
>


