[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature
tom at honermann.net
Sun Oct 11 22:37:04 CDT 2020
On 10/11/20 11:32 PM, JF Bastien wrote:
> It’s a bit odd: if you assume the default is ASCII then you don’t need
> this. If you assume the default is UTF-8 then you don’t need this... so
> when do you need the BOM? It seems like making bad prior choices more
> acceptable... even though they were bad choices. I’m not sure it’s a
> good idea.
A BOM would be needed when:
1. The default encoding is ASCII based (ISO-8859-1, Windows-1252,
   etc.) and the UTF-8 text to be produced contains non-ASCII
   characters.
2. The default encoding is not ASCII based (e.g., EBCDIC).
Both of these cases presume that the default encoding can't be made
UTF-8 for backward compatibility reasons.
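
To make the two cases concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the names is_ascii_based, needs_bom, and encode_utf8 are hypothetical, and probing a codec with the 128 ASCII bytes is a heuristic stand-in for "ASCII based", not something taken from the draft proposal.

```python
import codecs


def is_ascii_based(encoding: str) -> bool:
    """Heuristic (an assumption of this sketch): an encoding is treated as
    ASCII based if bytes 0x00-0x7F decode to the same ASCII characters."""
    ascii_bytes = bytes(range(128))
    try:
        return ascii_bytes.decode(encoding) == ascii_bytes.decode("ascii")
    except UnicodeDecodeError:
        return False


def needs_bom(default_encoding: str, text: str) -> bool:
    """Decide whether UTF-8 output should carry a BOM as an encoding signature."""
    if codecs.lookup(default_encoding).name == "utf-8":
        # The default is already UTF-8, so no signature is needed.
        return False
    if is_ascii_based(default_encoding):
        # Case 1: only text with non-ASCII characters would be misread.
        return not text.isascii()
    # Case 2: the default is not ASCII based (e.g., EBCDIC), so even pure
    # ASCII text would be misread without a signature.
    return True


def encode_utf8(text: str, default_encoding: str) -> bytes:
    """Encode text as UTF-8, prepending a BOM only when needs_bom says so."""
    data = text.encode("utf-8")
    return codecs.BOM_UTF8 + data if needs_bom(default_encoding, text) else data
```

With this sketch, pure ASCII text written for a Windows-1252 default stays byte-for-byte ASCII, while the same text written for an EBCDIC default (cp037 in Python) gets the signature.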
> On Sun, Oct 11, 2020 at 8:22 PM Tom Honermann via SG16
> <sg16 at lists.isocpp.org> wrote:
> On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:
>> One concern I have, that might lead into rationale for the
>> current discouragement,
>> is that I would hate to see a best practice that pushes a BOM
>> into ASCII files.
>> One of the nice properties of UTF-8 is that a valid ASCII file
>> (still very common) is
>> also a valid UTF-8 file. Changing best practice would encourage
>> updating those
>> files to be no longer ASCII.
> Thanks, Alisdair. I think that concern is implicitly addressed by
> the suggested resolutions, but perhaps that can be made more
> clear. One possibility would be to modify the "protocol designer"
> guidelines to address the case where a protocol's default encoding
> is ASCII based and to specify that a BOM is only required for
> UTF-8 text that contains non-ASCII characters. Would that be helpful?
>>> On Oct 10, 2020, at 14:54, Tom Honermann via SG16
>>> <sg16 at lists.isocpp.org> wrote:
>>> Attached is a draft proposal for the Unicode standard that
>>> intends to clarify the current recommendation regarding use of a
>>> BOM in UTF-8 text. This is a follow-up to discussion on the
>>> Unicode mailing list
>>> back in June.
>>> Feedback is welcome. I plan to submit
>>> <https://www.unicode.org/pending/docsubmit.html> this to the UTC
>>> in a week or so pending review feedback.
>>> SG16 mailing list
>>> SG16 at lists.isocpp.org