[SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature
d3ck0r at gmail.com
Mon Oct 12 19:09:56 CDT 2020
On Sun, Oct 11, 2020 at 8:24 PM Tom Honermann via Unicode <
unicode at unicode.org> wrote:
> On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:
> One concern I have, that might lead into rationale for the current
> is that I would hate to see a best practice that pushes a BOM into ASCII
> One of the nice properties of UTF-8 is that a valid ASCII file (still very
> common) is also a valid UTF-8 file. Changing best practice would encourage
> updating files to be no longer ASCII.
> Thanks, Alisdair. I think that concern is implicitly addressed by the
> suggested resolutions, but perhaps that can be made more clear. One
> possibility would be to modify the "protocol designer" guidelines to
> address the case where a protocol's default encoding is ASCII based and to
> specify that a BOM is only required for UTF-8 text that contains non-ASCII
> characters. Would that be helpful?
'and to specify that a BOM is only required for UTF-8': this should NEVER
be 'required' or 'must'; it shouldn't even be 'suggested'. Fortunately, a BOM
is just a ZWNBSP (U+FEFF), so text certainly 'may' start with one.
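To make that concrete, here is a small Python sketch (my own illustration, not something taken from the draft proposal). It shows that U+FEFF always encodes to the same three bytes in UTF-8, that pure ASCII bytes are already valid UTF-8, and that prepending a BOM makes the data non-ASCII. The helper name is_ascii and the sample bytes are made up for the example.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE / byte order mark

# In UTF-8 the BOM is always the same three bytes.
bom_bytes = BOM.encode("utf-8")
assert bom_bytes == b"\xef\xbb\xbf" == codecs.BOM_UTF8


def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


plain = b"int main() { return 0; }\n"  # sample ASCII-only content
with_bom = bom_bytes + plain

# Pure ASCII decodes identically as ASCII and as UTF-8 ...
assert plain.decode("utf-8") == plain.decode("ascii")
print(is_ascii(plain))     # True

# ... but the same content with a BOM prepended is no longer ASCII.
print(is_ascii(with_bom))  # False
```

So a 'may' costs nothing for readers that already accept UTF-8, while a 'must' would turn every ASCII file into a non-ASCII one.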
These days the standard assumption that 'everything IS UTF-8' works really
well, except in Firefox, where the charset has to be specified for JS scripts
(but that's a bug in Firefox).
EBCDIC should be converted on the edge to internal ASCII, since,
thankfully, this is a niche application and everything else thinks in ASCII
or some derivative thereof.
The byte order mark is irrelevant to UTF-8 since bytes are ordered in the
I have run into several editors that have insisted on emitting a BOM for UTF-8
when a file is initially promoted from ASCII, but subsequently deleting it doesn't
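For what it's worth, a reader that wants to tolerate an editor-inserted BOM without ever requiring one can simply drop it if it is present. A minimal Python sketch, assuming the standard "utf-8-sig" codec is acceptable for the input; the helper name decode_utf8_optional_bom is hypothetical:

```python
import codecs


def decode_utf8_optional_bom(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, dropping one leading BOM if present."""
    return data.decode("utf-8-sig")


text = "hello\n"
without_bom = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + without_bom

# Both inputs decode to the same text.
assert decode_utf8_optional_bom(without_bom) == text
assert decode_utf8_optional_bom(with_bom) == text

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
assert with_bom.decode("utf-8") == "\ufeff" + text
```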
I am curious though, what was the actual problem you ran into that makes
you even consider this modification?
> On Oct 10, 2020, at 14:54, Tom Honermann via SG16 <sg16 at lists.isocpp.org>
> Attached is a draft proposal for the Unicode standard that intends to
> clarify the current recommendation regarding use of a BOM in UTF-8 text.
> This is a follow-up to discussion on the Unicode mailing list
> <https://corp.unicode.org/pipermail/unicode/2020-June/008713.html> back
> in June.
> Feedback is welcome. I plan to submit
> <https://www.unicode.org/pending/docsubmit.html> this to the UTC in a
> week or so pending review feedback.