<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 10/12/20 8:09 PM, J Decker via
Unicode wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr"><br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Oct 11, 2020 at 8:24
PM Tom Honermann via Unicode &lt;<a
href="mailto:unicode@unicode.org" moz-do-not-send="true">unicode@unicode.org</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<div>On 10/10/20 7:58 PM, Alisdair Meredith via SG16
wrote:<br>
</div>
<blockquote type="cite"> One concern I have, that might
lead into rationale for the current discouragement,
<div>is that I would hate to see a best practice that
pushes a BOM into ASCII files.</div>
<div>One of the nice properties of UTF-8 is that a valid
ASCII file (still very common) is</div>
<div>also a valid UTF-8 file. Changing best practice
would encourage updating those</div>
<div>files to be no longer ASCII.</div>
</blockquote>
<p>Thanks, Alisdair. I think that concern is implicitly
addressed by the suggested resolutions, but perhaps that
can be made more clear. One possibility would be to
modify the "protocol designer" guidelines to address the
case where a protocol's default encoding is ASCII based
and to specify that a BOM is only required for UTF-8
text that contains non-ASCII characters. Would that be
helpful?<br>
</p>
</div>
</blockquote>
<div><br>
</div>
<div>'and to specify that a BOM is only required for UTF-8 '
this should NEVER be 'required' or 'must', it shouldn't even
be 'suggested'; fortunately BOM is just a ZWNBSP, so it's
certainly a 'may' start with a such and such.</div>
<div>These days the standard 'everything IS UTF-8' works
really well, except in Firefox, where the charset is required
to be specified for JS scripts (but that's a bug in Firefox).</div>
<div>EBCDIC should be converted on the edge to internal ASCII,
since, thankfully, this is a niche application and
everything thinks in ASCII or some derivative thereof.</div>
<div>A byte order mark is irrelevant to UTF-8 since its bytes are
always in the same, correct order.</div>
<div>I have run into several editors that insisted on
emitting a BOM for UTF-8 when a file was initially promoted from ASCII, but
subsequently deleting it doesn't bother anything.</div>
</div>
</div>
</blockquote>
I mostly agree. Please note that the paper suggests use of a BOM
only as a last resort. The goal is to further discourage its use
and to give the rationale for doing so.<br>
<blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>I am curious though, what was the actual problem you ran
into that makes you even consider this modification? <br>
</div>
</div>
</div>
</blockquote>
<p>I'm working on improving support for portable C++ source code.
Today, there is no character encoding that is supported by all C++
implementations (not even ASCII). I'd like to make UTF-8 that
commonly supported character encoding. For backward compatibility
reasons, compilers cannot change their default source code
character encoding to UTF-8.</p>
<p>Most C++ applications are created from components that have
different release schedules and that are maintained by different
organizations. Synchronizing a conversion to UTF-8 across
dependent projects isn't feasible, nor is converting all of the
source files used by an application to UTF-8 as simple as just
running them through 'iconv'. Migration to UTF-8 will therefore
require an incremental approach for at least some applications,
though many are likely to find success by simply invoking their
compiler with the appropriate -everything-is-utf8 option since
most source files are ASCII.</p>
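<p>As a rough illustration of why many projects could migrate without
touching their files, here is a minimal Python sketch that sorts raw
source bytes into pure ASCII (already valid UTF-8), valid UTF-8 with
non-ASCII content, or something else that would need conversion. The
function name <code>classify_source_bytes</code> is hypothetical and is
not part of any compiler or tool mentioned here.</p>
<pre>
```python
def classify_source_bytes(data):
    """Classify raw source bytes as 'ascii', 'utf-8', or 'other'.

    Hypothetical helper for illustration only. Pure ASCII input is
    also valid UTF-8, so it needs no conversion at all.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Some other encoding (for example Latin-1 or EBCDIC) that would
        # need a real conversion step.
        return "other"
```
</pre>
<p>Only files reported as 'other' would need actual conversion. Note that
a successful UTF-8 decode does not prove the author intended UTF-8, so
this is a screening step, not a guarantee.</p>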
<p>Microsoft Visual C++ recognizes a UTF-8 BOM as an encoding
signature and allows differently encoded source files to be used
in the same translation unit. Support for differently encoded
source files in the same translation unit is the feature that will
be needed to enable incremental migration. Normative
discouragement (with rationale) of the use of a BOM in the Unicode
standard would help explain why a solution other than a
BOM (perhaps something like <a
href="https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations">Python's
encoding declaration</a>) should be standardized instead of the
existing practice demonstrated by Microsoft's solution.</p>
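<p>To make the "encoding signature" idea concrete, here is a small Python
sketch of per-file encoding selection: if the bytes start with the UTF-8
BOM (EF BB BF), the file is decoded as UTF-8 with the BOM stripped;
otherwise a caller-supplied default is used. The names
<code>split_utf8_signature</code> and <code>decode_source</code> are
hypothetical, and this is not a description of how Visual C++ is
actually implemented.</p>
<pre>
```python
import codecs


def split_utf8_signature(data):
    """Return (has_bom, payload) for raw source bytes.

    Hypothetical helper: treats a leading UTF-8 BOM as an encoding
    signature and removes it from the payload.
    """
    bom = codecs.BOM_UTF8  # the three bytes EF BB BF
    if data.startswith(bom):
        return True, data[len(bom):]
    return False, data


def decode_source(data, default_encoding):
    """Decode one source file, choosing the encoding per file.

    A BOM selects UTF-8; anything else falls back to the default
    encoding, which stands in for a compiler's legacy setting.
    """
    has_bom, payload = split_utf8_signature(data)
    encoding = "utf-8" if has_bom else default_encoding
    return payload.decode(encoding)
```
</pre>
<p>Because the decision is made per file, a UTF-8 header and a legacy
encoded header could be read for the same translation unit, which is the
property that incremental migration depends on. A declaration-based
scheme would replace the BOM test with a check for a marker in the
file's first lines.</p>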
<p>Tom.<br>
</p>
<blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_quote">
<div><br>
</div>
<div>J</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div>
<p> </p>
<p>Tom.<br>
</p>
<blockquote type="cite">
<div><br>
</div>
<div>AlisdairM<br>
<div><br>
<blockquote type="cite">
<div>On Oct 10, 2020, at 14:54, Tom Honermann via
SG16 &lt;<a href="mailto:sg16@lists.isocpp.org"
target="_blank" moz-do-not-send="true">sg16@lists.isocpp.org</a>&gt;
wrote:</div>
<br>
<div>
<div>
<p>Attached is a draft proposal for the
Unicode standard that is intended to clarify the
current recommendation regarding use of a
BOM in UTF-8 text. This is a follow-up to the <a
href="https://corp.unicode.org/pipermail/unicode/2020-June/008713.html"
target="_blank" moz-do-not-send="true">discussion
on the Unicode mailing list</a> back in
June.</p>
<p>Feedback is welcome. I plan to <a
href="https://www.unicode.org/pending/docsubmit.html"
target="_blank" moz-do-not-send="true">submit</a>
this to the UTC in a week or so pending
review feedback.<br>
</p>
<p>Tom.<br>
</p>
</div>
<span
id="gmail-m_-2846571300384305609cid:958C9297-66AC-4D88-8F0B-577B8BA2589E@nyc.rr.com">&lt;Unicode-BOM-guidance.pdf&gt;</span>--
<br>
SG16 mailing list<br>
<a href="mailto:SG16@lists.isocpp.org"
target="_blank" moz-do-not-send="true">SG16@lists.isocpp.org</a><br>
<a
href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16"
target="_blank" moz-do-not-send="true">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a><br>
</div>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset></fieldset>
</blockquote>
<p><br>
</p>
</div>
</blockquote>
</div>
</div>
</blockquote>
<p><br>
</p>
</body>
</html>