<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:"Yu Gothic";
panose-1:2 11 4 0 0 0 0 0 0 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@Yu Gothic";
panose-1:2 11 4 0 0 0 0 0 0 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle19
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1913008283;
mso-list-template-ids:-1333123198;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:"Courier New";
mso-bidi-font-family:"Times New Roman";}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Wingdings;}
@list l1
{mso-list-id:1966889172;
mso-list-type:hybrid;
mso-list-template-ids:-1878765158 1864168670 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l1:level1
{mso-level-start-at:0;
mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;
mso-fareast-font-family:"Yu Gothic";
mso-bidi-font-family:"Times New Roman";}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">I’m having trouble with the attempt to be this prescriptive.<br>
<br>
These make sense: “Use Unicode!”<o:p></o:p></p>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
If possible, mandate use of UTF-8 without a BOM; diagnose the presence of a BOM in consumed text as an error, and produce text without a BOM.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Alternatively, swallow the BOM if present.<o:p></o:p></li></ul>
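<p class="MsoNormal">The two consumer policies in the list above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of any proposal: the names read_utf8, write_utf8 and BomError are hypothetical, and only a leading BOM (the bytes EF BB BF) is treated as a BOM.</p>
<pre>
```python
import codecs


class BomError(ValueError):
    """Hypothetical error raised when BOM-free UTF-8 input starts with a BOM."""


def read_utf8(data: bytes, strict: bool = True) -> str:
    """Decode UTF-8 input under one of two policies (hypothetical helper).

    strict=True:  a leading BOM is diagnosed as an error.
    strict=False: a leading BOM is accepted and discarded ("swallowed").
    Invalid UTF-8 raises UnicodeDecodeError in both modes.
    """
    if data.startswith(codecs.BOM_UTF8):  # BOM_UTF8 == b"\xef\xbb\xbf"
        if strict:
            raise BomError("UTF-8 BOM found in input that must not contain one")
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def write_utf8(text: str) -> bytes:
    """Produce UTF-8 output without a BOM, as both policies require."""
    return text.encode("utf-8")
```
</pre>
<p class="MsoNormal">With strict=False, the lenient policy behaves like Python's built-in "utf-8-sig" decoder, which also drops a single leading BOM if one is present.</p>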
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">After that, the situation is clearly hopeless. Applications should use Unicode (e.g., UTF-8), and clearly there are cases where that isn’t happening. Trying to prescribe
that negotiation should therefore happen, or that BOMs should be interpreted, or whatever else, is fairly meaningless at that point. Given that the higher-order guidance of “use Unicode” has already been ignored, it’s garbage in, garbage out. Clearly
the app (or whatever it is) is ignoring the “use Unicode” guidance for some legacy reason. If it could adapt, it should adapt by using UTF-8. It <b>might</b> be helpful to say something about a BOM likely indicating UTF-8 text in otherwise unspecified data, but prescriptive
guidance is pointless: this is legacy software that behaves in a legacy fashion for a reason, and saying it should have done things differently 20 years ago isn’t going to help.
<span style="font-family:'Segoe UI Emoji',sans-serif">😊</span> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">-Shawn<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Unicode &lt;unicode-bounces@unicode.org&gt; <b>On Behalf Of
</b>Tom Honermann via Unicode<br>
<b>Sent:</b> Monday, October 12, 2020 7:03 AM<br>
<b>To:</b> Alisdair Meredith &lt;alisdairm@me.com&gt;<br>
<b>Cc:</b> sg16@lists.isocpp.org; Unicode List &lt;unicode@unicode.org&gt;<br>
<b>Subject:</b> Re: [SG16] Draft proposal: Clarify guidance for use of a BOM as a UTF-8 encoding signature<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Great, here is the change I'm making to address this:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">Protocol designers:<o:p></o:p></p>
</div>
<div>
<ul type="disc">
<li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
If possible, mandate use of UTF-8 without a BOM; diagnose the presence of a BOM in consumed text as an error, and produce text without a BOM.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Otherwise, if possible, mandate use of UTF-8 with or without a BOM; accept and discard a BOM in consumed text, and produce text without a BOM.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Otherwise, if possible, use UTF-8 as the default encoding with use of other encodings negotiated using information other than a BOM; accept and discard a BOM in consumed text, and produce text without a BOM.<o:p></o:p></li><li class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1">
Otherwise, require the presence of a BOM to differentiate UTF-8 encoded text in both consumed and produced text<b><span style="color:#009900"> unless the absence of a BOM would result in the text being interpreted as being in an ASCII-based encoding and the UTF-8 text
contains no non-ASCII characters (this exception is intended to avoid adding a BOM to ASCII text, which would make that text no longer ASCII)</span></b>. This approach should be reserved for scenarios in which UTF-8 cannot be adopted as the default due to
backward-compatibility concerns.<o:p></o:p></li></ul>
</div>
</blockquote>
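<p class="MsoNormal">The third and fourth guidelines above can be sketched in Python as follows. This is a minimal, hypothetical illustration: "cp1252" merely stands in for whatever ASCII-based legacy encoding a protocol assumes, the declared parameter stands in for whatever out-of-band encoding information the protocol negotiates (for example, a charset label), and all function names are invented for this example. As an additional assumption, the third-guideline consumer discards a BOM only when the effective encoding is UTF-8.</p>
<pre>
```python
import codecs
from typing import Optional

# Hypothetical ASCII-based legacy encoding assumed for BOM-less text.
LEGACY_ENCODING = "cp1252"


def decode_negotiated(data: bytes, declared: Optional[str] = None) -> str:
    """Third guideline (consumer): UTF-8 by default, another encoding only if
    negotiated out of band (`declared`); a leading UTF-8 BOM is discarded."""
    encoding = codecs.lookup(declared or "utf-8").name  # normalizes "UTF-8" etc.
    if encoding == "utf-8" and data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode(encoding)


def encode_with_signature(text: str) -> bytes:
    """Fourth guideline (producer): prefix a BOM to UTF-8 output unless the text
    is pure ASCII, which reads identically in UTF-8 and the legacy encoding."""
    body = text.encode("utf-8")
    return body if text.isascii() else codecs.BOM_UTF8 + body


def decode_with_signature(data: bytes) -> str:
    """Fourth guideline (consumer): a BOM identifies UTF-8; without one, the
    text is interpreted in the legacy encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(LEGACY_ENCODING)
```
</pre>
<p class="MsoNormal">Under these assumptions, encode_with_signature("abc") returns b"abc" unchanged, while encode_with_signature("café") returns the BOM followed by the UTF-8 bytes, so ASCII files stay ASCII and non-ASCII UTF-8 text remains distinguishable from legacy text.</p>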
<div>
<p class="MsoNormal">Tom.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">On 10/12/20 8:40 AM, Alisdair Meredith wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">That addresses my main concern. Essentially, best practice (for UTF-8) would be no BOM unless the document contains code points that require multiple code units to express.
<o:p></o:p></p>
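<p class="MsoNormal">This restatement matches the "no non-ASCII characters" exception because, in UTF-8, exactly the ASCII characters (U+0000 to U+007F) are encoded in a single code unit. A quick Python check with a few sample characters, chosen here purely for illustration:</p>
<pre>
```python
# Number of UTF-8 code units (bytes) needed for a few sample code points.
samples = ["A", "\u00e9", "\u20ac", "\U0001F60A"]  # A, é, €, 😊

for ch in samples:
    units = len(ch.encode("utf-8"))
    # Only the ASCII sample needs a single code unit.
    print(f"U+{ord(ch):04X}: {units} code unit(s), ASCII={ch.isascii()}")
```
</pre>
<p class="MsoNormal">This prints 1, 2, 3 and 4 code units for U+0041, U+00E9, U+20AC and U+1F60A respectively, with ASCII=True only for U+0041.</p>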
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">AlisdairM<o:p></o:p></p>
<div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Oct 11, 2020, at 23:22, Tom Honermann &lt;<a href="mailto:tom@honermann.net">tom@honermann.net</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">On 10/10/20 7:58 PM, Alisdair Meredith via SG16 wrote:<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<p class="MsoNormal">One concern I have, which might lead into the rationale for the current discouragement, is that I would hate to see a best practice that pushes a BOM into ASCII files. One of the nice properties of UTF-8 is that a valid ASCII file (still very common) is also a valid UTF-8 file. Changing the best practice would encourage updating those files so that they are no longer ASCII.<o:p></o:p></p>
</blockquote>
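<p class="MsoNormal">The property described above can be checked directly in Python. The file contents below are a hypothetical example; the point is that ASCII bytes decode identically as ASCII and as UTF-8, while prepending the three-byte UTF-8 BOM makes the file non-ASCII.</p>
<pre>
```python
import codecs

# A hypothetical ASCII-only source file.
ascii_bytes = b"int main() { return 0; }\n"

# ASCII is a subset of UTF-8: the same bytes decode to the same text either way.
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# Prepending the UTF-8 BOM (EF BB BF) adds three bytes above 0x7F.
with_bom = codecs.BOM_UTF8 + ascii_bytes
print(ascii_bytes.isascii(), with_bom.isascii())  # True False

# A strict ASCII consumer now rejects the file.
try:
    with_bom.decode("ascii")
except UnicodeDecodeError as err:
    print("no longer ASCII:", err.reason)
```
</pre>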
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks, Alisdair. I think that concern is implicitly addressed by the suggested resolutions, but perhaps that can be made clearer. One possibility would be to modify the "protocol
designer" guidelines to address the case where a protocol's default encoding is ASCII-based, and to specify that a BOM is only required for UTF-8 text that contains non-ASCII characters. Would that be helpful?<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Tom.<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">AlisdairM<o:p></o:p></p>
<div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">On Oct 10, 2020, at 14:54, Tom Honermann via SG16 &lt;<a href="mailto:sg16@lists.isocpp.org">sg16@lists.isocpp.org</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Attached is a draft proposal for the Unicode Standard that intends to clarify the current recommendation regarding use of a BOM in UTF-8 text. This is a follow-up to the
<a href="https://corp.unicode.org/pipermail/unicode/2020-June/008713.html">discussion on the Unicode mailing list</a> back in June.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Feedback is welcome. I plan to
<a href="https://www.unicode.org/pending/docsubmit.html">submit</a> this to the UTC in a week or so, pending review feedback.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Tom.<o:p></o:p></p>
</div>
<p class="MsoNormal">&lt;Unicode-BOM-guidance.pdf&gt;-- <br>
SG16 mailing list<br>
<a href="mailto:SG16@lists.isocpp.org">SG16@lists.isocpp.org</a><br>
<a href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a><o:p></o:p></p>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
</blockquote>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><o:p> </o:p></p>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</blockquote>
<p><o:p> </o:p></p>
</div>
</body>
</html>