<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <div class="moz-cite-prefix">On 10/12/20 8:09 PM, J Decker via
      Unicode wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sun, Oct 11, 2020 at 8:24
            PM Tom Honermann via Unicode <<a
              href="mailto:unicode@unicode.org" moz-do-not-send="true">unicode@unicode.org</a>>
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <div>On 10/10/20 7:58 PM, Alisdair Meredith via SG16
                wrote:<br>
              </div>
              <blockquote type="cite"> One concern I have, that might
                lead into rationale for the current discouragement,
                <div>is that I would hate to see a best practice that
                  pushes a BOM into ASCII files.</div>
                <div>One of the nice properties of UTF-8 is that a valid
                  ASCII file (still very common) is</div>
                <div>also a valid UTF-8 file.  Changing best practice
                  would encourage updating those</div>
                <div>files to be no longer ASCII.</div>
              </blockquote>
              <p>Thanks, Alisdair.  I think that concern is implicitly
                addressed by the suggested resolutions, but perhaps that
                can be made more clear.  One possibility would be to
                modify the "protocol designer" guidelines to address the
                case where a protocol's default encoding is ASCII based
                and to specify that a BOM is only required for UTF-8
                text that contains non-ASCII characters.  Would that be
                helpful?<br>
              </p>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>'and to specify that a BOM is only required for UTF-8':
            this should NEVER be 'required' or 'must'; it shouldn't even
            be 'suggested'.  Fortunately, a BOM is just a ZWNBSP, so a
            file certainly 'may' start with one.</div>
          <div>These days the standard 'everything IS UTF-8' approach works
            really well, except in Firefox, where the charset is required
            to be specified for JS scripts (but that's a bug in Firefox).</div>
          <div>EBCDIC should be converted at the edge to internal ASCII,
            since, thankfully, this is a niche application and
            everything thinks in ASCII or some derivative thereof.</div>
          <div>A Byte Order Mark is irrelevant to UTF-8, since UTF-8 bytes
            always appear in the same order.</div>
          <div>I have run into several editors that have insisted on
            emitting a BOM for UTF-8 when a file is first promoted from
            ASCII, but subsequently deleting it doesn't break anything.</div>
        </div>
      </div>
    </blockquote>
    I mostly agree.  Please note that the paper suggests use of a BOM
    only as a last resort.  The goal is to further discourage its use
    and to provide rationale for doing so.<br>
    <blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>I am curious though, what was the actual problem you ran
            into that makes you even consider this modification?  <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>I'm working on improving support for portable C++ source code. 
      Today, there is no character encoding that is supported by all C++
      implementations (not even ASCII).  I'd like to make UTF-8 that
      commonly supported character encoding.  For backward compatibility
      reasons, compilers cannot change their default source code
      character encoding to UTF-8.</p>
    <p>Most C++ applications are created from components that have
      different release schedules and that are maintained by different
      organizations.  Synchronizing a conversion to UTF-8 across
      dependent projects isn't feasible, nor is converting all of the
      source files used by an application to UTF-8 as simple as just
      running them through 'iconv'.  Migration to UTF-8 will therefore
      require an incremental approach for at least some applications,
      though many are likely to find success by simply invoking their
      compiler with the appropriate -everything-is-utf8 option since
      most source files are ASCII.</p>
    <p>Microsoft Visual C++ recognizes a UTF-8 BOM as an encoding
      signature and allows differently encoded source files to be used
      in the same translation unit.  Support for differently encoded
      source files in the same translation unit is the feature that will
      be needed to enable incremental migration.  Normative
      discouragement of BOM use in the Unicode Standard, with
      rationale, would help to explain why a solution other than a
      BOM (perhaps something like <a
href="https://docs.python.org/3/reference/lexical_analysis.html#encoding-declarations">Python's
        encoding declaration</a>) should be standardized instead of the
      existing practice demonstrated by Microsoft's solution.</p>
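    <p>For comparison, a minimal C++17 sketch of the two detection
      approaches: checking for a leading UTF-8 BOM (the bytes EF BB BF,
      which MSVC treats as an encoding signature), and scanning the first
      two lines for a Python-style encoding declaration.  The function
      names are hypothetical, and the declaration syntax is borrowed from
      Python purely for illustration; C++ has no such mechanism today, and
      this simplified check does not implement every rule Python
      applies.</p>

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <string_view>

// Returns true if the text begins with EF BB BF, the UTF-8 encoding of
// U+FEFF, which MSVC treats as a UTF-8 encoding signature.
constexpr bool has_utf8_bom(std::string_view bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: returns the encoding named by a Python-style
// declaration such as "# -*- coding: utf-8 -*-" found on line 1 or 2.
// Simplification: Python honors a line-2 declaration only when line 1 is
// a comment-only line; this sketch does not check that.
std::optional<std::string> encoding_declaration(std::string_view text) {
    // Pattern modeled on Python's rule: a '#' comment containing
    // "coding" followed by ':' or '=' and then an encoding name.
    static const std::regex decl(R"(^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+))");
    std::size_t start = 0;
    for (int line = 0; line < 2 && start <= text.size(); ++line) {
        std::size_t end = text.find('\n', start);
        if (end == std::string_view::npos) {
            end = text.size();
        }
        // Copy the line into a std::string so std::smatch can refer to it.
        const std::string current(text.substr(start, end - start));
        std::smatch match;
        if (std::regex_search(current, match, decl)) {
            return match[1].str();
        }
        start = end + 1;
    }
    return std::nullopt;
}
```

    <p>The difference the sketch highlights is that the BOM is an
      in-band character that also ends up in the text, while a
      declaration is an explicit, human-readable directive that leaves
      ASCII files free of any signature bytes.</p>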
    <p>Tom.<br>
    </p>
    <blockquote type="cite"
cite="mid:CAA2GJqVTwRWne=60k2WUuQi_nUi_-2qE849O-ysc6yoBu=pQUQ@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <div><br>
          </div>
          <div>J</div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p>Tom.<br>
              </p>
              <blockquote type="cite">
                <div><br>
                </div>
                <div>AlisdairM<br>
                  <div><br>
                    <blockquote type="cite">
                      <div>On Oct 10, 2020, at 14:54, Tom Honermann via
                        SG16 <<a href="mailto:sg16@lists.isocpp.org"
                          target="_blank" moz-do-not-send="true">sg16@lists.isocpp.org</a>>
                        wrote:</div>
                      <br>
                      <div>
                        <div>
                          <p>Attached is a draft proposal for the
                            Unicode standard that intends to clarify the
                            current recommendation regarding use of a
                            BOM in UTF-8 text.  This is follow up to <a
href="https://corp.unicode.org/pipermail/unicode/2020-June/008713.html"
                              target="_blank" moz-do-not-send="true">discussion
                              on the Unicode mailing list</a> back in
                            June.</p>
                          <p>Feedback is welcome.  I plan to <a
                              href="https://www.unicode.org/pending/docsubmit.html"
                              target="_blank" moz-do-not-send="true">submit</a>
                            this to the UTC in a week or so pending
                            review feedback.<br>
                          </p>
                          <p>Tom.<br>
                          </p>
                        </div>
                        <span
id="gmail-m_-2846571300384305609cid:958C9297-66AC-4D88-8F0B-577B8BA2589E@nyc.rr.com"><Unicode-BOM-guidance.pdf></span>--
                        <br>
                        SG16 mailing list<br>
                        <a href="mailto:SG16@lists.isocpp.org"
                          target="_blank" moz-do-not-send="true">SG16@lists.isocpp.org</a><br>
                        <a
                          href="https://lists.isocpp.org/mailman/listinfo.cgi/sg16"
                          target="_blank" moz-do-not-send="true">https://lists.isocpp.org/mailman/listinfo.cgi/sg16</a><br>
                      </div>
                    </blockquote>
                  </div>
                  <br>
                </div>
                <br>
                <fieldset></fieldset>
              </blockquote>
              <p><br>
              </p>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <p><br>
    </p>
  </body>
</html>