<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 10/11/2020 9:28 PM, James Kass via

      Unicode wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:40ba9ef0-6618-24f0-f3e6-fb06c0dbf753@code2001.com">

      <br>

      <br>

      On 2020-10-12 3:37 AM, Tom Honermann via Unicode wrote:

      <br>

      <blockquote type="cite">On 10/11/20 11:32 PM, JF Bastien wrote:

        <br>

        <blockquote type="cite">It’s a bit odd: if you assume the

          default is ascii then you don’t need this. If you assume the

          default is utf8 then you don’t need this... so when do you

          need the BOM? It seems like making bad prior choices more

          acceptable... even though they were bad choices. I’m not sure

          it’s a good idea.

          <br>

        </blockquote>

        <br>

        A BOM would be needed when:

        <br>

        <br>

        1. The default encoding is ASCII based (ISO-8859-1,

        Windows-1252,

        <br>

           etc...) and the UTF-8 text to be produced contains non-ASCII

        <br>

           characters.  Or,

        <br>

        2. The default encoding is not ASCII based (e.g., EBCDIC).

        <br>

        <br>

        Both of these cases presume that the default encoding can't be

        made UTF-8 for backward compatibility reasons.

        <br>

        <br>

        Tom.

        <br>

      </blockquote>

      <br>

      <br>

      1.  UTF-8 text consists only of ASCII characters. Even if some

      ASCII strings reference non-ASCII characters. <br>

    </blockquote>

    <p>Come again?</p>

    <p>The UTF-8 encoding of an ASCII character is that same single ASCII

      byte. Any other character, however, is encoded as a sequence

      consisting only of non-ASCII code units: bytes with the high bit set.</p>
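    <p>A minimal Python sketch of that point, in case it helps. The helper
      name <code>utf8_bytes_by_kind</code> is made up for this illustration;
      it only relies on the standard <code>str.encode</code> call and sorts
      the resulting bytes into ASCII bytes and bytes with the high bit
      set.</p>

```python
def utf8_bytes_by_kind(text):
    """Encode text as UTF-8 and split the bytes into two lists:
    ASCII bytes (0x00..0x7F) and high-bit bytes (0x80..0xFF)."""
    encoded = text.encode("utf-8")
    ascii_bytes = [b for b in encoded if b in range(0x80)]
    high_bytes = [b for b in encoded if b in range(0x80, 0x100)]
    return ascii_bytes, high_bytes


# One ASCII character, then two non-ASCII characters (U+00E9 and U+20AC).
for ch in ("A", "\u00e9", "\u20ac"):
    ascii_bytes, high_bytes = utf8_bytes_by_kind(ch)
    print(
        "U+%04X" % ord(ch),
        "ascii:", [hex(b) for b in ascii_bytes],
        "high-bit:", [hex(b) for b in high_bytes],
    )
```

    <p>For U+0041 the only byte is 0x41, while U+00E9 and U+20AC produce
      0xC3 0xA9 and 0xE2 0x82 0xAC respectively, with no ASCII byte among
      them.</p>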

    <p>Given that your premise is incorrect, do any of the conclusions

      that follow from it still apply?</p>

    <p>A./<br>

    </p>

    <blockquote type="cite"

      cite="mid:40ba9ef0-6618-24f0-f3e6-fb06c0dbf753@code2001.com"> 

      It's the same idea as HTML numeric character references which

      point to non-ASCII characters while being composed of ASCII

      characters.  It shouldn't matter whether a string of ASCII digits

      form the character number or a string of UTF-8 hex bytes form that

      number.  A Unicode-aware application will display the string as a

      special character while legacy applications will show the string

      as mojibake.  Either way, UTF-8 remains an ASCII-preserving

      encoding format.

      <br>

      <br>

      2.  Files using non-standard encodings should be converted to

      Unicode.

      <br>

      <br>

      Any plain-text file should be presumed to be UTF-8 unless marked

      otherwise.

      <br>

      <br>

      Years ago, the UTF-8 signature was sometimes considered helpful.

      Nowadays it seems to be more of an anachronism.

      <br>

    </blockquote>


  </body>

</html>